I think the API for handling UTF-8 in 5.26 is finally good enough that it's time to work on speeding it up. This free DFA-based decoder has long been known to us, and various people on the Unicode mailing list consider it the one to beat: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
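
For anyone who hasn't looked at the DFA approach, here is a rough sketch of the idea in C. To be clear, this is not the code from the page above (which is table-driven) and not anything in the Perl core; the names utf8_step, utf8_is_valid and the UTF8_* states are made up for illustration. It assumes strict Unicode well-formedness only: no overlongs, no surrogates, nothing above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state names: each one records what may legally come next. */
enum utf8_state {
    UTF8_ACCEPT,  /* at a character boundary */
    UTF8_REJECT,  /* malformed input seen; this state is sticky */
    UTF8_CONT1,   /* expect 1 more continuation byte (80..BF) */
    UTF8_CONT2,   /* expect 2 more continuation bytes */
    UTF8_CONT3,   /* expect 3 more continuation bytes */
    UTF8_E0,      /* after E0: next must be A0..BF (rules out overlongs) */
    UTF8_ED,      /* after ED: next must be 80..9F (rules out surrogates) */
    UTF8_F0,      /* after F0: next must be 90..BF (rules out overlongs) */
    UTF8_F4       /* after F4: next must be 80..8F (caps at U+10FFFF) */
};

/* Feed one byte to the automaton. *cp accumulates the code point and is
 * complete whenever the returned state is UTF8_ACCEPT. */
static enum utf8_state
utf8_step(enum utf8_state s, uint8_t b, uint32_t *cp)
{
    uint8_t lo = 0x80, hi = 0xBF;   /* default continuation range */
    enum utf8_state next;

    switch (s) {
    case UTF8_ACCEPT:               /* start byte: pick the next state */
        if (b <= 0x7F)              { *cp = b;        return UTF8_ACCEPT; }
        if (b >= 0xC2 && b <= 0xDF) { *cp = b & 0x1F; return UTF8_CONT1; }
        if (b == 0xE0)              { *cp = 0;        return UTF8_E0; }
        if (b == 0xED)              { *cp = b & 0x0F; return UTF8_ED; }
        if (b >= 0xE1 && b <= 0xEF) { *cp = b & 0x0F; return UTF8_CONT2; }
        if (b == 0xF0)              { *cp = 0;        return UTF8_F0; }
        if (b >= 0xF1 && b <= 0xF3) { *cp = b & 0x07; return UTF8_CONT3; }
        if (b == 0xF4)              { *cp = b & 0x07; return UTF8_F4; }
        return UTF8_REJECT;         /* 80..C1 and F5..FF never start a char */
    case UTF8_CONT1: next = UTF8_ACCEPT; break;
    case UTF8_CONT2: next = UTF8_CONT1;  break;
    case UTF8_CONT3: next = UTF8_CONT2;  break;
    case UTF8_E0: lo = 0xA0; next = UTF8_CONT1; break;
    case UTF8_ED: hi = 0x9F; next = UTF8_CONT1; break;
    case UTF8_F0: lo = 0x90; next = UTF8_CONT2; break;
    case UTF8_F4: hi = 0x8F; next = UTF8_CONT2; break;
    default:      return UTF8_REJECT;
    }

    if (b < lo || b > hi)
        return UTF8_REJECT;
    *cp = (*cp << 6) | (b & 0x3Fu); /* shift in 6 payload bits */
    return next;
}

/* Returns 1 if the buffer is entirely well-formed UTF-8, else 0. */
static int
utf8_is_valid(const uint8_t *s, size_t len)
{
    enum utf8_state st = UTF8_ACCEPT;
    uint32_t cp = 0;

    for (size_t i = 0; i < len && st != UTF8_REJECT; i++)
        st = utf8_step(st, s[i], &cp);
    return st == UTF8_ACCEPT;       /* ending mid-character is malformed */
}
```

The switch is written out for readability. A table-driven decoder gets its speed by replacing those comparisons with array lookups (byte to character class, then class plus state to next state), so the per-byte work is a couple of loads and no data-dependent branching on the byte value.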