Gerard Goossen wrote:

> But for example when doing a regex for a
> fixed string like m/aap/ in UTF-8 you just have to do a memory search
> for the bytes representing 'aap' in UTF-32 you can do the same, but
> you have much more memory to search through.

A string encoded so that each unit matches the word size of the
platform will actually search faster. Getting at the individual bytes
is an extra step, which on most hardware takes considerable time.

> Whether UTF-8 or UTF-16 is faster depends on the content of your
> strings, if you mostly have ASCII data UTF-8 would be shorter and thus
> faster, if you are dealing with non western languages, the UTF-16
> encoding will probably be shorter and thus faster.

What matters is that UTF-32 is equivalent to UCS-4 restricted to the
range 0..10FFFF (hex), and is therefore *fixed width*.

http://unicode.org/reports/tr19/tr19-9.html

-- 
Affijn, Ruud

"Gewoon is een tijger."
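
P.S. To put rough numbers on the size argument, here is a small Python
sketch. It only measures encoded lengths and byte offsets, not search
speed, so it says nothing about which encoding is faster on real
hardware. The helper names encoded_sizes and find_fixed and the sample
strings are my own, chosen for illustration.

```python
def encoded_sizes(text):
    """Byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def find_fixed(haystack, needle, codec):
    """Fixed-string search done as a plain byte search.

    Returns the byte offset of the first match, or -1.
    Note: for multi-byte code units a real implementation would also
    check that the offset is aligned to the unit size.
    """
    return haystack.encode(codec).find(needle.encode(codec))


# ASCII data: UTF-8 is the shortest encoding.
print(encoded_sizes("aap"))     # {'utf-8': 3, 'utf-16': 6, 'utf-32': 12}

# Three CJK characters from the BMP: UTF-16 is the shortest encoding.
print(encoded_sizes("日本語"))  # {'utf-8': 9, 'utf-16': 6, 'utf-32': 12}

# Same search, same character position (7), but four times as many
# bytes to walk through in UTF-32.
print(find_fixed("zie de aap", "aap", "utf-8"))      # 7
print(find_fixed("zie de aap", "aap", "utf-32-le"))  # 28
```

In the UTF-32 case the byte offset divided by 4 gives the character
index directly, which is the practical benefit of the fixed width.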