Glenn Linderman wrote:
> On approximately 1/31/2006 12:06 PM, came the following characters from
> the keyboard of Linda W:
>
>> The NT-based registry uses 16-bit binary "blobs" (wchar_t) that
>> are not, *strictly*, interpretable as UTF-16, UCS2 or any standard
>> character set. As such, they aren't suitable for being converted to a
>> printable ASCII or Unicode string that can be manipulated with Perl's
>> standard string functions.
>
> Could you elucidate this "*strictly*" comment, or provide a doc ref that
> explains it further? I was certainly under the impression that registry
> keys were UCS-2 (and I haven't looked to see how the conversions to text
> are handled by the perl modules or the Win32 APIs they call).
---
Ah..."Strictly"...well, MS certainly would like you to think they
use UCS2, and their documentation calls it that (as well as "Unicode"),
but a 16-bit, "wide-char" or "wchar" value does not a Unicode
character make! :-)

Some 16-bit values are used as lead-ins for "surrogates" (using two
16-bit values, like the pair {0xD800,0xDC00}), and some like {0xFFFE,0}
are simply not legal Unicode characters.

The closest thing I've found to a complete answer was documentation
describing the structure of the registry:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/structure_of_the_registry.asp

  "Key names cannot include a backslash (\), but any other printable
   or unprintable character can be used."

When the NT-Unicode level first came out, I don't _believe_ there
were any multi-wchar extensions in Unicode. MS was an early adopter --
they didn't put "unicode-interpretation" into their low-level APIs.

As an even more UGLY point, keynames/valuenames can contain an
embedded NULL: something you can't do with the Win32 API, but something
that can be created in your registry (the undocumented, native NT API
uses a count to specify name-lengths). The resulting key cannot be
modified or directly deleted via Regedit.
(see http://www.sysinternals.com/Information/TipsAndTrivia.html#HiddenKeys
for a sample program that creates such a key and then deletes it; or see
http://www.sysinternals.com/Utilities/RegDelNull.html to download
a utility to scan for such keys; I found a few hidden in my Security
Hive -- would be nice to know where they came from...*ahem* (sigh).)

Aside from the verbose documentation on the subject, you can use
the attached, =non-harmful= (AFAIK) example that _is_ deletable via
Regedit, but is not valid Unicode:

From a character point of view, the file looks like the following
in vim:

line#--|
  01  Windows Registry Editor Version 5.00
  02
  03  [HKEY_CURRENT_USER\AppEvents\Schemes\Names\aaa<fffe>Ø]
  04  @="bogon"
-------|

The hex-dumped value of the "name" at the end of line 3 is

000000a0  6d 00 65 00 73 00 5c 00  61 00 61 00 61 00 fe ff  |m.e.s.\.a.a.a...|
000000b0  d8 00 5d 00 0d 00 0a 00  40 00 3d 00 22 00 62 00  |..].....@.=.".b.|

Note the "illegal" Unicode characters @ 0x00ae-0x00b1.

I've attached the file as a .reg and .zip file in case the .reg
gets mangled by a mailer. You can delete the key using regedit
(displayed as "aaaØ" on my system).

Did that shine sufficient light on the matter? :-)

Linda
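
For anyone who wants to check the dump by hand, here is a minimal Python sketch. It is not part of the original attachment; it is only an illustration. It assumes the bytes are little-endian 16-bit units (as in a .reg file exported as "Version 5.00"), and the names NAME_BYTES, code_units and classify are made up for this example. It splits the key-name bytes at offsets 0xa8-0xb1 into code units and labels each one as a surrogate, a Unicode noncharacter, or an ordinary value.

```python
import struct

# Bytes copied from the hex dump above: the key name "aaa..." starts at
# offset 0xa8 and ends just before the "]" at offset 0xb2.
NAME_BYTES = bytes.fromhex("61 00 61 00 61 00 fe ff d8 00")


def code_units(data):
    """Split a byte string into little-endian 16-bit code units."""
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, data[: count * 2]))


def classify(unit):
    """Label a single 16-bit code unit (illustrative categories only)."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    # Noncharacters that fit in 16 bits: U+FDD0..U+FDEF, U+FFFE, U+FFFF.
    if 0xFDD0 <= unit <= 0xFDEF or unit in (0xFFFE, 0xFFFF):
        return "noncharacter"
    return "ordinary"


units = code_units(NAME_BYTES)
labels = [classify(u) for u in units]

for unit, label in zip(units, labels):
    print("0x%04X  %s" % (unit, label))
```

Read this way, the unit at 0xae-0xaf (fe ff) comes out as 0xFFFE, a noncharacter, while d8 00 at 0xb0-0xb1 comes out as 0x00D8, which is the "Ø" that regedit displays. The same bytes read in big-endian order would give 0xD800, a lone high surrogate, so the result depends on the byte-order assumption stated above.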