develooper Front page | perl.perl5.porters | Postings from January 2006

Re: including Win32 functions in Core (was Re: [PATCH] Re: replacing"inuse" Win files...)

Linda W
January 31, 2006 18:48
Glenn Linderman wrote:
> On approximately 1/31/2006 12:06 PM, came the following characters from 
> the keyboard of Linda W:
>>     The NT-based registry uses 16-bit binary "blobs" (wchar_t) that
>> are not, *strictly*, interpretable as UTF-16, UCS2 or any standard
>> character set.  As such, they aren't suitable for being converted to a
>> printable ASCII or Unicode string that can be manipulated with Perl's
>> standard string functions.
> Could you elucidate this "*strictly*" comment, or provide a doc ref that 
> explains it further?  I was certainly under the impression that registry 
> keys were UCS-2 (and I haven't looked to see how the conversions to text 
> are handled by the perl modules or the Win32 APIs they call).
	Ah..."Strictly"...well, MS certainly would like you to think they
use UCS2, and their documentation calls it that (as well as "Unicode"), but
a 16-bit, "wide-char" or "wchar" value does not a Unicode character make! :-)

	Some 16-bit values are used as lead-ins for "surrogates" (using
two 16-bit values, like the pair {0xD800,0xDC00}), and some like {0xFFFE,0}
are simply not legal Unicode characters.
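	To make the distinction concrete, here is a minimal Python sketch that labels each 16-bit value in a sequence. The function name classify_units and the label strings are made up for illustration; the ranges are the Unicode surrogate ranges (0xD800-0xDBFF, 0xDC00-0xDFFF) and the BMP noncharacters (0xFDD0-0xFDEF, 0xFFFE, 0xFFFF). It says nothing about how the registry itself treats these values.

```python
def classify_units(units):
    """Label each 16-bit unit in a sequence (hypothetical helper)."""
    labels = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate only means something when a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                labels.append("surrogate pair")
                i += 2
                continue
            labels.append("lone high surrogate")
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no high surrogate before it.
            labels.append("lone low surrogate")
        elif u in (0xFFFE, 0xFFFF) or 0xFDD0 <= u <= 0xFDEF:
            # Reserved by Unicode as noncharacters.
            labels.append("noncharacter")
        else:
            labels.append("ok")
        i += 1
    return labels


if __name__ == "__main__":
    print(classify_units([0xD800, 0xDC00]))      # the pair mentioned above
    print(classify_units([0x0061, 0xFFFE, 0]))   # 'a', then the 0xFFFE case
```

	A blob of wchar_t values passes this kind of check only if every label is "ok" or "surrogate pair"; anything else is a 16-bit value that does not map cleanly onto a Unicode character.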

	The closest thing I've found to a complete answer was documentation
describing the structure of the registry:

"Key names cannot include a backslash (\), but any other printable or
unprintable character can be used. "

	When the NT-Unicode level first came out, I don't _believe_ there were
any multi-wchar extensions in Unicode.  MS was an early adopter -- they didn't
put "unicode-interpretation" into their low-level APIs.

	As an even more UGLY point, keynames/valuenames can contain an embedded
NULL: something you can't do with the Win32 API, but something that can be 
created in your registry (the undocumented, native NT API uses a count to 
specify name-lengths).  The resulting key cannot be modified or directly deleted 
via Regedit.  (see
for a sample program that creates such a key and then deletes it; or see
to download a utility to scan for such keys; I found a few hidden in
my Security Hive -- would be nice to know where they came from...*ahem*)
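	The difference between a counted name and a NUL-terminated one can be sketched in a few lines of Python. This is only a model of the two conventions, not a call into any Windows API; the helper names as_nul_terminated and as_counted and the sample name are invented for the example.

```python
def as_nul_terminated(units):
    """Read a name the way a NUL-terminated interface would: stop at the first 0."""
    out = []
    for u in units:
        if u == 0:
            break
        out.append(u)
    return out


def as_counted(units, length):
    """Read a name the way a counted interface would: take exactly `length` units."""
    return list(units[:length])


# A made-up 7-unit name with a NUL in the middle: "abc", 0, "def".
name = [ord(c) for c in "abc"] + [0] + [ord(c) for c in "def"]

if __name__ == "__main__":
    print(len(as_counted(name, len(name))))   # the counted view keeps all units
    print(len(as_nul_terminated(name)))       # the terminated view stops early
```

	The two views disagree about what the name is, which is consistent with a tool built on the NUL-terminated convention being unable to address a key that was created through the counted one.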

	Aside from the verbose documentation on the subject, you can use
the attached, =non-harmful= (AFAIK) example that _is_ deletable via Regedit,
but is not valid Unicode:
	From a character point of view, the file looks like the
following in vim:
  01	Windows Registry Editor Version 5.00
  03	[HKEY_CURRENT_USER\AppEvents\Schemes\Names\aaa<fffe>Ø]
  04	@="bogon"
	The hex-dumped value of the "name" at the end of line 3 is
000000a0  6d 00 65 00 73 00 5c 00  61 00 61 00 61 00 fe ff  |m.e.s.\.a.a.a...|
000000b0  d8 00 5d 00 0d 00 0a 00  40 00 3d 00 22 00 62 00  |..].....@.=.".b.|

	Note the "illegal" Unicode characters @ 0x00ae-0x00b1.  I've
attached the file as a .reg and .zip file in case the .reg gets
mangled by a mailer.
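	For anyone who wants to check the offsets by hand, the following Python sketch regroups the two hex-dump rows above into 16-bit little-endian values and pairs each with its file offset. The helper name units_with_offsets is hypothetical; the bytes and the 0xa0 starting offset are copied from the dump, and little-endian order is assumed because that is how the ASCII letters in the dump are laid out (6d 00 for "m").

```python
import struct

# The 32 bytes shown in the two hex-dump rows (offsets 0xa0 through 0xbf).
DUMP = (
    "6d 00 65 00 73 00 5c 00 61 00 61 00 61 00 fe ff "
    "d8 00 5d 00 0d 00 0a 00 40 00 3d 00 22 00 62 00"
)


def units_with_offsets(raw, base):
    """Return (file_offset, 16-bit value) pairs, reading `raw` as little-endian."""
    count = len(raw) // 2
    values = struct.unpack("<%dH" % count, raw[: count * 2])
    return [(base + 2 * i, v) for i, v in enumerate(values)]


if __name__ == "__main__":
    for offset, value in units_with_offsets(bytes.fromhex(DUMP), 0xA0):
        # 0xFFFE is one of the values Unicode reserves as a noncharacter.
        note = "  <-- noncharacter" if value == 0xFFFE else ""
        print("0x%04x: 0x%04X%s" % (offset, value, note))
```

	Read this way, the bytes fe ff at 0x00ae form the value 0xFFFE, and the bytes d8 00 at 0x00b0 form 0x00D8.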

	You can delete the key using regedit (displayed as "aaaØ" on my

	Did that shine sufficient light on the matter? :-)
