
Re: including Win32 functions in Core (was Re: [PATCH] Re: replacing"inuse" Win files...)

From:
Linda W
Date:
January 31, 2006 18:48
Subject:
Re: including Win32 functions in Core (was Re: [PATCH] Re: replacing"inuse" Win files...)
Message ID:
43E0215D.1050207@tlinx.org
Glenn Linderman wrote:
> On approximately 1/31/2006 12:06 PM, came the following characters from 
> the keyboard of Linda W:
> 
>>     The NT-based registry uses 16-bit binary "blobs" (wchar_t) that
>> are not, *strictly*, interpretable as UTF-16, UCS2 or any standard
>> character set.  As such, they aren't suitable for being converted to a
>> printable ASCII or Unicode string that can be manipulated with Perl's
>> standard string functions.
> 
> Could you elucidate this "*strictly*" comment, or provide a doc ref that 
> explains it further?  I was certainly under the impression that registry 
> keys were UCS-2 (and I haven't looked to see how the conversions to text 
> are handled by the perl modules or the Win32 APIs they call).
---
	Ah..."Strictly"...well, MS certainly would like you to think they
use UCS2, and their documentation calls it that (as well as "Unicode"), but
a 16-bit "wide-char" or "wchar" value does not a Unicode character make! :-)

	Some 16-bit values are used as lead-ins for "surrogates" (using
two 16-bit values, like the pair {0xD800,0xDC00}), and some, like {0xFFFE,0},
are simply not legal Unicode characters.
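
	To make the distinction concrete, here is a minimal sketch in Python
(not anything the registry or the Win32 API does itself) that scans a list of
16-bit values and reports the ones that cannot stand as Unicode characters.
The function name find_invalid_units is made up for illustration, and only
the noncharacters in the 16-bit range are checked.

```python
def find_invalid_units(units):
    """Return (index, value, reason) for each 16-bit unit that is not usable
    as (part of) a Unicode character. Hypothetical helper for illustration."""
    problems = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A lead surrogate is only valid when a trail surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both units
                continue
            problems.append((i, u, "unpaired lead surrogate"))
        elif 0xDC00 <= u <= 0xDFFF:
            # A trail surrogate with no lead surrogate before it.
            problems.append((i, u, "unpaired trail surrogate"))
        elif u in (0xFFFE, 0xFFFF) or 0xFDD0 <= u <= 0xFDEF:
            # Noncharacters in the 16-bit range (supplementary ones not checked).
            problems.append((i, u, "noncharacter"))
        i += 1
    return problems


print(find_invalid_units([0xD800, 0xDC00]))  # a proper surrogate pair
print(find_invalid_units([0xFFFE, 0]))       # 0xFFFE is a noncharacter
```

	The pair {0xD800,0xDC00} passes because the two values together encode
one character; either value on its own would be flagged.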

	The closest thing I've found to a complete answer was documentation
describing the structure of the registry:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/structure_of_the_registry.asp

"Key names cannot include a backslash (\), but any other printable or
unprintable character can be used. "

	When NT's Unicode support first came out, I don't _believe_ there were
any multi-wchar extensions in Unicode.  MS was an early adopter -- they didn't
put "unicode-interpretation" into their low-level APIs.

	As an even more UGLY point, keynames/valuenames can contain an embedded
NULL: something you can't do with the Win32 API, but something that can be
created in your registry (the undocumented, native NT API uses a count to
specify name-lengths).  The resulting key cannot be modified or directly deleted
via Regedit.  See
	http://www.sysinternals.com/Information/TipsAndTrivia.html#HiddenKeys
for a sample program that creates such a key and then deletes it, or see
	http://www.sysinternals.com/Utilities/RegDelNull.html
to download a utility to scan for such keys.  I found a few hidden in
my Security Hive -- would be nice to know where they came from...*ahem*
(sigh).
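
	The counted-versus-terminated difference can be sketched in a few
lines of Python.  This is only an illustration of the idea, not the NT or
Win32 API: read_counted and read_nul_terminated are hypothetical helpers, and
the name "ab\0cd" is an invented example.

```python
def read_counted(buf, length_units):
    """Counted read: take exactly length_units 16-bit units, NULs included."""
    return buf[:2 * length_units].decode("utf-16-le")


def read_nul_terminated(buf):
    """Terminated read: stop at the first 16-bit zero, as a C wide string would."""
    for i in range(0, len(buf) - 1, 2):
        if buf[i:i + 2] == b"\x00\x00":
            return buf[:i].decode("utf-16-le")
    return buf.decode("utf-16-le")


# An invented name with an embedded NUL in the middle.
buf = "ab\x00cd".encode("utf-16-le")

print(repr(read_counted(buf, 5)))      # the full five-unit name
print(repr(read_nul_terminated(buf)))  # only the part before the NUL
```

	A caller that relies on the terminator sees a different, shorter name
than the one stored with a count, so it has no way to refer to the real key.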

	Aside from the verbose documentation on the subject, you can use
the attached, =non-harmful= (AFAIK) example that _is_ deletable via Regedit,
but is not valid Unicode:
	From a character point of view, the file looks like the
following in vim:
line#--|
  01	Windows Registry Editor Version 5.00
  02
  03	[HKEY_CURRENT_USER\AppEvents\Schemes\Names\aaa<fffe>Ã~]
  04	@="bogon"
-------|
	The hex-dumped value of the "name" at the end of line 3 is
000000a0  6d 00 65 00 73 00 5c 00  61 00 61 00 61 00 fe ff  |m.e.s.\.a.a.a...|
000000b0  d8 00 5d 00 0d 00 0a 00  40 00 3d 00 22 00 62 00  |..].....@.=.".b.|

	Note the "illegal" Unicode characters @ 0x00ae-0x00b1.  I've
attached the file as a .reg and .zip file in case the .reg gets
mangled by a mailer.
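
	If you'd rather check the dump than take my word for it, the
following Python sketch splits the 32 bytes shown above into 16-bit values.
It assumes little-endian byte order, which the "6d 00" encoding of 'm'
suggests; units_with_offsets is a made-up helper name.

```python
import struct

DUMP_OFFSET = 0xA0  # file offset of the first byte in the dump
DUMP_BYTES = bytes.fromhex(
    "6d 00 65 00 73 00 5c 00 61 00 61 00 61 00 fe ff "
    "d8 00 5d 00 0d 00 0a 00 40 00 3d 00 22 00 62 00"
)


def units_with_offsets(data, base):
    """Split data into little-endian 16-bit units, paired with file offsets."""
    count = len(data) // 2
    units = struct.unpack("<%dH" % count, data[:count * 2])
    return [(base + 2 * i, u) for i, u in enumerate(units)]


pairs = units_with_offsets(DUMP_BYTES, DUMP_OFFSET)

# Flag the two 16-bit noncharacters that byte-order confusion tends to produce.
flagged = [(off, u) for off, u in pairs if u in (0xFFFE, 0xFFFF)]

for off, u in pairs:
    print("0x%04x: 0x%04x" % (off, u))
print("flagged:", [(hex(off), hex(u)) for off, u in flagged])
```

	In that reading the unit at 0xae is 0xFFFE, the noncharacter, and the
unit at 0xb0 is 0x00D8, which matches the "Ø" that Regedit displays.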

	You can delete the key using regedit (displayed as "aaaØ" on my
system).

	Did that shine sufficient light on the matter? :-)
Linda






