Front page | perl.perl5.porters |
Postings from March 2014
Re: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
From: Victor Efimov
Date: March 26, 2014 22:06
Subject: Re: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
Message ID: CAF7QZD7kjyi_KOseBAMvMFVibWsjUBARCyNUyfyqv6W9y0QnUQ@mail.gmail.com
2014-03-27 1:41 GMT+04:00 Karl Williamson via RT <perlbug-followup@perl.org>:
> I looked at https://github.com/petdance/ack2/issues/367
> which shows that ack is broken by the 5.19.2 change.
>
> If you look at that link, you'll see that the Russian comes out fine, but with a warning that didn't use to be there; the French is broken.
>
> What is happening is that ack treats everything as bytes, and so everything just worked. STDERR is opened as a byte-oriented file, and if $! actually did contain UTF-8, it wasn't marked as such, and its component bytes were output as-is, so that if in fact the terminal is expecting UTF-8, they come out looking like UTF-8 to it, and everything held together. (Garbage would ensue if the terminal wasn't expecting the encoding that $! is in; I haven't checked, but my guess is that the grep output is also output as-is, so if the file encodings differ from the terminal expectation, that garbage could be printed; but in practice I doubt that this is a problem.)
>
> What the 5.19 change did effectively is to make the stringification of "$!" obey "use bytes". Most code isn't in bytes' scope, so the UTF-8 flag gets turned on if appropriate.
>
> Perl's do_print() function checks if the stream is listed as UTF-8 or not. The string being output is converted to the stream's encoding if necessary and possible. If not possible, things are just output as-is, possibly with warnings. In ack's case the stream never is (AFAIK) UTF-8. Starting in 5.19.2+, the message can be marked as UTF-8, and so tries to get converted to the non-UTF-8 stream. This is impossible in Russian, so the bytes are output as-is, with a warning. Since the terminal really is UTF-8, they display correctly. But it is possible to convert the French text, as all the characters in the message in the bug report are Latin1. So do_print() does this, but since the terminal's encoding doesn't match what ack thinks it is, the non-ascii characters come out as garbage.
Yes, agreed. Either way, the warnings are bad, and the broken Latin-1 output is bad too.
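
To make the difference between the two cases concrete, here is a minimal Perl sketch of the fallback described above. It assumes that printing to a non-UTF-8 handle behaves like a utf8::downgrade attempt, followed by output of the raw UTF-8 bytes when that attempt fails. The helper name bytes_written and the sample messages are made up for illustration; they are not taken from the bug report.

```perl
use strict;
use warnings;
use utf8;    # the string literals below are written as UTF-8 in this source

# Illustrative stand-ins for a character-string "$!" (hypothetical samples).
my %sample = (
    french  => "Permission non accordée",
    russian => "Нет такого файла или каталога",
);

# Hypothetical helper: return the bytes that a non-UTF-8 handle would
# end up writing, under the assumption stated in the lead-in.
sub bytes_written {
    my ($text) = @_;
    my $copy = $text;
    # With a true second argument, utf8::downgrade returns false
    # instead of dying when a character does not fit in one byte.
    return $copy if utf8::downgrade($copy, 1);    # all characters fit in Latin-1
    utf8::encode($copy);                          # otherwise: the UTF-8 bytes as-is
    return $copy;
}

for my $lang (sort keys %sample) {
    my $bytes = bytes_written($sample{$lang});
    printf "%-7s %d chars -> %d bytes\n",
        $lang, length $sample{$lang}, length $bytes;
}
```

If that assumption holds, the French sample ends up as one byte per character (Latin-1), which a UTF-8 terminal cannot display, while the Russian sample stays as UTF-8 bytes and so still looks right.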
>
> Note that ack has some of its messages hard-coded in English. For example, it does a -e on the file name, and outputs English-only if it doesn't exist. rjbs has pointed out to me privately that typical uses of $! are of the form
>
> die "my message in English: $!"
Right, "my message in English" usually is in English, because authors
don't bother with full localization and translation into every
language. But for consistency it is still better to see $! in the
locale's language; other programs usually show it in the user's language.
>
> I am not an ack user, but it appears to me that ack is like a filter which doesn't care about encodings. It is byte rather than character oriented. This seems to me to be an appropriate use of 'use bytes', and if ack did this, this bug would not arise.
I would disagree; they are trying to migrate to Unicode:
https://github.com/petdance/ack2/issues/120
https://github.com/petdance/ack2/issues/344
https://github.com/petdance/ack2/issues/350
https://github.com/petdance/ack2/issues/355
ack searches _text_ in text files using _Perl regexps_. It even
ignores files detected as binary (by default, at least in my
installation).
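
For code that does want characters, one possible approach is to normalise the stringified $! explicitly, so that it does not matter whether perl handed back bytes (as in 5.18) or a UTF-8-flagged string (as in 5.19.2+). This is only a rough sketch: to_characters is a hypothetical helper, and the codeset is assumed to be known by the caller (for example via I18N::Langinfo).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn an error message into a character string.
# $codeset is assumed to be the encoding of the current locale.
sub to_characters {
    my ($msg, $codeset) = @_;
    return $msg if utf8::is_utf8($msg);    # already characters (5.19.2+ style)
    return decode($codeset, $msg);         # raw locale bytes (5.18 style)
}

# Example use; assumes a UTF-8 locale and a terminal that expects UTF-8.
binmode(STDERR, ':encoding(UTF-8)');
open(my $fh, '<', '/no/such/file')
    or print STDERR "cannot open: ", to_characters("$!", 'UTF-8'), "\n";
```

With the output layer declared on STDERR, the message is encoded exactly once on the way out, whichever form $! arrived in.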
>
> My proposal to only use ASCII characters in error messages unless within 'use locale' would also fix this problem. All messages that print in Russian and some messages in French, would now appear in English, adding to the several that already print in English no matter what.
>
I now write programs that use modern Perl Unicode handling correctly, but
I have never used 'use locale'; it seems to add other side effects to
the code? Could there be a special option for 'use locale' that changes
nothing at all except the behaviour of $! (in lexical scope)?
Also, could code without 'use locale' behave as in 5.18 (i.e. not always
in English; bytes), while 'use locale :errno_only' makes $! return a
Unicode character string?
> ---
> via perlbug: queue: perl5 status: open
> https://rt.perl.org/Ticket/Display.html?id=119499