Re: Perl 7: Fix string leaks?
From: Salvador Fandiño
March 31, 2021 17:55
Message ID: firstname.lastname@example.org
On 31/3/21 5:02, Felipe Gasper wrote:
>> On Mar 30, 2021, at 4:48 PM, Salvador Fandiño <email@example.com> wrote:
>> On 30/3/21 20:53, Felipe Gasper wrote:
>>>> On Mar 30, 2021, at 12:16 PM, Salvador Fandiño <firstname.lastname@example.org> wrote:
>>>> Encoding/decoding should be done at the boundaries (syscall, XS, etc.) using sensible defaults and/or allowing the user to set them (as in PerlIO). That would IMO fix most of the problems.
>>> “Encoding at the boundaries” is essentially what Sys::Binmode achieves for POSIX OSes, FWIW.
>> Yeah, that part is right! What is wrong is that it uses the wrong encoding (latin1) most of the time!
>> Actually, I think it would be pretty easy to modify your module so that it gets the encoding from the environment or, alternatively, takes it as a parameter, at least on UNIX and the like:
>> use Sys::Binmode "latin5";
>> Windows is a different story...
> I’d be curious to see an implementation of what you have in mind.
I have forked your module on GitHub:
Note that this is not something to incorporate in your version, just a
proof of concept for experimentation.
With my modified version you can say:
use Sys::Binmode "latin3";
And it would encode data in the "latin3" encoding before doing any IO.
Also, if you say:
It inspects your environment (LC_CTYPE, LC_ALL, etc.), and sets the
encoding to utf8 or latin1 depending on what it finds there.
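A minimal sketch of that kind of environment inspection, for illustration only. It is not the code from the fork: the helper name guess_fs_encoding and the regex are assumptions, and only the LC_ALL, LC_CTYPE, LANG precedence comes from POSIX.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sys::Binmode): choose "utf8" or "latin1"
# from POSIX locale variables passed in as a key/value list.
# Precedence follows POSIX: LC_ALL, then LC_CTYPE, then LANG.
sub guess_fs_encoding {
    my (%env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env{$var};
        next unless defined $value && length $value;
        # Locale names look like "es_ES.UTF-8" or "en_US.utf8@euro".
        return $value =~ /\.utf-?8(?:@|$)/i ? "utf8" : "latin1";
    }
    return "latin1";    # nothing set: fall back to plain bytes
}

print guess_fs_encoding(%ENV), "\n";
```

A less ad hoc version could ask the C library instead of parsing names, for example with langinfo(CODESET) from I18N::Langinfo after calling setlocale.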
Things I have learnt:
1) It works quite well for file system operations, open, mkdir, etc.
2) The reverse layer needs to be done too. For instance, the return
value from "readdir", should be decoded using the same encoding.
3) It doesn't make sense to do this encoding thing for operators that
move data, such as "send", "syswrite", etc. For those, I think the current
Sys::Binmode approach is the right thing or, as I already wrote in
another mail, adding support for a new strict category "data_encoding".
4) Both things should be done inside Perl. Doing it at the OP
boundaries is quite hacky, and in any case supporting this (both
transparent encoding/decoding on file system operations and forcing
binary data on the rest of the IO operations not covered by PerlIO) is a
must!
5) Also, I have taken a look at some of the Windows code, and it is
pretty clear to me that the only way to get this working on Windows is
to do it at a lower level.
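Points 1) and 2) can be sketched in plain Perl as a pair of wrappers: encode the name on the way in, decode "readdir" results on the way out. This is only an illustration under assumptions; create_file, list_dir and the fixed $fs_encoding are hypothetical names, not what the fork does at the OP level.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Assumed for the sketch; in practice this would come from the locale.
my $fs_encoding = "UTF-8";

# Hypothetical wrapper: encode the character-string name before open.
sub create_file {
    my ($dir, $name) = @_;
    my $path = "$dir/" . encode($fs_encoding, $name);
    open my $fh, ">", $path or die "open: $!";
    close $fh or die "close: $!";
    return;
}

# Hypothetical wrapper: decode every entry returned by readdir
# with the same encoding that was used on the way in.
sub list_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir: $!";
    my @names = map { decode($fs_encoding, $_) }
                grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    @names = sort @names;
    return @names;
}

my $dir = tempdir(CLEANUP => 1);
create_file($dir, "\x{3bb}.txt");    # U+03BB GREEK SMALL LETTER LAMDA
my @names = list_dir($dir);
```

After the round trip, @names holds character strings again, so comparing an entry against the original name works without any manual decoding by the caller.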
> “Latin-1” is kind of a funny term; as Perl uses it internally it’s not really an “encoding” so much as “just bytes”. So I disagree (of course) that Sys::Binmode uses “the wrong encoding” because the “encoding” that it gives you is whatever bytes the string stores. It’s the same “encoding” that SvPVbyte provides for XSUBs.
That's just how you decide to reason about bytes, encodings, characters,
etc. There are several points of view and all of them can be valid and
have their advantages.
But the issue I see here is that if I have a variable in Perl, say
$fn, then doing...
open $f, ">", $fn;
should create a file with the correct name, without me, the programmer,
having to worry about encoding issues.
We are in 2021, and almost every operating system released in the last
decade uses some form of Unicode by default. So I don't think that a
default encoding (or a no-encoding) that doesn't match what your OS
uses can be a good idea.
And I already understand that the programmer can call
Encode::encode("utf8", $fn) before calling open. Actually, if he does
that, he doesn't need your module at all!
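To make the mismatch concrete, here is a small sketch comparing the bytes produced by the explicit Encode::encode("utf8", $fn) call with the latin1 bytes of the same name. The file name is an arbitrary example chosen for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "café.txt" as a character string; U+00E9 fits in latin1.
my $fn = "caf\x{e9}.txt";

# Bytes a UTF-8 file system expects for this name.
my $utf8_bytes = encode("utf8", $fn);

# Bytes of the same name in latin1 (one byte per character).
my $latin1_bytes = encode("iso-8859-1", $fn);

# Hex dump helper for display only.
sub hex_dump { join " ", map { sprintf "%02x", ord } split //, $_[0] }

print "utf8:   ", hex_dump($utf8_bytes), "\n";
print "latin1: ", hex_dump($latin1_bytes), "\n";
```

The two byte strings differ in length (9 versus 8 bytes), so on a UTF-8 system they name two different files.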