develooper Front page | perl.perl5.porters | Postings from March 2021

Re: Perl 7: Fix string leaks?

From: Salvador Fandiño
Date: March 31, 2021 17:55
Subject: Re: Perl 7: Fix string leaks?
Message ID: 50c13837-d2ea-543f-763a-94609bb53bc0@gmail.com

On 31/3/21 5:02, Felipe Gasper wrote:
> 
> 
>> On Mar 30, 2021, at 4:48 PM, Salvador Fandiño <sfandino@gmail.com> wrote:
>>
>> On 30/3/21 20:53, Felipe Gasper wrote:
>>>> On Mar 30, 2021, at 12:16 PM, Salvador Fandiño <sfandino@gmail.com> wrote:
>>>>
>>>> Encoding/decoding should be done at the boundaries (syscall, XS, etc.) using sensible defaults and/or allowing the user to set them (as in PerlIO). That would IMO fix most of the problems.
>>> “Encoding at the boundaries” is essentially what Sys::Binmode achieves for POSIX OSes, FWIW.
>>
>> Yeah, that part is right! what is wrong is that it uses the wrong encoding (latin1) most of the time!!!
>>
>> Actually, I think it would be pretty easy to modify your module so that it gets the encoding from the environment or alternatively taking it as a parameter at least on UNIX and alikes:
>>
>>   use Sys::Binmode "latin5";
>>
>> Windows is a different story...
> 
> I’d be curious to see an implementation of what you have in mind.

I have forked your module on GitHub:

   https://github.com/salva/p5-Sys-Binmode

Note that this is not something to incorporate in your version, just a 
proof of concept for experimentation.

With my modified version you can say:

   use Sys::Binmode "latin3";

And it would encode data in the "latin3" encoding before doing any IO.

Also, if you say:

   use Sys::Binmode;

It inspects your environment (LC_CTYPE, LC_ALL, etc.), and sets the 
encoding to utf8 or latin1 depending on what it finds there.
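
To make the environment lookup concrete, here is a minimal sketch of the 
idea. It is not the code from the fork: the function name guess_encoding 
is hypothetical, and both the LC_ALL > LC_CTYPE > LANG precedence and the 
latin1 fallback are assumptions based on the usual POSIX locale rules.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from the fork): pick "utf8" or "latin1"
# from the locale variables, checked in the usual POSIX precedence order.
sub guess_encoding {
    my ($env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$var};
        next unless defined $value && length $value;
        # Matches codeset suffixes such as ".UTF-8" or ".utf8".
        return $value =~ /\.utf-?8\b/i ? 'utf8' : 'latin1';
    }
    return 'latin1';    # assumed fallback when nothing is set
}

print guess_encoding(\%ENV), "\n";
```

Passing the environment as a hash reference keeps the helper easy to try 
with made-up values instead of the real %ENV.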

Things I have learnt:

1) It works quite well for file system operations, open, mkdir, etc.

2) The reverse layer needs to be done too. For instance, the return 
value from "readdir" should be decoded using the same encoding.

3) It doesn't make sense to do this encoding thing for operators that 
move data, such as "send", "syswrite", etc. For those, I think the current 
Sys::Binmode approach is the right thing, or, as I already wrote in 
another mail, adding support for a new strict category "data_encoding".

4) Both things should be done inside Perl. Doing it at the OP 
boundaries is quite hacky, and in any case supporting this (both 
transparent encoding/decoding on file system operations and forcing 
binary data on the rest of the IO operations not covered by PerlIO) is 
a must!

5) Also, I have taken a look at some of the Windows code, and it is 
pretty clear to me that the only way to get this working on Windows is 
to do it at a lower level.
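
Points 1 and 2 can be illustrated by hand with the core Encode module. 
This is only a sketch of the round trip, not what the fork does 
internally: the file system encoding is assumed to be UTF-8, and the 
variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

my $fs_encoding = 'UTF-8';               # assumption: what the OS expects
my $dir  = tempdir(CLEANUP => 1);
my $name = "caf\x{e9}.txt";              # a character string, not bytes

# Outbound boundary: encode the name before it reaches the system call.
open my $fh, '>', "$dir/" . encode($fs_encoding, $name)
    or die "open failed: $!";
close $fh;

# Inbound boundary: readdir returns bytes, so decode them the same way.
opendir my $dh, $dir or die "opendir failed: $!";
my @names = map  { decode($fs_encoding, $_) }
            grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

print $names[0] eq $name ? "round trip ok\n" : "names differ\n";
```

The comparison only holds when both directions use the same encoding, 
which is the reason the reverse layer is needed. File systems that 
normalize names may still hand back a different form.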


> “Latin-1” is kind of a funny term; as Perl uses it internally it’s not really an “encoding” so much as “just bytes”. So I disagree (of course) that Sys::Binmode uses “the wrong encoding” because the “encoding” that it gives you is whatever bytes the string stores. It’s the same “encoding” that SvPVbyte provides for XSUBs.

That's just how you decide to reason about bytes, encodings, characters, 
etc. There are several points of view and all of them can be valid and 
have their advantages.

But the issue I see here is that if I have a variable in Perl, say 
$fn, then doing...

   open $f, ">", $fn;

should create a file with the correct name, without me, the programmer 
having to worry about encoding issues.

We are in 2021, and almost every operating system released in the last 
decade uses some form of Unicode by default. So I don't think that 
defaulting to an encoding (or a non-encoding) that doesn't match what 
your OS uses is a good idea.

And I already understand that the programmer can call 
Encode::encode("utf8", $fn) before calling open. Actually, if he does 
that, he doesn't need your module at all!
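
As a small illustration of why the choice of encoding matters, the 
sketch below prints the bytes that the same $fn turns into under two 
encodings. The helper hex_bytes is made up for the example. It uses the 
strict "UTF-8" name from Encode rather than the lax "utf8" spelling 
above; for this string both give the same bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $fn = "caf\x{e9}.txt";    # one character string, two possible names

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack '(H2)*', $bytes;
}

for my $enc ('UTF-8', 'ISO-8859-1') {
    printf "%-10s %s\n", $enc, hex_bytes(encode($enc, $fn));
}
# UTF-8      63 61 66 c3 a9 2e 74 78 74
# ISO-8859-1 63 61 66 e9 2e 74 78 74
```

Whichever of these byte strings is passed to open is the name the OS 
sees, so only the one matching the OS convention gives the file the 
name the programmer expects.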




