perl.perl6.users | Postings from April 2020

Re: Using slurp to read in a utf16 file

From: Joseph Brenner
Date: April 27, 2020 01:05
Subject: Re: Using slurp to read in a utf16 file
Message ID: CAFfgvXUDB72sZf0GoneukE=8iupkYe=q++EPgP+iG0bPRAmO=w@mail.gmail.com

To expand on the point a bit, doing exactly the same spurt/slurp works
with "utf8", but doing it with "utf16" fails to read the text back in:

{
    my $unichar_str =    # ሀⶀ䷼ꪪⲤⲎ
       "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";

    my $file = "/tmp/stuff_in_utf8.txt";
    my $fh = $file.IO.open( :w, :enc("utf8") );
    spurt $fh, $unichar_str;

    my $contents = slurp( $file, :enc("utf8") );
    my $huh = $contents.gist;
    say "contents: $contents";
    say "length: ", $contents.chars;
}

{
    my $unichar_str =    # ሀⶀ䷼ꪪⲤⲎ
       "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";

    my $file = "/tmp/stuff_in_utf16.txt";
    my $fh = $file.IO.open( :w, :enc("utf16") );
    spurt $fh, $unichar_str;

    my $contents = slurp( $file, :enc("utf16") );
    my $huh = $contents.gist;
    say "contents: $contents";                #  contents:
    say "length: ", $contents.chars;        # 0
}


The output:
   contents: ሀⶀ䷼ꪪⲤⲎ
   length: 6
   contents:
   length: 0

The file definitely has something in it, though:

wc /tmp/stuff_in_utf16.txt
  0  1 14 /tmp/stuff_in_utf16.txt
cat /tmp/stuff_in_utf16.txt
     \377\376^@^R\200-\374M\252\252\244,\216,
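
One possible way to cross-check the file contents from inside Raku is to
slurp the raw bytes and decode them by hand, instead of relying on the
"enc" option of slurp. This is only a sketch and was not part of the test
runs above: the scratch path is hypothetical, and whether the decode step
returns the original string may depend on the Rakudo version, so no
expected output is shown.

```raku
# Sketch only: bypass slurp's :enc and decode the raw bytes explicitly.
my $unichar_str =    # ሀⶀ䷼ꪪⲤⲎ
   "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";

# Hypothetical scratch file, separate from the files used above.
my $file = "/tmp/stuff_in_utf16_check.txt";

# IO::Path.spurt opens, writes, and closes the file in one call.
$file.IO.spurt( $unichar_str, :enc("utf16") );

# :bin returns a Buf of raw bytes with no decoding applied.
my $bytes = $file.IO.slurp( :bin );
say "bytes: ", $bytes.elems;

# Decode the bytes explicitly as utf16.
my $decoded = $bytes.decode("utf16");
say "decoded: $decoded";
say "length: ", $decoded.chars;
```

If the byte count is non-zero but the decoded length is still 0, the
problem would seem to be in the utf16 decoding itself rather than in how
slurp handles the "enc" option.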



On 4/26/20, Joseph Brenner <doomvox@gmail.com> wrote:
> Looking at the documentation for slurp, it looks as though there's a
> convenient "enc" option you can use if you're not reading utf8 files.
> So I thought this would work:
>
>    my $contents = slurp $file, enc => "utf16";
>
> It's not doing what I expected... Raku acts like there's nothing in
> $contents.
>
> Here's the test code I've been using:
>
> # ሀⶀ䷼ꪪⲤⲎ
> my $unichar_str =
>      "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";
>
> my $file = "/home/doom/tmp/stuff_in_utf16.txt";
> my $fh = $file.IO.open( :w, :enc("utf16") );
> spurt $fh, $unichar_str;
>
> # read entire file as utf16 Str
> my $contents = slurp $file, enc => "utf16";
> my $huh = $contents.gist;
> say "contents: $contents";  #  contents:
> say $contents.elems;        # 1
>
