
readchars, seek back, and readchars again

From: Joseph Brenner
Date: April 24, 2020 18:34
Subject: readchars, seek back, and readchars again
Message ID: CAFfgvXVQWVmmM-WJwz2X2L09Zop4zqdibFmhZD5YPTz9+TzfsA@mail.gmail.com

I thought that doing a readchars on a filehandle, seeking backwards
by the width of that character in bytes, and then doing another
readchars would always return the same character.  That works for
ASCII-range characters (1 byte in UTF-8 encoding) but not for
multi-byte "wide" characters (commonly 3 bytes in UTF-8).
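
To make the byte widths concrete, here is a small sketch that reports
the UTF-8 width of each character in the two test strings used in the
script further down. The helper name utf8_widths is made up for this
illustration; it only relies on Str.comb and Str.encode.

```raku
use v6;

my $unichar_str = "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";
my $ascii_str   = "ABCDEFGHI";

# Hypothetical helper: UTF-8 byte width of each character in a string
sub utf8_widths(Str $str) {
    $str.comb.map({ .encode('UTF-8').bytes }).List
}

say utf8_widths($unichar_str);   # (3 3 3 3 3 3)
say utf8_widths($ascii_str);     # (1 1 1 1 1 1 1 1 1)

# Character count and byte count differ for the wide string
say $unichar_str.chars, " chars, ", $unichar_str.encode('UTF-8').bytes, " bytes";
```

So the third character starts at byte offset 6 in the wide string and
at byte offset 2 in the ASCII string.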

The question, then, is: why do I need a $nudge of 3 for wide characters,
but not for ASCII-range ones?

use v6;
use Test;

my $tmpdir = IO::Spec::Unix.tmpdir;
my $file = "$tmpdir/scratch_file.txt";
my $unichar_str = "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";  # ሀⶀ䷼ꪪⲤⲎ
my $ascii_str =   "ABCDEFGHI";

subtest {
    my $nudge = 3;
    test_read_and_read_again($unichar_str, $file, $nudge);
}, "Wide unicode chars: $unichar_str";

subtest {
    my $nudge = 0;
    test_read_and_read_again($ascii_str, $file, $nudge);
}, "Ascii-range chars: $ascii_str";

done-testing;

# Write the given string to a file, then read its third character twice
# (seeking back in between) and check that both reads return the same character.
sub test_read_and_read_again($str, $file, $nudge = 0) {
    spurt $file, $str;
    my $fh = $file.IO.open;
    $fh.readchars(2);  # skip the first two characters
    my $chr_1 =      $fh.readchars(1);
    my $width = $chr_1.encode('UTF-8').bytes;  # for our purposes, always 1 or 3
    my $step_back = $width + $nudge;
    $fh.seek: -$step_back, SeekFromCurrent;  # relative seek, in bytes
    my $chr_2 =      $fh.readchars(1);
    $fh.close;
    is( $chr_1, $chr_2,
        "read, seek back, and read again gets same char with nudge of $nudge" );
}
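
For comparison, here is a sketch of a variant that avoids the relative
seek altogether. It does not explain the nudge. It computes the byte
offset of the third character from the characters already read and
seeks to it with SeekFromBeginning, printing $fh.tell along the way as
a diagnostic. The file name scratch_abs_seek.txt is made up for this
example, and I am assuming that an absolute seek leaves the handle
ready to decode from the new position.

```raku
use v6;

my $str  = "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]";
my $file = $*TMPDIR.add('scratch_abs_seek.txt');   # hypothetical scratch file
$file.spurt($str);

my $fh = $file.open;
my $skipped = $fh.readchars(2);                    # first two characters
my $offset  = $skipped.encode('UTF-8').bytes;      # byte offset of the third character
say "tell after skipping: ", $fh.tell;

my $chr_1 = $fh.readchars(1);
say "tell after first read: ", $fh.tell;

# Absolute seek: no dependence on where the handle thinks it currently is
$fh.seek: $offset, SeekFromBeginning;
my $chr_2 = $fh.readchars(1);
$fh.close;

say "first read:  $chr_1";
say "second read: $chr_2";
```

Comparing the tell values with the byte offsets expected from the
character widths may show where the handle's position differs from
what the relative seek assumes.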
