Re: I need unprintable regex help

From: ToddAndMargo via perl6-users
Date: October 21, 2018 06:04
Subject: Re: I need unprintable regex help
Message ID: 8a55b5eb-dd4a-1096-0836-d4429aa376e2@zoho.com

>> I'm not sure what you thought I was showing you on IRC last night, since 
>> I pointed this out multiple times.
>> 
>> On Sat, Oct 20, 2018 at 3:43 AM ToddAndMargo via perl6-users
>> <perl6-users@perl.org> wrote:
>> 
>>     Hi All,
>> 
>>          my Str $CrLf   = chr(0x0d) ~ chr(0x0a);
>>          $String ~~ s:global/ $CrLf /\n/;
>> 
>>     How do I get rid of the extra $CrLf variable?
>> 
>>     Many thanks,
>>     -T

On 10/20/18 9:29 AM, Brandon Allbery wrote:
> The escape sequence \x allows you to embed characters by their code. 
> "\x0D\x0A" is the same as the variable.

So
     $String ~~ s:global/ \x0D\x0A /\n/;

?
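
A minimal sketch of what the `\x` escape does, assuming a made-up sample string (not from the original exchange). It uses the string-literal form with `.subst` rather than the regex form asked about above, so it only illustrates that "\x0D\x0A" and the `$CrLf` variable hold the same code points:

```raku
# The variable from the original question.
my Str $CrLf = chr(0x0d) ~ chr(0x0a);

# \x0D and \x0A name characters by their hexadecimal code points,
# so this literal contains the same CR and LF as $CrLf.
my Str $Literal = "\x0D\x0A";

say $Literal eq $CrLf;   # True
say $Literal.ords;       # (13 10)

# Hypothetical sample text with CRLF line endings.
my Str $String = "one\x0D\x0Atwo\x0D\x0Athree";

# String-literal substitution; no extra variable is needed.
my Str $Fixed = $String.subst( "\x0D\x0A", "\n", :g );
say $Fixed.lines.elems;
```

The `\` is what starts the escape, the `x` says "hexadecimal code follows", and `0D` is the code itself.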

I did not pick up on what you were saying last night.  I did not
realize that the `\x` was not part of the `0D`.  I thought the
right way to write 0H0D was "0x0D", with the "x" as part of
the number's syntax, so I could not figure out what the "\"
was escaping.

A couple of the web sites I was reading contain all sorts of
wild junk, and the UTF-8 decoder choked on them.  Reading them as
utf8-c8 only made things much worse: a 1000+ line web page came
out as about 40 lines filled with all sorts of wild unprintable
characters, missing all the info I was looking for.

To add insult to injury, "split" would not split on a single 0H0A
line terminator.  Weird, because `\n` is 0H0A on my machine.
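
One possible contributor, offered as an assumption rather than a diagnosis of the pages above: Perl 6 strings treat a CR followed by an LF as a single character (grapheme), which is not equal to a bare LF. A small sketch with a hypothetical sample value:

```raku
# Hypothetical sample line ending, not data from the original pages.
my Str $CrLf = "\x0D\x0A";

say $CrLf.ords;      # (13 10)  two code points
say $CrLf.chars;     # 1        but a single character (grapheme)
say $CrLf eq "\n";   # False    so it is not the same string as a bare LF
```

If the text had CRLF line endings, that would be consistent with a search for a lone 0H0A not finding anything to split on.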

Reading the site as raw bytes (Buf) and doing my own conversion
fixed the issue:

sub AsciiToStr ( Buf $Ascii ) {
    # masks off bits above 0H7F
    # change 0H0D 0H0A to \n
    # change lone 0H0D to \n
    # return the corrected string

    my Str $String = "";
    my Str $CrLf   = chr(0x0d) ~ chr(0x0a);
    my Str $Cr     = chr(0x0d);

    for $Ascii[0..*] -> $I { $String ~= chr( Int( $I ) +& 0x7F ); }
    $String ~~ s:global/ $CrLf /\n/;
    $String ~~ s:global/ $Cr /\n/;

    # print( "AsciiToStr <" ~ $String ~ "\n" );

    return $String;
}

The mask worked perfectly.
    Int( $I ) +& 0x7F
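
To make the mask concrete: `+&` is the integer bitwise AND, and 0x7F is binary 0111 1111, so the AND clears the top bit of a byte and keeps the low seven. A small sketch with hypothetical byte values (not taken from the web pages above):

```raku
# Hypothetical bytes: 'H', 'i', then 0xE9 and 0x8A, which are outside ASCII.
my $Ascii = Buf.new( 0x48, 0x69, 0xE9, 0x8A );

# Clear the high bit of every byte.
my @Masked = $Ascii.list.map( * +& 0x7F );

say @Masked;   # [72 105 105 10]
```

Note that 0xE9 lands on 0x69 ("i") and 0x8A lands on 0x0A (a line feed), so the mask discards information about the non-ASCII bytes rather than decoding them.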

Thank you all for the help last night!  You guys were awesome!
Hopefully in a few months I will be answering more questions
than asking.

-T
