develooper Front page | perl.perl6.users | Postings from September 2018

Re: tip: that annoying character at the end

From: ToddAndMargo
Date: September 15, 2018 02:43
Subject: Re: tip: that annoying character at the end
Message ID: ddb2a8dd-e130-71f6-f781-165ad5c3812c@zoho.com

On 09/14/2018 07:34 PM, Brad Gilbert wrote:
> On Fri, Sep 14, 2018 at 7:49 PM ToddAndMargo <ToddAndMargo@zoho.com> wrote:
>>
>>> On Fri, Sep 14, 2018 at 5:22 PM ToddAndMargo <ToddAndMargo@zoho.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> A tip to share.
>>>>
>>>> I work a lot with downloaded web pages.  I cut
>>>> out things like revision numbers and download
>>>> locations.
>>>>
>>>> One of the things that used to drive me a bit nuts was that
>>>> web pages can come with all kinds of weird line terminators.
>>>> I'd wind up with a link location that bombed because
>>>> there was some weird unprintable character at the end.
>>>>
>>>> Now there are routines to chop off these kinds of things,
>>>> but they don't always work, depending on what the weird
>>>> character is.
>>>>
>>>> What I had done in the past was to dump the page to a file
>>>> and use a hex editor to figure out what the weird character
>>>> was.  I have found ASCII 0, 7, 10, 12, 13 and some other weird
>>>> ones I can't remember.  They often came in combinations too.
>>>> Then cut the turkey out with a regex.  It was a lot of work.
>>>>
>>>> Now-a-days, it is easy.  I just get "greedy" (chuckle).
>>>> I always know what the end of the string should be: .zip,
>>>> .exe, .rpm, etc.  So
>>>>
>>>>       $Str ~~ s/ ".zip"  .* /.zip/;
>>>>
>>>>       $ p6 'my $x="abc.zip"~chr(7)~chr(138); $x~~s/ ".zip" .* /.zip/; say "<$x>";'
>>>>       <abc.zip>
>>>>
>>>> Problem solved.  And it doesn't care what the weird character(s)
>>>> at the end is/are.
>>>>
>>>> :-)
>>>>
>>>> Hope this helps someone else.  Thank you for all the
>>>> help you guys have given me!
>>>>
>>>> -T
>>
>>
>> On 09/14/2018 05:43 PM, Brad Gilbert wrote:
>>   > You can just remove the control characters
>>   >
>>   >     my $x="abc.zip"~chr(7)~chr(138);
>>   >     $x .= subst(/<:Cc>+ $/,'');
>>   >     say $x;
>>   >
>>   > Note that 13 is carriage return and 10 is newline
>>   >
>>   > If the only ending values are (13,10), 13, or 10
>>   > you can use .chomp to remove them
>>   >
>>   >     my $x="abc.zip"~chr(13)~chr(10);
>>   >     $x .= chomp;
>>   >     say $x;
>>
>> Thank you!
>>
>> "chomp" was one of those routines I could only get
>> to work "sometimes".  It depended on what weird character(s)
>> I was dealing with.
> 
> `chomp` removes only a trailing newline, so any other trailing
> control character is left in place.
> 
>>
>> Would you explain what you are doing with
>>      $x .= subst(/<:Cc>+ $/,'');
> 
> Cc is the Unicode general category for control characters
> 
>      > say 7.uniprop;
>      Cc
> 
>      > say 7.uniprop('General_Category')
>      Cc
> 
> You can match things by category
> 
> Like numbers
>      / <:N> /
> decimal numbers
>      / <:Nd> /
> letter numbers
>      / <:Nl> /
> other numbers
>      / <:No> /
> 
> letters
>      / <:L> /
> lowercase letters
>      / <:Ll> /
> uppercase letters
>      / <:Lu> /
> titlecase letters
>      / <:Lt> /
> 
> The `.subst` call you asked about is exactly the same as
> 
>     $x ~~ s/ <:Cc>+ $ //;
> 
> Originally I was just going to return the result of .subst()
> rather than mutating $x.
> 
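The difference between returning the result of .subst() and mutating $x can be sketched as follows. This is only a minimal illustration in Perl 6; the name $clean is an assumption made for the example and does not appear in the thread.

```raku
# Sample value from the thread: a file name followed by two control characters.
my $x = "abc.zip" ~ chr(7) ~ chr(138);

# Non-mutating: .subst returns a new string and leaves $x as it was.
# ($clean is a hypothetical name used only for this sketch.)
my $clean = $x.subst(/ <:Cc>+ $ /, '');
say $clean eq "abc.zip";   # True
say $x eq "abc.zip";       # False, $x still carries the control characters

# Mutating: .= assigns the result back to $x, as s/// would.
$x .= subst(/ <:Cc>+ $ /, '');
say $x eq "abc.zip";       # True
```

The non-mutating form is handy when the original string is still needed, for example to log what was stripped.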

Wow!  Thank you!
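
One plausible reason chomp only worked "sometimes" is that it handles a trailing newline but not other control characters. The sketch below assumes links that end in the kinds of characters mentioned earlier in the thread; the variable names are hypothetical.

```raku
# Hypothetical inputs: one ends in a newline (10), the other in NUL (0) plus BEL (7).
my $with-newline = "abc.zip" ~ chr(10);
my $with-control = "abc.zip" ~ chr(0) ~ chr(7);

# chomp strips the trailing newline...
say $with-newline.chomp eq "abc.zip";    # True

# ...but does nothing here, because the string does not end in a newline.
say $with-control.chomp eq "abc.zip";    # False

# The Cc pattern removes both kinds of ending, since a newline is also
# in the Unicode control-character category.
say $with-newline.subst(/ <:Cc>+ $ /, '') eq "abc.zip";   # True
say $with-control.subst(/ <:Cc>+ $ /, '') eq "abc.zip";   # True
```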

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Serious error.
All shortcuts have disappeared.
Screen. Mind. Both are blank.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
