
Re: Changing URI::Escape default behavior

From: Gisle Aas
Date: August 20, 2001 10:27
Subject: Re: Changing URI::Escape default behavior
Message ID: lrg0am8wxi.fsf@caliper.ActiveState.com

[Cc'ed to libwww@perl.org and to TimB as I try to blame him :-)]

Ilmari Karonen <iltzu@sci.fi> writes:

> In comp.lang.perl.misc, there has been -- yet again -- heated discussion
> about the default behavior of URI::Escape.  Briefly, it all started when
> someone recommended the use of said module for composing a URI, without
> mentioning the need to use a non-default set of escaped characters.

Personally I never use the uri_escape() function for anything.  I
always use the URI objects as they always get the escaping correct
without me having to think again.  If I want custom escaping I always
use something like:

   s/([...])/$URI::Escape::escapes{$1}/g
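
To make that one-liner concrete, here is a minimal sketch of the same substitution applied to URI fragments before they are joined. It builds its own lookup table so that it runs without any module installed; that table is assumed to match what %URI::Escape::escapes contains. The helper name escape_component, the example.com URL and the sample parameters are made up for illustration.

```perl
use strict;
use warnings;

# Hand-built byte => "%XX" table. Assumption: this mirrors the contents
# of %URI::Escape::escapes. It only covers byte values 0..255, so the
# sketch expects byte strings, not wide characters.
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Hypothetical helper: escape everything outside the RFC 2396
# "unreserved" set, i.e. the character class suggested in the quoted post.
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
    return $text;
}

# Escape each key and value separately, then join them with the
# reserved characters "=" and "&" used for their reserved purpose.
my @params = ( [ q => 'AT&T stock' ], [ lang => 'en' ] );
my $query  = join '&',
    map { escape_component($_->[0]) . '=' . escape_component($_->[1]) }
    @params;

print "http://www.example.com/search?$query\n";
```

The "&" inside the value comes out as %26 while the "&" that separates the two parameters stays literal, which is the distinction the thread is about.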

I agree that the current uri_escape() default is a bit useless.

> Quoted below is part of one of my own posts in the thread (in reply to
> brian d foy).  Looking at it now, the tone could've been a little (okay,
> a lot) more polite, but I still stand by the actual point.
> 
>   What Bart really wants to say is that it doesn't escape the reserved
>   characters [;/?:@&=+$,].  This is broken, since according to RFC 2396
>   these characters must be escaped _except when used for their reserved
>   purpose_.
> 
>   Why is that broken, then?  Because if the input contains any reserved
>   characters that are not meant to be escaped, then the input must already
>   be past the stage where escaping should be done.
> 
>   There are exactly two meaningful classes of characters to escape: One is
>   [^A-Za-z0-9\-_.!~*'()], and the other is *no characters at all*.  The
>   former should be used on fragments of an URI before joining them, while
>   the latter should be (not) used if you ever get the temptation to escape
>   an already-composed URI.
> 
>   There's absolutely nothing HTTP- or CGI-specific about this.  This is
>   just basic RFC 2396 compliance.
> 
>   I still can't believe Gisle Aas got this wrong.

I was still young when this default was established :-)

Actually I think it was established by Tim Bunce at the time he hacked
on URI::URL.  See for instance
<http://www.ics.uci.edu/pub/websoft/libwww-perl/archive/1995h1/0105.html>.
I'm not entirely sure that the uri_escape() function was introduced by
him.  It might also have been a Martijn Koster thing.  All I know is
that this all happened a long time ago and that it was certainly not
me that came up with that default :-)

The uri_escape() in that version of URI::URL was then later moved to
URI::Escape, but the original default $unsafe arg was kept.  I guess I
can be blamed for not changing the default at this point.

>                                                     I wonder if changing
>   the default character class would break more existing code than it would
>   fix.
> 
> This last paragraph is what I wanted to ask you directly.  Obviously we
> can talk about this on Usenet 'til the cows come home, but I'd really
> like to know if there's a reason for the current behavior, and whether
> it would be practical to change it at this point.

Since it seems unlikely that any _working_ code could be using this
function, it might be OK to simply change the default.  But it has
been this way for so long (more than 6 years) that I still hesitate
a bit.

I see 2 ways of changing the default:

   1) Remove % from the current set.  (URI.pm already considers
      % to be part of URIC, although this is a bit internal.)
   2) Go with your suggestion: [^A-Za-z0-9\-_.!~*'()]

It looks like 1) is less likely to break code, but perhaps 2) is a
more useful default.
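
The practical difference between the candidates can be sketched by running one sample string through each character class. The "current" class below is an assumption reconstructed from this thread (everything outside the reserved and unreserved sets gets escaped, including %); it is not copied from the module source. The helper escape_with and the sample string are illustrative only.

```perl
use strict;
use warnings;

# Hand-built byte => "%XX" table (assumed to match %URI::Escape::escapes).
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Each pattern captures one character that would be escaped.
my @candidates = (
    # Assumed current default: escapes "%" but not the reserved characters.
    [ 'current'  => qr/([^;\/?:\@&=+\$,A-Za-z0-9\-_.!~*'()])/ ],
    # Option 1: same set, except "%" is no longer escaped.
    [ 'option 1' => qr/([^;\/?:\@&=+\$,%A-Za-z0-9\-_.!~*'()])/ ],
    # Option 2: escape everything outside the unreserved set.
    [ 'option 2' => qr/([^A-Za-z0-9\-_.!~*'()])/ ],
);

# Hypothetical helper: escape $text using the given capture pattern.
sub escape_with {
    my ($text, $unsafe) = @_;
    $text =~ s/$unsafe/$escapes{$1}/g;
    return $text;
}

my $sample = 'a b&c=50%';
for my $candidate (@candidates) {
    my ($name, $unsafe) = @$candidate;
    printf "%-9s %s\n", $name, escape_with($sample, $unsafe);
}
# Prints:
# current   a%20b&c=50%25
# option 1  a%20b&c=50%
# option 2  a%20b%26c%3D50%25
```

Under these assumptions, 1) leaves existing %XX sequences alone and so avoids double escaping an already-composed URI, while 2) is the only one that makes an arbitrary string safe to embed as a single component.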

Regards,
Gisle
