On Sun, Sep 21, 2008 at 08:36:32AM +0200, Gisle Aas wrote:
> The issue with dropped chars has been fixed so I don't worry about
> that. Just upgrade the URI module.
>
> The remaining issue is if $url->query_form should accept Unicode data
> and automatically UTF-8 encode it as it does now. When I accepted
> that patch I thought it would be harmless as this provides a convenience
> for some at the same time as it does not change anything for users
> that properly encode their data before passing it to this API. What's
> problematic is that this strengthens the idea that the UTF-8 flag has
> semantic meaning at the Perl level. Strings with chars in the range
> 128-255 behave differently depending on the internal representation.
> I'm not happy about that. It's certainly not my idea of a sane
> Unicode model.
>
> To me that leaves 2 options; either make the URI API strict and only
> accept args that are bytes (strings that can be utf8::downgraded) or
> just live with the ugliness of an inconsistent Unicode model and try to
> document the issues better over time. I'm leaning towards the latter.

Sorry, kind of got stuck behind work here.

So, in my situation I need to post some UTF-8 characters. The service
I'm using requires an ?encoding=utf8 query parameter to say which
encoding the text is in. The post doesn't include a charset:

    Content-Type: application/x-www-form-urlencoded

So it seems the server needs to be explicitly told.

The problem I had was that if I passed in a character string (utf8 flag
on), the url-encoding process dropped chars. You say that has been
fixed. I fixed it on my side by simply calling encode_utf8 to convert my
character string into octets. Then all octets were url-encoded and
passed to the server, and all works fine.
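
For reference, the workaround described above might look roughly like this. It is only a sketch: the endpoint URL and the "text" parameter name are made up for illustration, and only the "encoding" parameter comes from the service described here.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use URI;

# A character string: U+00E9 and U+263A, written with \x{} escapes.
my $text = "caf\x{e9} \x{263a}";

# Convert characters to UTF-8 octets *before* handing them to URI,
# so query_form only ever sees bytes.
my $octets = encode_utf8($text);

# Hypothetical endpoint and parameter name, for illustration only.
my $uri = URI->new('http://example.com/post');
$uri->query_form(encoding => 'utf8', text => $octets);

print $uri->query, "\n";   # encoding=utf8&text=caf%C3%A9+%E2%98%BA
```

Each UTF-8 octet ends up as its own %XX escape, and the space becomes "+" as usual for form data.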

Now, here's my question. Could I pass in any byte (octet) string and
have it url-encoded? Do the url-encoded post parameters have to be in
a given character encoding, or is that just an agreement between the
sender and receiver?

That is, can I encode my character string into any character
encoding and send it url-encoded? Then as long as the server
receiving the post knows how to decode it (using the same encoding I
used), it would be fine?
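
To make the question concrete, here is a small sketch of what I mean, using only core modules. The percent_encode and percent_decode helpers are hypothetical stand-ins written for this example, not part of the URI API. The same character string produces different escapes depending on the encoding chosen, and the receiver only gets the text back if it decodes with the same one.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: percent-encode every octet outside the
# unreserved set. Operates on bytes, knows nothing about charsets.
sub percent_encode {
    my ($octets) = @_;
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

# Hypothetical helper: turn %XX escapes back into raw octets.
sub percent_decode {
    my ($escaped) = @_;
    $escaped =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $escaped;
}

my $text = "caf\x{e9}";    # character string, U+00E9 at the end

my $as_utf8   = percent_encode(encode('UTF-8',      $text));
my $as_latin1 = percent_encode(encode('ISO-8859-1', $text));

print "$as_utf8\n";     # caf%C3%A9
print "$as_latin1\n";   # caf%E9

# The receiver recovers the text only with the agreed encoding.
my $round_trip = decode('ISO-8859-1', percent_decode($as_latin1));
print $round_trip eq $text ? "round trip ok\n" : "mismatch\n";
```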

If that's the case, then it would seem like query_param should die if
it receives any strings with the utf8 flag on. It can't encode_utf8
or utf8::downgrade them, because it doesn't know which (octet) encoding
the sender and receiver agreed on.
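
A rough sketch of the kind of check I have in mind, written as a standalone function rather than as a patch to URI. The name assert_octets is hypothetical, and testing utf8::is_utf8 is just my suggestion from above; the stricter option in the quoted message tests whether the string can be utf8::downgraded instead.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical guard: refuse any value whose utf8 flag is on, and
# leave it to the caller to encode into the agreed octet encoding.
sub assert_octets {
    my ($name, $value) = @_;
    die "parameter '$name' is a character string; encode it first\n"
        if utf8::is_utf8($value);
    return $value;
}

my $text = "caf\x{e9} \x{263a}";    # character string, flag on

# Encoded octets pass through untouched.
my $ok = assert_octets(text => encode_utf8($text));

# The raw character string is rejected.
my $rejected = !eval { assert_octets(text => $text); 1 };
print $rejected ? "character string rejected\n" : "accepted\n";
```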
--
Bill Moseley
moseley@hank.org
Sent from my iMutt