perl.perl5.porters | Postings from December 2010

What to do about character escapes outside the acceptable Unicode range?

From: David Golden
Date: December 25, 2010 19:25
Subject: What to do about character escapes outside the acceptable Unicode range?
Message ID: AANLkTimNPaL6b9t-rHP0a7o00k9LcrPPC8VdJ4GeQ1nV@mail.gmail.com

The UTF-8 layer discussion has me thinking about how chr() and
character escapes handle some of the same illegal Unicode values we
are discussing in the layers thread.  Currently, it seems that Unicode
non-characters warn, but that code points beyond the Unicode maximum
of 0x10FFFF do not.  E.g.

  $ perl5137 -C -wE 'say chr($_) for 0x10FFFC .. 0x110001'
  􏿼
  􏿽
  Unicode non-character 0x10fffe is illegal for interchange at -e line 1.
  ��
  Unicode non-character 0x10ffff is illegal for interchange at -e line 1.
  ��
  ����
  ����

I wonder if it would be better to have values above 0x10FFFF warn as
being outside the Unicode codepoint range.
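
To make the distinction concrete, here is a minimal sketch of how the
categories in question could be told apart.  It is an illustration
only, not how perl itself decides when to warn; the name
classify_codepoint and its return strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a code point into the categories
# discussed above.  Not perl's internal logic.
sub classify_codepoint {
    my ($cp) = @_;

    # Unicode defines code points only up to 0x10FFFF.
    return 'above Unicode range' if $cp > 0x10FFFF;

    # UTF-16 surrogates are not characters in their own right.
    return 'surrogate' if $cp >= 0xD800 && $cp <= 0xDFFF;

    # Non-characters: U+FDD0..U+FDEF, plus the last two code points
    # of every plane (U+xxFFFE and U+xxFFFF).
    return 'non-character'
        if ( $cp >= 0xFDD0 && $cp <= 0xFDEF )
        || ( $cp & 0xFFFE ) == 0xFFFE;

    return 'ok';
}

printf "0x%X: %s\n", $_, classify_codepoint($_) for 0x10FFFC .. 0x110001;
```

Over the same range as the one-liner above, the first two values fall
in the 'ok' bucket, the next two are non-characters, and the last two
are above the Unicode range, which is the case that currently passes
silently.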

I also wonder about someone manually constructing an overlong
character sequence and then trying to output it.  Does that warn?
Should it?
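
For reference, an overlong sequence is one that encodes a code point
in more bytes than UTF-8 requires.  The sketch below builds one by
hand and runs it through the strict 'UTF-8' decoder from the Encode
module.  It does not answer the output question above; it only shows
what such a sequence looks like and that a strict decoder should
refuse it.  The helper name is_strict_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $bytes decodes cleanly as strict UTF-8.
sub is_strict_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    return eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK ); 1 } ? 1 : 0;
}

my $shortest = "\x2F";        # "/" (U+002F) in its normal one-byte form
my $overlong = "\xC0\xAF";    # the same code point padded out to two bytes

printf "%-8s %s\n", 'shortest', is_strict_utf8($shortest) ? 'accepted' : 'rejected';
printf "%-8s %s\n", 'overlong', is_strict_utf8($overlong) ? 'accepted' : 'rejected';
```

Note that this goes through Encode's strict decoder rather than the
:utf8 layer or perl's internal handling, so it says nothing about
what those do with the same bytes.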

-- David
