perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: Juerd Waalboer
Date: February 5, 2007 14:33
Subject: Re: Future Perl development
Message ID: 20070205223331.GE25362@c4.convolution.nl

Dr.Ruud wrote on 2007-02-05 23:09 (+0100):
> perl -wle '
>   $s = substr "\x{100}\xFF", 1;
>   print length $s, ":", unpack "H*", $s;
> '

Note that normally, using unpack "H*" on a Unicode string (like your $s)
violates the proper separation between text (character) strings and binary
(byte) strings.

Of course, it's warranted when you really want to demonstrate the
internals like you have now. I'm just commenting for clarity.
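When the goal is to see the UTF-8 bytes of a text string, and not Perl's internal buffer, one way to keep that separation explicit is to encode the string first and then unpack the resulting byte string. This is a minimal sketch that assumes UTF-8 is the encoding of interest. Encode::encode is from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $s     = substr "\x{100}\xFF", 1;    # a one-character text string: U+00FF
my $bytes = encode('UTF-8', $s);        # explicit text -> bytes step (assumes UTF-8)

# length counts characters of the text; unpack only ever sees the encoded bytes
print length($s), ":", unpack("H*", $bytes), "\n";    # prints 1:c3bf
```

Because $bytes has been produced by an explicit encoding step, the hex dump stays the same no matter how Perl happens to store $s internally.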

My preferred way to show the internals is Devel::Peek::Dump:

    use Devel::Peek;

    $s = substr "\x{100}\xFF", 1;
    Dump $s;

Output:

    SV = PV(0x8149ae8) at 0x8149624
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x816a920 "\303\277"\0 [UTF8 "\x{ff}"]
      CUR = 2
      LEN = 4

This shows both the individual bytes of the internal byte string, \303 and
\277 (the UTF-8 encoding of U+00FF, followed by the terminating \0), and the
characters of the Unicode string: \x{ff}.

Note that CUR is the length *in bytes*; after all, we're dumping the *internal*
values. For better performance, the length in characters isn't calculated until
it's needed. Once it has been calculated, it is cached:

    # after length($s)
    SV = PVMG(0x8163bb0) at 0x8149624
      REFCNT = 1
      FLAGS = (SMG,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x816a920 "\303\277"\0 [UTF8 "\x{ff}"]
      CUR = 2
      LEN = 4
      MAGIC = 0x816a8a0
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = 1

MG_LEN is the cached length in characters.
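Both lengths can also be read from plain Perl, without Devel::Peek. In the sketch below, which assumes the same $s as above, length returns the character count (the value cached as MG_LEN). Under the bytes pragma, length returns the internal byte count (CUR). The bytes pragma is generally discouraged outside this kind of debugging.

```perl
use strict;
use warnings;

my $s = substr "\x{100}\xFF", 1;

my $chars   = length $s;                       # characters: 1 (the MG_LEN value)
my $bytes   = do { use bytes; length $s };     # internal bytes: 2 (the CUR value)
my $is_utf8 = utf8::is_utf8($s) ? 1 : 0;       # the UTF8 flag shown in FLAGS

print "chars=$chars bytes=$bytes utf8=$is_utf8\n";   # chars=1 bytes=2 utf8=1
```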
-- 
Kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.