[perl #42167] Text::Tabs fails to expand correctly in the presence of UTF8 characters

Robin Norwood
March 29, 2007 23:58
It appears that Text::Tabs doesn't expand tabs properly when the tab comes after UTF8 characters.

perl -CS -MText::Tabs -e 'print expand("\taa\t.\n\t\x{010a}\x{010a}\t."), "\n"'

        aa      .
        ĊĊ       .

My text editor/mailer may munge the UTF8 in the output above.
Essentially, the line with two UTF8 characters gets an extra space
before the dot when run through Text::Tabs::expand.

This also appears to be broken in 5.9.4.

The bug is in Red Hat's bugzilla as:

Incidentally, the problem appears to stem from the pos() function not
counting UTF8 characters correctly - I haven't delved into the source
deeply enough to figure out why, though.  There is an alternative
version of expand() in the source file (after __END__) that
does not have this bug.  Since the repo browser at seems broken right
now ('no space left on device' errors - reported to the email address
listed on that page), I don't have access to the annotations/history
of that file to see why.
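
For comparison, here is a minimal sketch of a tab expander that tracks
the column with length() on the decoded string instead of relying on
pos().  The name expand_chars and its $tabstop argument are hypothetical
and made up for illustration; this is not the code in Text::Tabs, nor
the alternative version after __END__.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Tabs): expand tabs by counting
# characters already emitted on the current line.
sub expand_chars {
    my ($text, $tabstop) = @_;
    $tabstop = 8 unless defined $tabstop;

    my @out;
    # Limit of -1 keeps trailing empty lines intact.
    for my $line (split /\n/, $text, -1) {
        my $expanded = '';
        # The capturing group makes split return the tabs as separate pieces.
        for my $piece (split /(\t)/, $line) {
            if ($piece eq "\t") {
                # length() counts characters, not bytes, on a character string.
                my $pad = $tabstop - length($expanded) % $tabstop;
                $expanded .= ' ' x $pad;
            }
            else {
                $expanded .= $piece;
            }
        }
        push @out, $expanded;
    }
    return join "\n", @out;
}

binmode STDOUT, ':encoding(UTF-8)';
print expand_chars("\taa\t.\n\t\x{010a}\x{010a}\t."), "\n";
```

With the input from the one-liner above, both dots should land in the
same column.  Note that this counts code points, so combining or
double-width characters would still throw the alignment off.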

