On Thu, Jul 1, 2021 at 12:46 PM Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:

> Consider
>
>   printf "%-40s : %s\n", $_->@* for @rows;
>
> The intention is to print a nice neat table on the terminal.
>
> This works fine in ASCII but gets all confused if any ->[0] element
> contains Unicode text. While Perl will count in Unicode codepoints,
> this won't help if there are combining chars (because combining chars
> count as codepoints but do not consume terminal columns), or if there
> are any emoji or other double-width characters (because these single
> graphemes count as two columns).
>
> I propose a new printf flag, perhaps `|`, to tell (s)printf to count
> these strings by terminal width instead. Thus
>
>   printf "%-|40s : %s\n", $_->@* for @rows;
>
> would now print a neat table even in the presence of Weird Unicode.

As mentioned on IRC, I think it would be nice to have more grapheme-aware
capability in core; right now the only grapheme-aware functionality I know
of is the \X regular expression matcher which matches a single grapheme
(and more manual stuff using Unicode::UCD).

There is one potential problem here: you normally need to encode characters
to bytes in order to print them. The grapheme determination would need to
happen before encoding. This would work out if you're printing to a handle
with an encoding layer, but probably cause confusion in the usual case.

-Dan
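
To make the \X point concrete, here is a minimal sketch of what can be done by hand today. The helper names grapheme_count and pad_graphemes are made up for illustration and are not part of core or of the proposed flag. It counts graphemes rather than terminal columns, so it handles combining characters but would still miscount double-width characters. The padding is done on the decoded character string, and the encoding is left to the layer on the handle.

```perl
use strict;
use warnings;

# Count extended grapheme clusters by matching \X repeatedly.
sub grapheme_count {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;
    return $count;
}

# Hypothetical helper: left-justify to $width graphemes (not columns).
sub pad_graphemes {
    my ($str, $width) = @_;
    my $pad = $width - grapheme_count($str);
    $pad = 0 if $pad < 0;
    return $str . (' ' x $pad);
}

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT:
# five codepoints, but four graphemes.
my @rows = (
    [ "cafe\x{301}", "coffee" ],
    [ "tea",         "tea"    ],
);

# Pad while the data is still characters; the layer encodes on output.
binmode STDOUT, ':encoding(UTF-8)';
printf("%s : %s\n", pad_graphemes($_->[0], 10), $_->[1]) for @rows;
```

If the strings had already been encoded to bytes before padding, \X would be matching against bytes rather than characters, which is the confusion described above.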