perl.i18n http://www.nntp.perl.org/group/perl.i18n/ ... Copyright 1998-2014 perl.org Thu, 23 Oct 2014 15:51:59 +0000 ask@perl.org Re: Perl Unicode talks by Tom Christiansen by Lyle Thanks Michael, I&#39;d totally missed those. Excellent resource.<br/><br/><br/>Lyle<br/><br/>On 24/02/2012 15:08, Doran, Michael D wrote:<br/>&gt; A friend and fellow Perl coder (hat tip to Roy Zimmer) alerted me to this series of talks given by Tom Christiansen at OSCON 2011:<br/>&gt;<br/>&gt; 1. Perl Unicode Essentials<br/>&gt; 2. Unicode in Perl Regexes<br/>&gt; 3. Unicode Support Shootout: The Good, The Bad,&amp; the (mostly) Ugly<br/>&gt;<br/>&gt; http://training.perl.com/OSCON2011/index.html<br/>&gt; (resolves to http://98.245.80.27/tcpc/OSCON2011/index.html)<br/>&gt;<br/>&gt; If you weren&#39;t already aware of these, they are a treasure trove of in-depth and useful information.<br/>&gt;<br/>&gt; -- Michael<br/>&gt;<br/>&gt; # Michael Doran, Systems Librarian<br/>&gt; # University of Texas at Arlington<br/>&gt; # 817-272-5326 office<br/>&gt; # 817-688-1926 mobile<br/>&gt; # doran@uta.edu<br/>&gt; # http://rocky.uta.edu/doran/<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2012/02/msg227.html Fri, 24 Feb 2012 08:50:28 +0000 Perl Unicode talks by Tom Christiansen by Doran, Michael D A friend and fellow Perl coder (hat tip to Roy Zimmer) alerted me to this series of talks given by Tom Christiansen at OSCON 2011:<br/><br/> 1. Perl Unicode Essentials<br/> 2. Unicode in Perl Regexes<br/> 3. 
Unicode Support Shootout: The Good, The Bad, &amp; the (mostly) Ugly<br/><br/>http://training.perl.com/OSCON2011/index.html<br/>(resolves to http://98.245.80.27/tcpc/OSCON2011/index.html)<br/><br/>If you weren&#39;t already aware of these, they are a treasure trove of in-depth and useful information.<br/><br/>-- Michael<br/><br/># Michael Doran, Systems Librarian<br/># University of Texas at Arlington<br/># 817-272-5326 office<br/># 817-688-1926 mobile<br/># doran@uta.edu<br/># http://rocky.uta.edu/doran/<br/><br/><br/><br/> http://www.nntp.perl.org/group/perl.i18n/2012/02/msg226.html Fri, 24 Feb 2012 07:08:56 +0000 Re: GB2312 Encoding and File Names by --[ UxBoD ]-- Resolved by encoding to UTF-8 once the decoding had been completed:<br/><br/>my $fname = encode(&#39;utf8&#39;, decode(&#39;MIME-EncWords&#39;, $head-&gt;recommended_filename));<br/>--<br/>Thanks, Phil<br/><br/>----- Original Message -----<br/>&gt; Through some help of the PerlMonks board I have decoded the file name<br/>&gt; correctly; but when you dump it does not match the physical file<br/>&gt; name as it is stored within the file system ie.<br/>&gt;<br/>&gt; MIME Header :<br/>&gt; =?gb2312?B?RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw?=<br/>&gt; Decoded : DPM2007exchange&#x96FB;&#x90F5;&#x8207;&#x90F5;&#x7BB1;&#x4FEE;&#x5FA9;.zip<br/>&gt; $VAR1 =<br/>&gt; &quot;DPM2007exchange\x{96fb}\x{90f5}\x{8207}\x{90f5}\x{7bb1}\x{4fee}\x{5fa9}.zip&quot;;<br/>&gt;<br/>&gt; so when one tries to compare to what is read from a directory listing<br/>&gt; you cannot match them together :( How do I get the decoded name to<br/>&gt; be as it is meant to be; as show above.<br/>&gt; --<br/>&gt; Thanks, Phil<br/>&gt;<br/>&gt; ----- Original Message -----<br/>&gt; &gt; Just a follow up for some help on this problem. I appear to be able<br/>&gt; &gt; to decode Simplified Chinese okay but Tradional Chinese is somewhat<br/>&gt; &gt; more difficult. 
I have the file name MIME entity:<br/>&gt; &gt;<br/>&gt; &gt; =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=<br/>&gt; &gt;<br/>&gt; &gt; which should decode to:<br/>&gt; &gt;<br/>&gt; &gt; DPM2007exchange&#x96FB;&#x90F5;&#x8207;&#x90F5;&#x7BB1;&#x4FEE;&#x5FA9;.zip<br/>&gt; &gt;<br/>&gt; &gt; but when I try and decode that name in Perl it comes out as:<br/>&gt; &gt;<br/>&gt; &gt; DPM2007exchange&#xFFFD;&#xFFFD;&#xFFFD;]&#xFFFD;c&#xFFFD;]&#x7BB1;&#x4FEE;&#xFFFD;&#xFFFD;.zip<br/>&gt; &gt;<br/>&gt; &gt; I have installed the Encode::HanExtra module but even with that it<br/>&gt; &gt; is<br/>&gt; &gt; still not showing correctly. Am I missing some other type of module<br/>&gt; &gt; ?<br/>&gt; &gt; --<br/>&gt; &gt; Thanks, Phil<br/>&gt; &gt;<br/>&gt; &gt; ----- Original Message -----<br/>&gt; &gt; &gt; Hello all,<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; I do hope I am in the right place for some help! I am working on<br/>&gt; &gt; &gt; a<br/>&gt; &gt; &gt; project that requires email attachments to be extracted to the<br/>&gt; &gt; &gt; file<br/>&gt; &gt; &gt; system. 
All was working great until one of our kind testers tried<br/>&gt; &gt; &gt; with normal and simplified Chinese; where I ended up with files<br/>&gt; &gt; &gt; of<br/>&gt; &gt; &gt; the name ?????.txt.<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; Am using the module MIME::Parser to extract the files and after<br/>&gt; &gt; &gt; some<br/>&gt; &gt; &gt; great help from the developer I have realized that one need to<br/>&gt; &gt; &gt; override a method in MIME::Parser::Filer so that the correct file<br/>&gt; &gt; &gt; names are generated.<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; One of the attachments in the test email is show below:<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; 360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; I have tried to use MIME::EncWords and MIME::Charset to extract<br/>&gt; &gt; &gt; the<br/>&gt; &gt; &gt; correct name from the MIME entity using:<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; my $fname = decode_mimewords($head-&gt;recommended_filename);<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; but this still does not work :( so I tried to compare what the<br/>&gt; &gt; &gt; file<br/>&gt; &gt; &gt; name looks like with the LANG with/and without UTF8<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; With LANG en_GB.UTF8<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; 360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; With LANG en_GB<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; 360&#xFFFD;?&#xFFFD;&#xFFFD;?&#xFFFD;&#xFFFD;??&#xFFFD;&#xFFFD;?-12-01-Chi Simp.txt<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; Now this is what happens when I extract the file with my new<br/>&gt; &gt; &gt; method:<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; With LANG en_GB<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; 360&#xFFFD;&#xFFFD;&#xFFFD;&#x17C;&#xFFFD;&#xFFFD;&#xFFFD;-12-01-Chi Simp.txt<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; With LANG en_GB.UTF8<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; 360???&#x17C;???-12-01-Chi Simp.txt<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; 
The MIME file name appears as<br/>&gt; &gt; &gt; ?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; This is not may area of expertise so reaching out to you for some<br/>&gt; &gt; &gt; help. How can one extract the file name from an email and have it<br/>&gt; &gt; &gt; reflect its really Chinese name ? Hope this make sense!<br/>&gt; &gt; &gt; --<br/>&gt; &gt; &gt; Thanks, Phil<br/>&gt; &gt; &gt;<br/>&gt; &gt;<br/>&gt;<br/> http://www.nntp.perl.org/group/perl.i18n/2011/11/msg225.html Tue, 22 Nov 2011 00:28:08 +0000 Re: GB2312 Encoding and File Names by --[ UxBoD ]-- Through some help of the PerlMonks board I have decoded the file name correctly; but when you dump it does not match the physical file name as it is stored within the file system ie.<br/><br/>MIME Header : =?gb2312?B?RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw?=<br/>Decoded : DPM2007exchange&#x96FB;&#x90F5;&#x8207;&#x90F5;&#x7BB1;&#x4FEE;&#x5FA9;.zip<br/>$VAR1 = &quot;DPM2007exchange\x{96fb}\x{90f5}\x{8207}\x{90f5}\x{7bb1}\x{4fee}\x{5fa9}.zip&quot;;<br/><br/>so when one tries to compare to what is read from a directory listing you cannot match them together :( How do I get the decoded name to be as it is meant to be; as show above.<br/>--<br/>Thanks, Phil<br/><br/>----- Original Message -----<br/>&gt; Just a follow up for some help on this problem. I appear to be able<br/>&gt; to decode Simplified Chinese okay but Tradional Chinese is somewhat<br/>&gt; more difficult. 
I have the file name MIME entity:<br/>&gt;<br/>&gt; =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=<br/>&gt;<br/>&gt; which should decode to:<br/>&gt;<br/>&gt; DPM2007exchange&#x96FB;&#x90F5;&#x8207;&#x90F5;&#x7BB1;&#x4FEE;&#x5FA9;.zip<br/>&gt;<br/>&gt; but when I try and decode that name in Perl it comes out as:<br/>&gt;<br/>&gt; DPM2007exchange&#xFFFD;&#xFFFD;&#xFFFD;]&#xFFFD;c&#xFFFD;]&#x7BB1;&#x4FEE;&#xFFFD;&#xFFFD;.zip<br/>&gt;<br/>&gt; I have installed the Encode::HanExtra module but even with that it is<br/>&gt; still not showing correctly. Am I missing some other type of module<br/>&gt; ?<br/>&gt; --<br/>&gt; Thanks, Phil<br/>&gt;<br/>&gt; ----- Original Message -----<br/>&gt; &gt; Hello all,<br/>&gt; &gt;<br/>&gt; &gt; I do hope I am in the right place for some help! I am working on a<br/>&gt; &gt; project that requires email attachments to be extracted to the file<br/>&gt; &gt; system. All was working great until one of our kind testers tried<br/>&gt; &gt; with normal and simplified Chinese; where I ended up with files of<br/>&gt; &gt; the name ?????.txt.<br/>&gt; &gt;<br/>&gt; &gt; Am using the module MIME::Parser to extract the files and after<br/>&gt; &gt; some<br/>&gt; &gt; great help from the developer I have realized that one need to<br/>&gt; &gt; override a method in MIME::Parser::Filer so that the correct file<br/>&gt; &gt; names are generated.<br/>&gt; &gt;<br/>&gt; &gt; One of the attachments in the test email is show below:<br/>&gt; &gt;<br/>&gt; &gt; 360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/>&gt; &gt;<br/>&gt; &gt; I have tried to use MIME::EncWords and MIME::Charset to extract the<br/>&gt; &gt; correct name from the MIME entity using:<br/>&gt; &gt;<br/>&gt; &gt; my $fname = decode_mimewords($head-&gt;recommended_filename);<br/>&gt; &gt;<br/>&gt; &gt; but this still does not work :( so I tried to compare what the file<br/>&gt; &gt; name looks like with the LANG with/and without UTF8<br/>&gt; &gt;<br/>&gt; &gt; With LANG 
en_GB.UTF8<br/>&gt; &gt;<br/>&gt; &gt; 360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/>&gt; &gt;<br/>&gt; &gt; With LANG en_GB<br/>&gt; &gt;<br/>&gt; &gt; 360&#xFFFD;?&#xFFFD;&#xFFFD;?&#xFFFD;&#xFFFD;??&#xFFFD;&#xFFFD;?-12-01-Chi Simp.txt<br/>&gt; &gt;<br/>&gt; &gt; Now this is what happens when I extract the file with my new<br/>&gt; &gt; method:<br/>&gt; &gt;<br/>&gt; &gt; With LANG en_GB<br/>&gt; &gt;<br/>&gt; &gt; 360&#xFFFD;&#xFFFD;&#xFFFD;&#x17C;&#xFFFD;&#xFFFD;&#xFFFD;-12-01-Chi Simp.txt<br/>&gt; &gt;<br/>&gt; &gt; With LANG en_GB.UTF8<br/>&gt; &gt;<br/>&gt; &gt; 360???&#x17C;???-12-01-Chi Simp.txt<br/>&gt; &gt;<br/>&gt; &gt; The MIME file name appears as<br/>&gt; &gt; ?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=<br/>&gt; &gt;<br/>&gt; &gt; This is not may area of expertise so reaching out to you for some<br/>&gt; &gt; help. How can one extract the file name from an email and have it<br/>&gt; &gt; reflect its really Chinese name ? Hope this make sense!<br/>&gt; &gt; --<br/>&gt; &gt; Thanks, Phil<br/>&gt; &gt;<br/>&gt;<br/> http://www.nntp.perl.org/group/perl.i18n/2011/11/msg224.html Mon, 21 Nov 2011 10:32:18 +0000 Re: GB2312 Encoding and File Names by --[ UxBoD ]-- Just a follow up for some help on this problem. I appear to be able to decode Simplified Chinese okay but Tradional Chinese is somewhat more difficult. I have the file name MIME entity:<br/><br/>=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=<br/><br/>which should decode to:<br/><br/>DPM2007exchange&#x96FB;&#x90F5;&#x8207;&#x90F5;&#x7BB1;&#x4FEE;&#x5FA9;.zip<br/><br/>but when I try and decode that name in Perl it comes out as:<br/><br/>DPM2007exchange&#xFFFD;&#xFFFD;&#xFFFD;]&#xFFFD;c&#xFFFD;]&#x7BB1;&#x4FEE;&#xFFFD;&#xFFFD;.zip<br/><br/>I have installed the Encode::HanExtra module but even with that it is still not showing correctly. 
Am I missing some other type of module ?<br/>--<br/>Thanks, Phil<br/><br/>----- Original Message -----<br/>&gt; Hello all,<br/>&gt;<br/>&gt; I do hope I am in the right place for some help! I am working on a<br/>&gt; project that requires email attachments to be extracted to the file<br/>&gt; system. All was working great until one of our kind testers tried<br/>&gt; with normal and simplified Chinese; where I ended up with files of<br/>&gt; the name ?????.txt.<br/>&gt;<br/>&gt; Am using the module MIME::Parser to extract the files and after some<br/>&gt; great help from the developer I have realized that one need to<br/>&gt; override a method in MIME::Parser::Filer so that the correct file<br/>&gt; names are generated.<br/>&gt;<br/>&gt; One of the attachments in the test email is show below:<br/>&gt;<br/>&gt; 360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/>&gt;<br/>&gt; I have tried to use MIME::EncWords and MIME::Charset to extract the<br/>&gt; correct name from the MIME entity using:<br/>&gt;<br/>&gt; my $fname = decode_mimewords($head-&gt;recommended_filename);<br/>&gt;<br/>&gt; but this still does not work :( so I tried to compare what the file<br/>&gt; name looks like with the LANG with/and without UTF8<br/>&gt;<br/>&gt; With LANG en_GB.UTF8<br/>&gt;<br/>&gt; 360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/>&gt;<br/>&gt; With LANG en_GB<br/>&gt;<br/>&gt; 360&#xFFFD;?&#xFFFD;&#xFFFD;?&#xFFFD;&#xFFFD;??&#xFFFD;&#xFFFD;?-12-01-Chi Simp.txt<br/>&gt;<br/>&gt; Now this is what happens when I extract the file with my new method:<br/>&gt;<br/>&gt; With LANG en_GB<br/>&gt;<br/>&gt; 360&#xFFFD;&#xFFFD;&#xFFFD;&#x17C;&#xFFFD;&#xFFFD;&#xFFFD;-12-01-Chi Simp.txt<br/>&gt;<br/>&gt; With LANG en_GB.UTF8<br/>&gt;<br/>&gt; 360???&#x17C;???-12-01-Chi Simp.txt<br/>&gt;<br/>&gt; The MIME file name appears as<br/>&gt; ?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=<br/>&gt;<br/>&gt; This is not may area of expertise so reaching out to you for some<br/>&gt; 
help. How can one extract the file name from an email and have it<br/>&gt; reflect its really Chinese name ? Hope this make sense!<br/>&gt; --<br/>&gt; Thanks, Phil<br/>&gt;<br/> http://www.nntp.perl.org/group/perl.i18n/2011/11/msg223.html Mon, 21 Nov 2011 03:44:55 +0000 GB2312 Encoding and File Names by --[ UxBoD ]-- Hello all,<br/><br/>I do hope I am in the right place for some help! I am working on a project that requires email attachments to be extracted to the file system. All was working great until one of our kind testers tried with normal and simplified Chinese; where I ended up with files of the name ?????.txt.<br/><br/>Am using the module MIME::Parser to extract the files and after some great help from the developer I have realized that one need to override a method in MIME::Parser::Filer so that the correct file names are generated.<br/><br/>One of the attachments in the test email is show below:<br/><br/>360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/><br/>I have tried to use MIME::EncWords and MIME::Charset to extract the correct name from the MIME entity using:<br/><br/>my $fname = decode_mimewords($head-&gt;recommended_filename);<br/><br/>but this still does not work :( so I tried to compare what the file name looks like with the LANG with/and without UTF8<br/><br/>With LANG en_GB.UTF8<br/><br/>360&#x65B0;&#x95FB;&#x76D1;&#x6D4B;-12-01-Chi Simp.txt<br/><br/>With LANG en_GB<br/><br/>360&#xFFFD;?&#xFFFD;&#xFFFD;?&#xFFFD;&#xFFFD;??&#xFFFD;&#xFFFD;?-12-01-Chi Simp.txt<br/><br/>Now this is what happens when I extract the file with my new method:<br/><br/>With LANG en_GB<br/><br/>360&#xFFFD;&#xFFFD;&#xFFFD;&#x17C;&#xFFFD;&#xFFFD;&#xFFFD;-12-01-Chi Simp.txt<br/><br/>With LANG en_GB.UTF8<br/><br/>360???&#x17C;???-12-01-Chi Simp.txt<br/><br/>The MIME file name appears as ?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=<br/><br/>This is not may area of expertise so reaching out to you for some help. 
How can one extract the file name from an email and have it reflect its really Chinese name ? Hope this make sense!<br/>--<br/>Thanks, Phil<br/> http://www.nntp.perl.org/group/perl.i18n/2011/11/msg222.html Thu, 17 Nov 2011 10:20:36 +0000 Re: Web apps with Gettext? by imacat My websites, using the Locale::Maketext::Gettext framework:<br/>http://www.imacat.idv.tw/<br/>http://www.wov.idv.tw/<br/><br/>Websites I wrote for my ex-company with PHP, with the GNU gettext<br/>framework:<br/>http://www.pristine.com.tw/<br/>http://www.fftc.agnet.org/<br/>http://www.fitchratings.com.tw/<br/>http://law.epa.gov.tw/<br/><br/>On Tue, 9 Mar 2010 11:41:01 -0600<br/>&quot;Doran, Michael D&quot; &lt;doran@uta.edu&gt; wrote:<br/><br/>&gt; Over the years, I&#39;ve developed a couple of web applications with multilingual interfaces. I&#39;ve done these in a fairly simple (but robust) way -- the apps rely on language modules that include all the user interface text/strings already translated.<br/>&gt; <br/>&gt; I would be interested in seeing (or hearing about) examples of multilingual web apps that use a framework such as Gettext (or something similar). Particularly any apps for public or academic libraries. <br/>&gt; <br/>&gt; -- Michael<br/>&gt; <br/>&gt; # Michael Doran, Systems Librarian<br/>&gt; # University of Texas at Arlington<br/>&gt; # 817-272-5326 office<br/>&gt; # 817-688-1926 mobile<br/>&gt; # doran@uta.edu<br/>&gt; # http://rocky.uta.edu/doran/<br/><br/>--<br/>Best regards,<br/>imacat ^_*&#39; &lt;imacat@mail.imacat.idv.tw&gt;<br/>PGP Key: http://www.imacat.idv.tw/me/pgpkey.asc<br/><br/>&lt;&lt;Woman&#39;s Voice&gt;&gt; News: http://www.wov.idv.tw/<br/>Tavern IMACAT&#39;s: http://www.imacat.idv.tw/<br/>TLUG List Manager: http://lists.linux.org.tw/cgi-bin/mailman/listinfo/tlug<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2010/03/msg221.html Wed, 31 Mar 2010 09:32:15 +0000 Web apps with Gettext? 
by Doran, Michael D Over the years, I&#39;ve developed a couple of web applications with multilingual interfaces. I&#39;ve done these in a fairly simple (but robust) way -- the apps rely on language modules that include all the user interface text/strings already translated. <br/> <br/>I would be interested in seeing (or hearing about) examples of multilingual web apps that use a framework such as Gettext (or something similar). Particularly any apps for public or academic libraries. <br/> <br/>-- Michael <br/> <br/># Michael Doran, Systems Librarian <br/># University of Texas at Arlington <br/># 817-272-5326 office <br/># 817-688-1926 mobile <br/># doran@uta.edu <br/># http://rocky.uta.edu/doran/ <br/> <br/> http://www.nntp.perl.org/group/perl.i18n/2010/03/msg220.html Tue, 09 Mar 2010 09:42:18 +0000 Locale::Maketext::Gettext 1.27 Released by imacat Dear whoever is using Locale::Maketext::Gettext,<br/><br/> Locale::Maketext::Gettext 1.24 is released, after another year. Two<br/>test suite bugs are fixed related to the build system. You don&#39;t have<br/>to upgrade if you are already using Locale::Maketext::Gettext.<br/><br/> Please tell me if there is any problem. Your feedbacks are welcome. <br/>Thank you.<br/><br/>--<br/>Best regards,<br/>imacat ^_*&#39; &lt;imacat@mail.imacat.idv.tw&gt;<br/>PGP Key: http://www.imacat.idv.tw/me/pgpkey.asc<br/><br/>&lt;&lt;Woman&#39;s Voice&gt;&gt; News: http://www.wov.idv.tw/<br/>Tavern IMACAT&#39;s: http://www.imacat.idv.tw/<br/>TLUG List Manager: http://lists.linux.org.tw/cgi-bin/mailman/listinfo/tlug<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2009/04/msg219.html Mon, 27 Apr 2009 17:01:51 +0000 Re: Locale::Maketext::Gettext 1.24 Released by imacat Dear Lyle,<br/><br/> Sorry that I seems to miss this mail. 
^^;<br/><br/>On Sun, 15 Feb 2009 18:09:12 +0000<br/>Lyle &lt;webmaster@cosmicperl.com&gt; wrote:<br/>&gt; Does this mean that Locale::Maketext::Lexicon::Gettext could also do <br/>&gt; with an update?<br/><br/> I do not understand what do you mean by &quot;do with an update&quot;. This<br/>release announcement was posted an year ago. Could you please explain<br/>it clearer?<br/><br/>--<br/>Best regards,<br/>imacat ^_*&#39; &lt;imacat@mail.imacat.idv.tw&gt;<br/>PGP Key: http://www.imacat.idv.tw/me/pgpkey.asc<br/><br/>&lt;&lt;Woman&#39;s Voice&gt;&gt; News: http://www.wov.idv.tw/<br/>Tavern IMACAT&#39;s: http://www.imacat.idv.tw/<br/>TLUG List Manager: http://lists.linux.org.tw/cgi-bin/mailman/listinfo/tlug<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2009/02/msg218.html Sat, 28 Feb 2009 09:31:52 +0000 Re: Locale::Maketext::Gettext 1.24 Released by Lyle Hi Imacat,<br/> Does this mean that Locale::Maketext::Lexicon::Gettext could also do <br/>with an update?<br/><br/><br/>Lyle<br/> http://www.nntp.perl.org/group/perl.i18n/2009/02/msg217.html Sun, 15 Feb 2009 10:09:35 +0000 Re: I18N plugin for CGI::Application in progress and questions on Perl I18N... by Jesse Vincent <br/><br/><br/>On Wed, Dec 31, 2008 at 09:16:57PM +0000, Lyle wrote:<br/>&gt; Hi All,<br/>&gt; I&#39;ve only just found this list. I&#39;m totally new to <br/>&gt; internationalization and have been learning all I can to write a plug-in <br/>&gt; for CGI::App. There is a blog as to my progress at:-<br/>&gt; http://perl.bristolbath.org/blog/lyle<br/>&gt; It&#39;s not much more than an adaptation of the Catalyst I18N plug-in.<br/>&gt; <br/>&gt; I&#39;ve struggled to find guides on I18N with Perl. Not much more than a <br/>&gt; couple of old guides from Audrey (6+yrs old) and the man page for the <br/>&gt; Catalyst plug-in.<br/><br/>Is there something you&#39;ve found lacking in Audrey&#39;s guides? 
We&#39;re still<br/>using the Locale::Maketext suite and fairly happy.<br/><br/><br/>&gt;<br/>&gt; Is there a Perl I18N Wiki? If not shouldn&#39;t there be? There doesn&#39;t seem <br/>&gt; to be a clear path for new people looking to do localization in Perl.<br/>&gt; <br/>Please use http://www.perlfoundation.org/perl5/index.cgi<br/><br/>&gt; I&#39;d be more than happy to setup a TWiki for this...<br/>&gt; <br/>&gt; Thoughts?<br/>&gt; <br/>&gt; <br/>&gt; Lyle<br/>&gt; <br/><br/>-- <br/> http://www.nntp.perl.org/group/perl.i18n/2009/01/msg216.html Thu, 01 Jan 2009 09:10:13 +0000 I18N plugin for CGI::Application in progress and questions on PerlI18N... by Lyle Hi All,<br/> I&#39;ve only just found this list. I&#39;m totally new to <br/>internationalization and have been learning all I can to write a plug-in <br/>for CGI::App. There is a blog as to my progress at:-<br/>http://perl.bristolbath.org/blog/lyle<br/>It&#39;s not much more than an adaptation of the Catalyst I18N plug-in.<br/><br/>I&#39;ve struggled to find guides on I18N with Perl. Not much more than a <br/>couple of old guides from Audrey (6+yrs old) and the man page for the <br/>Catalyst plug-in.<br/><br/>Is there a Perl I18N Wiki? If not shouldn&#39;t there be? There doesn&#39;t seem <br/>to be a clear path for new people looking to do localization in Perl.<br/><br/>I&#39;d be more than happy to setup a TWiki for this...<br/><br/>Thoughts?<br/><br/><br/>Lyle<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2008/12/msg215.html Wed, 31 Dec 2008 13:17:07 +0000 Printing non-ascii on the Win32 console by Bjoern Hoehrmann Hi,<br/><br/> Writing an example script for Win32::MultiLanguage that simply lists<br/>all supported code pages in the user&#39;s default language, I naturally got<br/>all the umlauts in the german output messed up in the console. 
Wondering<br/>how to fix that I thought there should be a simple way to say, e.g.,<br/><br/> binmode STDOUT =&gt; &#39;:encoding(:terminal)&#39; if -t STDOUT;<br/><br/>I only found the :locale sub-pragma in open.pm and encoding.pm and it<br/>seems it does not work with binmode at all and would also fail on Win32<br/>systems due to not setting the relevant environment variables. So I&#39;ve<br/>implemented a crude and simple solution like this:<br/><br/> use Encode;<br/> use Encode::Alias;<br/><br/> sub terminal_encoding {<br/> my $encoding;<br/><br/> if ($^O eq &#39;MSWin32&#39;) {<br/> eval {<br/> require Win32::API;<br/> $encoding = &quot;cp&quot; . Win32::API-&gt;new(&#39;kernel32&#39;,<br/> &#39;UINT GetConsoleOutputCP()&#39;)-&gt;Call();<br/> };<br/> } else {<br/> require encoding;<br/> $encoding = encoding::_get_locale_encoding();<br/> }<br/><br/> return $encoding;<br/> }<br/><br/> define_alias(qr/^:terminal$/ =&gt;<br/> __PACKAGE__ . &#39;-&gt;terminal_encoding()&#39;);<br/><br/>I then wondered about putting this into a module, but there seem too<br/>many problems with this at the moment, one obvious thing is calling the<br/>supposedly private _get_locale_encoding function to provide a fallback<br/>on non-Win32 systems, then there are some codepages that might be used<br/>as default somewhere that Encode supports but does not support the cp<br/>alias for it (I could not find a list of those in use, presumably only<br/>the three digit pages are default oem code pages and Encode supports the<br/>cp alias for the encodings it support it seems), needing a C compiler on<br/>Windows is kind of annoying (it would be nice if a Core module would ex-<br/>pose GetConsoleOutputCP), and error handling seems tricky.<br/><br/>For example, it seems not possible to control the Encode fallback beha-<br/>vior only for the one file handle in question, falling back to &quot;no<br/>encoding&quot; if the terminal encoding is unsupported is not possible either<br/>(unless someone 
makes a &#39;raw&#39; encoding I suppose), the encoding finder<br/>code cannot croak as it&#39;s running in a string eval, and I am not sure<br/>there is a good way to generate a warning if and only if the code doing<br/>the binmode has warnings turned on. So while this seemed elegant at<br/>first, I&#39;ve ditched the idea. Perhaps someone else is going to take this<br/>up.<br/><br/>regards,<br/>-- <br/>Bj&ouml;rn H&ouml;hrmann &middot; mailto:bjoern@hoehrmann.de &middot; http://bjoern.hoehrmann.de<br/>Weinh. Str. 22 &middot; Telefon: +49(0)621/4309674 &middot; http://www.bjoernsworld.de<br/>68309 Mannheim &middot; PGP Pub. KeyID: 0xA4357E78 &middot; http://www.websitedev.de/ <br/> http://www.nntp.perl.org/group/perl.i18n/2008/07/msg214.html Tue, 01 Jul 2008 14:47:19 +0000 Re: Stripping out Unicode combining characters (diacritics) - by Brad Baxter Just to throw this out there: you may be interested in Text::Unidecode<br/>(http://search.cpan.org/~sburke/Text-Unidecode-0.04/) if your ultimate<br/>goal is to try to represent a unicode character with its closest ascii<br/>(or perhaps I should say, &quot;romanized&quot;) equivalent.<br/><br/>-- Brad<br/><br/>On Wed, May 7, 2008 at 9:51 AM, Doran, Michael D &lt;doran@uta.edu&gt; wrote:<br/><br/>&gt; I received a number of helpful suggestions and solutions. The approach I<br/>&gt; decided to adopt in my larger script is to &#39;decode&#39; all the incoming form<br/>&gt; input as UTF-8 as well as the input from the database that I&#39;ll be matching<br/>&gt; the form input against. This seems to allow the &#39;\p{M}&#39; syntax to work as<br/>&gt; expected in a Perl regexp. 
In my test.cgi script for form input it would<br/>&gt; look like this:<br/>&gt;<br/>&gt; #!/usr/local/bin/perl<br/>&gt; use strict;<br/>&gt; use CGI;<br/>&gt; use Encode;<br/>&gt; my $query = CGI::new();<br/>&gt; my $search_term = decode(&#39;UTF-8&#39;,$query-&gt;param(&#39;text&#39;));<br/>&gt; my $sans_diacritics = $search_term;<br/>&gt; $sans_diacritics =~ s/\pM*//g;<br/>&gt; print qq(Content-type: text/plain; charset=utf-8<br/>&gt;<br/>&gt; search_term is $search_term<br/>&gt; sans_diacritics is $sans_diacritics<br/>&gt; );<br/>&gt; exit(0);<br/>&gt;<br/>&gt; I&#39;m slowly figuring out how to work with Unicode in my web scripts, but<br/>&gt; still have a lot to learn. Thanks for all the help. :-)<br/>&gt;<br/>&gt; -- Michael<br/>&gt;<br/>&gt; # Michael Doran, Systems Librarian<br/>&gt; # University of Texas at Arlington<br/>&gt; # 817-272-5326 office<br/>&gt; # 817-688-1926 mobile<br/>&gt; # doran@uta.edu<br/>&gt; # http://rocky.uta.edu/doran/<br/>&gt;<br/>&gt;<br/>&gt; &gt; -----Original Message-----<br/>&gt; &gt; From: Doran, Michael D [mailto:doran@uta.edu]<br/>&gt; &gt; Sent: Monday, May 05, 2008 7:27 PM<br/>&gt; &gt; To: perl-i18n@perl.org<br/>&gt; &gt; Cc: Perl4lib<br/>&gt; &gt; Subject: Stripping out Unicode combining characters (diacritics)<br/>&gt; &gt;<br/>&gt; &gt; I&#39;m trying to strip out combining diacritics from some form<br/>&gt; &gt; input using this code:<br/>&gt; &gt;<br/>&gt; &gt; &lt;head&gt;<br/>&gt; &gt; &lt;META http-equiv=&quot;Content-Type&quot; content=&quot;text/html;<br/>&gt; &gt; charset=UTF-8&quot;&gt; &lt;/head&gt; &lt;body&gt;<br/>&gt; &gt; &lt;form action=&quot;test.cgi&quot; accept-charset=&quot;UTF-8&quot; method=&quot;get&quot;&gt;<br/>&gt; &gt; &lt;input type=&quot;text&quot; name=&quot;text&quot; value=&quot;&quot; size=&quot;10&quot;&gt;<br/>&gt; &gt; &lt;input type=&quot;submit&quot; value=&quot;submit&quot;&gt;<br/>&gt; &gt; &lt;/form&gt;<br/>&gt; &gt; &lt;/body&gt;<br/>&gt; &gt; &lt;/html&gt;<br/>&gt; &gt;<br/>&gt; 
&gt; #!/usr/local/bin/perl<br/>&gt; &gt; use CGI;<br/>&gt; &gt; $query = CGI::new();<br/>&gt; &gt; $search_term = $query-&gt;param(&#39;text&#39;);<br/>&gt; &gt; $sans_diacritics = $search_term;<br/>&gt; &gt; $sans_diacritics =~ s/\p{M}*//g;<br/>&gt; &gt; #$sans_diacritics =~ s/o//g;<br/>&gt; &gt; print qq(Content-type: text/plain; charset=utf-8<br/>&gt; &gt;<br/>&gt; &gt; $sans_diacritics<br/>&gt; &gt; );<br/>&gt; &gt; exit(0);<br/>&gt; &gt;<br/>&gt; &gt;<br/>&gt; &gt; In the form, I&#39;m inputting the string &quot;Barto&#x301;k&quot; with the<br/>&gt; &gt; accented character being a base character (small Latin letter<br/>&gt; &gt; &quot;o&quot;) followed by a combining acute accent. However, when I<br/>&gt; &gt; print (to the web) $sans_diacritics, I get my input with no<br/>&gt; &gt; change -- the combining diacritic is still there. I know<br/>&gt; &gt; that my input is not a precomposed accented character,<br/>&gt; &gt; because I can strip out the base &quot;o&quot; and the combining accent<br/>&gt; &gt; either stands alone or jumps to another character [2].<br/>&gt; &gt;<br/>&gt; &gt; The &quot;\p{M}&quot; is a Unicode class name for the character class<br/>&gt; &gt; of Unicode &#39;marks&#39;, for example accent marks [1]. I&#39;ve tried<br/>&gt; &gt; these variations (and many others) and none seem to be doing<br/>&gt; &gt; what I want:<br/>&gt; &gt;<br/>&gt; &gt; $sans_diacritics =~ s#[\p{Mark}]*##g;<br/>&gt; &gt; $sans_diacritics =~ tr#[\p{InCombiningDiacriticalMarks}]##;<br/>&gt; &gt; $sans_diacritics =~ tr#[\p{M}]##;<br/>&gt; &gt; $sans_diacritics =~ s/\p{M}*//g;<br/>&gt; &gt; $sans_diacritics =~ s#[\p{M}]##g;<br/>&gt; &gt; $sans_diacritics =~ s#\x{0301}##g;<br/>&gt; &gt; $sans_diacritics =~ s#\x{006F}\x{0301}##g;<br/>&gt; &gt; $sans_diacritics =~ s#[\x{0300}-\x{036F}]*##g;<br/>&gt; &gt;<br/>&gt; &gt; I&#39;m pulling my hair out on this... so any help would be<br/>&gt; &gt; appreciated. 
If there&#39;s any other info I can provide, let me know.<br/>&gt; &gt;<br/>&gt; &gt; My Perl version is 5.8.8 and the script is running on a<br/>&gt; &gt; server running Solaris 9.<br/>&gt; &gt;<br/>&gt; &gt; -- Michael<br/>&gt; &gt;<br/>&gt; &gt; [1] per http://perldoc.perl.org/perlretut.html and other documentation<br/>&gt; &gt;<br/>&gt; &gt; [2] using $sans_diacritics =~ s/o//g;<br/>&gt; &gt;<br/>&gt; &gt; # Michael Doran, Systems Librarian<br/>&gt; &gt; # University of Texas at Arlington<br/>&gt; &gt; # 817-272-5326 office<br/>&gt; &gt; # 817-688-1926 mobile<br/>&gt; &gt; # doran@uta.edu<br/>&gt; &gt; # http://rocky.uta.edu/doran/<br/>&gt; &gt;<br/>&gt;<br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg213.html Thu, 08 May 2008 04:15:55 +0000 RE: Stripping out Unicode combining characters (diacritics) - by Doran, Michael D I received a number of helpful suggestions and solutions. The approach I decided to adopt in my larger script is to &#39;decode&#39; all the incoming form input as UTF-8 as well as the input from the database that I&#39;ll be matching the form input against. This seems to allow the &#39;\p{M}&#39; syntax to work as expected in a Perl regexp. In my test.cgi script for form input it would look like this: <br/> <br/>#!/usr/local/bin/perl <br/>use strict; <br/>use CGI; <br/>use Encode; <br/>my $query = CGI::new(); <br/>my $search_term = decode(&#39;UTF-8&#39;,$query-&gt;param(&#39;text&#39;)); <br/>my $sans_diacritics = $search_term; <br/>$sans_diacritics =~ s/\pM*//g; <br/>print qq(Content-type: text/plain; charset=utf-8 <br/> <br/>search_term is $search_term <br/>sans_diacritics is $sans_diacritics <br/>); <br/>exit(0); <br/> <br/>I&#39;m slowly figuring out how to work with Unicode in my web scripts, but still have a lot to learn. Thanks for all the help. 
:-) <br/> <br/>-- Michael <br/> <br/># Michael Doran, Systems Librarian <br/># University of Texas at Arlington <br/># 817-272-5326 office <br/># 817-688-1926 mobile <br/># doran@uta.edu <br/># http://rocky.uta.edu/doran/ <br/> <br/> <br/>&gt; -----Original Message----- <br/>&gt; From: Doran, Michael D [mailto:doran@uta.edu] <br/>&gt; Sent: Monday, May 05, 2008 7:27 PM <br/>&gt; To: perl-i18n@perl.org <br/>&gt; Cc: Perl4lib <br/>&gt; Subject: Stripping out Unicode combining characters (diacritics) <br/>&gt; <br/>&gt; I&#39;m trying to strip out combining diacritics from some form <br/>&gt; input using this code: <br/>&gt; <br/>&gt; &lt;head&gt; <br/>&gt; &lt;META http-equiv=&quot;Content-Type&quot; content=&quot;text/html; <br/>&gt; charset=UTF-8&quot;&gt; &lt;/head&gt; &lt;body&gt; <br/>&gt; &lt;form action=&quot;test.cgi&quot; accept-charset=&quot;UTF-8&quot; method=&quot;get&quot;&gt; <br/>&gt; &lt;input type=&quot;text&quot; name=&quot;text&quot; value=&quot;&quot; size=&quot;10&quot;&gt; <br/>&gt; &lt;input type=&quot;submit&quot; value=&quot;submit&quot;&gt; <br/>&gt; &lt;/form&gt; <br/>&gt; &lt;/body&gt; <br/>&gt; &lt;/html&gt; <br/>&gt; <br/>&gt; #!/usr/local/bin/perl <br/>&gt; use CGI; <br/>&gt; $query = CGI::new(); <br/>&gt; $search_term = $query-&gt;param(&#39;text&#39;); <br/>&gt; $sans_diacritics = $search_term; <br/>&gt; $sans_diacritics =~ s/\p{M}*//g; <br/>&gt; #$sans_diacritics =~ s/o//g; <br/>&gt; print qq(Content-type: text/plain; charset=utf-8 <br/>&gt; <br/>&gt; $sans_diacritics <br/>&gt; ); <br/>&gt; exit(0); <br/>&gt; <br/>&gt; <br/>&gt; In the form, I&#39;m inputting the string &quot;Barto&#x301;k&quot; with the <br/>&gt; accented character being a base character (small Latin letter <br/>&gt; &quot;o&quot;) followed by a combining acute accent. However, when I <br/>&gt; print (to the web) $sans_diacritics, I get my input with no <br/>&gt; change -- the combining diacritic is still there. 
I know <br/>&gt; that my input is not a precomposed accented character, <br/>&gt; because I can strip out the base &quot;o&quot; and the combining accent <br/>&gt; either stands alone or jumps to another character [2]. <br/>&gt; <br/>&gt; The &quot;\p{M}&quot; is a Unicode class name for the character class <br/>&gt; of Unicode &#39;marks&#39;, for example accent marks [1]. I&#39;ve tried <br/>&gt; these variations (and many others) and none seem to be doing <br/>&gt; what I want: <br/>&gt; <br/>&gt; $sans_diacritics =~ s#[\p{Mark}]*##g; <br/>&gt; $sans_diacritics =~ tr#[\p{InCombiningDiacriticalMarks}]##; <br/>&gt; $sans_diacritics =~ tr#[\p{M}]##; <br/>&gt; $sans_diacritics =~ s/\p{M}*//g; <br/>&gt; $sans_diacritics =~ s#[\p{M}]##g; <br/>&gt; $sans_diacritics =~ s#\x{0301}##g; <br/>&gt; $sans_diacritics =~ s#\x{006F}\x{0301}##g; <br/>&gt; $sans_diacritics =~ s#[\x{0300}-\x{036F}]*##g; <br/>&gt; <br/>&gt; I&#39;m pulling my hair out on this... so any help would be <br/>&gt; appreciated. If there&#39;s any other info I can provide, let me know. <br/>&gt; <br/>&gt; My Perl version is 5.8.8 and the script is running on a <br/>&gt; server running Solaris 9. 
<br/>&gt; <br/>&gt; -- Michael <br/>&gt; <br/>&gt; [1] per http://perldoc.perl.org/perlretut.html and other documentation <br/>&gt; <br/>&gt; [2] using $sans_diacritics =~ s/o//g; <br/>&gt; <br/>&gt; # Michael Doran, Systems Librarian <br/>&gt; # University of Texas at Arlington <br/>&gt; # 817-272-5326 office <br/>&gt; # 817-688-1926 mobile <br/>&gt; # doran@uta.edu <br/>&gt; # http://rocky.uta.edu/doran/ <br/>&gt; <br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg212.html Wed, 07 May 2008 06:51:11 +0000 Re: Stripping out Unicode combining characters (diacritics) by David Kaufman Hi Michael,<br/><br/>&quot;Doran, Michael D&quot; &lt;doran@uta.edu&gt; wrote:<br/><br/>&gt; I&#39;m trying to strip out combining diacritics from some form input using <br/>&gt; this code:<br/>&gt; [...]<br/>&gt; $sans_diacritics =~ s/\p{M}*//g;<br/><br/>I do it like this:<br/><br/>use Encode;<br/>use Unicode::Normalize qw(normalize);<br/><br/>my $ascii = encode(&#39;ascii&#39;, normalize(&#39;KD&#39;, $utf8), sub { $_[0]=&#39;&#39; });<br/><br/><br/><br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg211.html Wed, 07 May 2008 03:34:12 +0000 RE: Stripping out Unicode combining characters (diacritics) by Doran, Michael D Hi Leif,<br/><br/>&gt; This is what I do. You can try that.<br/>&gt; See if it helps:<br/>&gt; <br/>&gt; Encode::_utf8_on($str); # &lt;&lt;&lt;<br/>&gt; $str =~ s/\pM*//g;<br/><br/>That works! I will gladly buy the beers Leif, should we ever meet in person.<br/><br/>&gt; I mean - have you for instance tried running your cgi scripts <br/>&gt; in tainted mode (-T)?<br/><br/>No, I do not run my CGI scripts in tainted mode (although I realize that I probably should). 
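
Editorial sketch, not code from any poster: `Encode::_utf8_on` only switches Perl's internal UTF-8 flag on without checking the bytes, whereas `decode` validates and converts them. The example below tries to show why `\p{M}` matches only once the input has been turned into characters. It also folds in the NFD step suggested elsewhere in this thread. The sample bytes and the helper name `strip_marks` are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes raw UTF-8 bytes (as a form parameter would arrive)
# and returns UTF-8 bytes with all combining marks removed.
sub strip_marks {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl characters
    $chars = NFD($chars);                  # split precomposed letters into base + mark
    $chars =~ s/\p{M}+//g;                 # drop the marks
    return encode('UTF-8', $chars);        # characters -> bytes for output
}

# "Barto" + U+0301 COMBINING ACUTE ACCENT + "k", as raw UTF-8 bytes
my $raw = "Barto\xCC\x81k";

# On undecoded bytes the regex sees 0xCC and 0x81 as two separate characters,
# neither of which is a mark, so nothing is removed.
(my $untouched = $raw) =~ s/\p{M}+//g;

print length($untouched), "\n";             # still 8 bytes
print strip_marks($raw), "\n";              # Bartok
print strip_marks("Bart\xC3\xB3k"), "\n";   # precomposed o-acute: also Bartok
```

Characters without a canonical decomposition, such as U+00DF (sharp s) or U+0153 (oe ligature), pass through this sketch unchanged and would still need explicit substitutions.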
<br/><br/>Thanks (once again) for your help.<br/><br/>-- Michael<br/><br/># Michael Doran, Systems Librarian<br/># University of Texas at Arlington<br/># 817-272-5326 office<br/># 817-688-1926 mobile<br/># doran@uta.edu<br/># http://rocky.uta.edu/doran/<br/> <br/><br/>&gt; -----Original Message-----<br/>&gt; From: Leif Andersson [mailto:Leif.Andersson@sub.su.se] <br/>&gt; Sent: Tuesday, May 06, 2008 3:33 AM<br/>&gt; To: Doran, Michael D<br/>&gt; Subject: Re: Stripping out Unicode combining characters (diacritics)<br/>&gt; <br/>&gt; Oh, now I see your REAL question.<br/>&gt; <br/>&gt; This is what I do. You can try that.<br/>&gt; See if it helps:<br/>&gt; <br/>&gt; Encode::_utf8_on($str); # &lt;&lt;&lt;<br/>&gt; $str =~ s/\pM*//g;<br/>&gt; <br/>&gt; You are not the only one having problems with Unicode.<br/>&gt; Esp. in web programming it can be very confusing.<br/>&gt; <br/>&gt; I am quite surprised that there are not more discussions of this kind.<br/>&gt; Not even in the &quot;official&quot; channels.<br/>&gt; <br/>&gt; I mean - have you for instance tried running your cgi scripts <br/>&gt; in tainted mode (-T)?<br/>&gt; <br/>&gt; I had all my scripts set up that way. Before Unicode.<br/>&gt; But basic Unicode stuff became broken with -T enabled.<br/>&gt; Have they fixed that now?<br/>&gt; I have at least seen no mentioning of it.<br/>&gt; <br/>&gt; And screen scraping. If you want to mess around with <br/>&gt; javascript embedded in an HTML page, you may find that the <br/>&gt; content encoding is mixed. And Perl gets very confused <br/>&gt; getting mixed character encodings.<br/>&gt; And so do I.<br/>&gt; <br/>&gt; You may also have to deal with mixed encodings doing SQL <br/>&gt; against the Voyager database.<br/>&gt; <br/>&gt; What would we do if we could not fall back on &quot;use bytes&quot;<br/>&gt; every now and then! 
;-)<br/>&gt; <br/>&gt; Leif<br/>&gt; <br/>&gt; ======================================<br/>&gt; Leif Andersson, Systems Librarian<br/>&gt; Stockholm University Library<br/>&gt; SE-106 91 Stockholm<br/>&gt; SWEDEN<br/>&gt; Phone : +46 8 162769<br/>&gt; Mobile: +46 70 6904281<br/>&gt; <br/>&gt; <br/>&gt; -----Ursprungligt meddelande-----<br/>&gt; Fr&aring;n: Doran, Michael D [mailto:doran@uta.edu]<br/>&gt; Skickat: den 6 maj 2008 04:13<br/>&gt; Till: Mike Rylander<br/>&gt; Kopia: perl-i18n@perl.org; Perl4lib<br/>&gt; &Auml;mne: RE: Stripping out Unicode combining characters (diacritics)<br/>&gt; <br/>&gt; Hi Mike,<br/>&gt; <br/>&gt; I appreciate the quick reply. I am familiar with the <br/>&gt; Unicode::Normalize module (and will also be using that), but <br/>&gt; I left it out of this question because it&#39;s not relevant to <br/>&gt; the problem I&#39;m currently trying to solve. The text I&#39;m <br/>&gt; trying to strip diacritics out of does not have precomposed <br/>&gt; accented characters.<br/>&gt; <br/>&gt; -- Michael<br/>&gt; <br/>&gt; # Michael Doran, Systems Librarian<br/>&gt; # University of Texas at Arlington<br/>&gt; # 817-272-5326 office<br/>&gt; # 817-688-1926 cell<br/>&gt; # doran@uta.edu<br/>&gt; # http://rocky.uta.edu/doran/<br/>&gt; <br/>&gt; <br/>&gt; <br/>&gt; -----Original Message-----<br/>&gt; From: Mike Rylander [mailto:mrylander@gmail.com]<br/>&gt; Sent: Mon 5/5/2008 8:52 PM<br/>&gt; To: Doran, Michael D<br/>&gt; Cc: perl-i18n@perl.org; Perl4lib<br/>&gt; Subject: Re: Stripping out Unicode combining characters (diacritics)<br/>&gt; <br/>&gt; On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <br/>&gt; &lt;doran@uta.edu&gt; wrote:<br/>&gt; [snip]<br/>&gt; &gt;<br/>&gt; &gt; I&#39;m pulling my hair out on this... so any help would be <br/>&gt; appreciated. 
If there&#39;s any other info I can provide, let me know.<br/>&gt; &gt;<br/>&gt; <br/>&gt; You&#39;ll want to transform the text to NFD format (nominally, <br/>&gt; base characters plus combining marks) instead of NFC (precombined<br/>&gt; characters) using Unicode::Normalize:<br/>&gt; <br/>&gt; use Unicode::Normalize;<br/>&gt; <br/>&gt; my $text = NFD($original);<br/>&gt; $text =~ s/\pM+//go;<br/>&gt; <br/>&gt; Hope that helps.<br/>&gt; <br/>&gt; --<br/>&gt; Mike Rylander<br/>&gt; | VP, Research and Design<br/>&gt; | Equinox Software, Inc. / The Evergreen Experts | phone: <br/>&gt; 1-877-OPEN-ILS (673-6457) | email: miker@esilibrary.com | <br/>&gt; web: http://www.esilibrary.com<br/>&gt; <br/>&gt; <br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg210.html Tue, 06 May 2008 07:26:50 +0000 Re: Stripping out Unicode combining characters (diacritics) by Leif Andersson I&#39;ve been doing it like Mike R suggested for quite some while.<br/>But some characters do not map nicely into this scheme.<br/><br/>So you may want to manually take care of stuff like german eszet, ligature oe etc, etc.<br/><br/>s/\x{00df}/ss/g;<br/>s/\x{0152}/Oe/g;<br/>s/\x{0153}/oe/g;<br/>...to be continued...<br/><br/>Leif<br/>======================================<br/>Leif Andersson, Systems Librarian<br/>Stockholm University Library<br/>SE-106 91 Stockholm<br/>SWEDEN<br/>Phone : +46 8 162769<br/>Mobile: +46 70 6904281<br/><br/>-----Ursprungligt meddelande-----<br/>Fr&aring;n: Doran, Michael D [mailto:doran@uta.edu] <br/>Skickat: den 6 maj 2008 04:13<br/>Till: Mike Rylander<br/>Kopia: perl-i18n@perl.org; Perl4lib<br/>&Auml;mne: RE: Stripping out Unicode combining characters (diacritics)<br/><br/>Hi Mike,<br/><br/>I appreciate the quick reply. I am familiar with the Unicode::Normalize module (and will also be using that), but I left it out of this question because it&#39;s not relevant to the problem I&#39;m currently trying to solve. 
The text I&#39;m trying to strip diacritics out of does not have precomposed accented characters.<br/><br/>-- Michael<br/><br/># Michael Doran, Systems Librarian<br/># University of Texas at Arlington<br/># 817-272-5326 office<br/># 817-688-1926 cell<br/># doran@uta.edu<br/># http://rocky.uta.edu/doran/<br/><br/><br/><br/>-----Original Message-----<br/>From: Mike Rylander [mailto:mrylander@gmail.com]<br/>Sent: Mon 5/5/2008 8:52 PM<br/>To: Doran, Michael D<br/>Cc: perl-i18n@perl.org; Perl4lib<br/>Subject: Re: Stripping out Unicode combining characters (diacritics)<br/> <br/>On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D &lt;doran@uta.edu&gt; wrote:<br/>[snip]<br/>&gt;<br/>&gt; I&#39;m pulling my hair out on this... so any help would be appreciated. If there&#39;s any other info I can provide, let me know.<br/>&gt;<br/><br/>You&#39;ll want to transform the text to NFD format (nominally, base<br/>characters plus combining marks) instead of NFC (precombined<br/>characters) using Unicode::Normalize:<br/><br/> use Unicode::Normalize;<br/><br/> my $text = NFD($original);<br/> $text =~ s/\pM+//go;<br/><br/>Hope that helps.<br/><br/>-- <br/>Mike Rylander<br/> | VP, Research and Design<br/> | Equinox Software, Inc. / The Evergreen Experts<br/> | phone: 1-877-OPEN-ILS (673-6457)<br/> | email: miker@esilibrary.com<br/> | web: http://www.esilibrary.com<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg209.html Tue, 06 May 2008 05:57:49 +0000 Re: Stripping out Unicode combining characters (diacritics) by Mike Rylander On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D &lt;doran@uta.edu&gt; wrote:<br/>[snip]<br/>&gt;<br/>&gt; I&#39;m pulling my hair out on this... so any help would be appreciated. 
If there&#39;s any other info I can provide, let me know.<br/>&gt;<br/><br/>You&#39;ll want to transform the text to NFD format (nominally, base<br/>characters plus combining marks) instead of NFC (precombined<br/>characters) using Unicode::Normalize:<br/><br/> use Unicode::Normalize;<br/><br/> my $text = NFD($original);<br/> $text =~ s/\pM+//go;<br/><br/>Hope that helps.<br/><br/>-- <br/>Mike Rylander<br/> | VP, Research and Design<br/> | Equinox Software, Inc. / The Evergreen Experts<br/> | phone: 1-877-OPEN-ILS (673-6457)<br/> | email: miker@esilibrary.com<br/> | web: http://www.esilibrary.com<br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg208.html Tue, 06 May 2008 04:56:03 +0000 RE: Stripping out Unicode combining characters (diacritics) by Doran, Michael D Hi Mike,<br/><br/>I appreciate the quick reply. I am familiar with the Unicode::Normalize module (and will also be using that), but I left it out of this question because it&#39;s not relevant to the problem I&#39;m currently trying to solve. The text I&#39;m trying to strip diacritics out of does not have precomposed accented characters.<br/><br/>-- Michael<br/><br/># Michael Doran, Systems Librarian<br/># University of Texas at Arlington<br/># 817-272-5326 office<br/># 817-688-1926 cell<br/># doran@uta.edu<br/># http://rocky.uta.edu/doran/<br/><br/><br/><br/>-----Original Message-----<br/>From: Mike Rylander [mailto:mrylander@gmail.com]<br/>Sent: Mon 5/5/2008 8:52 PM<br/>To: Doran, Michael D<br/>Cc: perl-i18n@perl.org; Perl4lib<br/>Subject: Re: Stripping out Unicode combining characters (diacritics)<br/> <br/>On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D &lt;doran@uta.edu&gt; wrote:<br/>[snip]<br/>&gt;<br/>&gt; I&#39;m pulling my hair out on this... so any help would be appreciated. 
If there&#39;s any other info I can provide, let me know.<br/>&gt;<br/><br/>You&#39;ll want to transform the text to NFD format (nominally, base<br/>characters plus combining marks) instead of NFC (precombined<br/>characters) using Unicode::Normalize:<br/><br/> use Unicode::Normalize;<br/><br/> my $text = NFD($original);<br/> $text =~ s/\pM+//go;<br/><br/>Hope that helps.<br/><br/>-- <br/>Mike Rylander<br/> | VP, Research and Design<br/> | Equinox Software, Inc. / The Evergreen Experts<br/> | phone: 1-877-OPEN-ILS (673-6457)<br/> | email: miker@esilibrary.com<br/> | web: http://www.esilibrary.com<br/><br/><br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg207.html Mon, 05 May 2008 19:12:53 +0000 Stripping out Unicode combining characters (diacritics) by Doran, Michael D I&#39;m trying to strip out combining diacritics from some form input using this code: <br/> <br/>&lt;head&gt; <br/> &lt;META http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=UTF-8&quot;&gt; <br/>&lt;/head&gt; <br/>&lt;body&gt; <br/> &lt;form action=&quot;test.cgi&quot; accept-charset=&quot;UTF-8&quot; method=&quot;get&quot;&gt; <br/> &lt;input type=&quot;text&quot; name=&quot;text&quot; value=&quot;&quot; size=&quot;10&quot;&gt; <br/> &lt;input type=&quot;submit&quot; value=&quot;submit&quot;&gt; <br/> &lt;/form&gt; <br/>&lt;/body&gt; <br/>&lt;/html&gt; <br/> <br/>#!/usr/local/bin/perl <br/>use CGI; <br/>$query = CGI::new(); <br/>$search_term = $query-&gt;param(&#39;text&#39;); <br/>$sans_diacritics = $search_term; <br/>$sans_diacritics =~ s/\p{M}*//g; <br/>#$sans_diacritics =~ s/o//g; <br/>print qq(Content-type: text/plain; charset=utf-8 <br/> <br/>$sans_diacritics <br/>); <br/>exit(0); <br/> <br/> <br/>In the form, I&#39;m inputting the string &quot;Barto&#x301;k&quot; with the accented character being a base character (small Latin letter &quot;o&quot;) followed by a combining acute accent. 
However, when I print (to the web) $sans_diacritics, I get my input with no change -- the combining diacritic is still there. I know that my input is not a precomposed accented character, because I can strip out the base &quot;o&quot; and the combining accent either stands alone or jumps to another character [2]. <br/> <br/>The &quot;\p{M}&quot; is a Unicode class name for the character class of Unicode &#39;marks&#39;, for example accent marks [1]. I&#39;ve tried these variations (and many others) and none seem to be doing what I want: <br/> <br/> $sans_diacritics =~ s#[\p{Mark}]*##g; <br/> $sans_diacritics =~ tr#[\p{InCombiningDiacriticalMarks}]##; <br/> $sans_diacritics =~ tr#[\p{M}]##; <br/> $sans_diacritics =~ s/\p{M}*//g; <br/> $sans_diacritics =~ s#[\p{M}]##g; <br/> $sans_diacritics =~ s#\x{0301}##g; <br/> $sans_diacritics =~ s#\x{006F}\x{0301}##g; <br/> $sans_diacritics =~ s#[\x{0300}-\x{036F}]*##g; <br/> <br/>I&#39;m pulling my hair out on this... so any help would be appreciated. If there&#39;s any other info I can provide, let me know. <br/> <br/>My Perl version is 5.8.8 and the script is running on a server running Solaris 9. <br/> <br/>-- Michael <br/> <br/>[1] per http://perldoc.perl.org/perlretut.html and other documentation <br/> <br/>[2] using $sans_diacritics =~ s/o//g; <br/> <br/># Michael Doran, Systems Librarian <br/># University of Texas at Arlington <br/># 817-272-5326 office <br/># 817-688-1926 mobile <br/># doran@uta.edu <br/># http://rocky.uta.edu/doran/ <br/> http://www.nntp.perl.org/group/perl.i18n/2008/05/msg206.html Mon, 05 May 2008 17:27:08 +0000 Locale::Maketext::Gettext 1.24 Released by imacat Dear whoever is using Locale::Maketext::Gettext,<br/><br/> Locale::Maketext::Gettext 1.24 is released, after another year. <br/>This release adds support for GNU gettext pgettext() as pmaketext(), to<br/>translate messages in a particular context.<br/><br/> Thanks to Chris Travers &lt;chris.travers@gmail.com&gt; for the suggestion. 
<br/>I was away from the GNU gettext development for too long.<br/><br/> GNU gettext pgettext() (and hence pmaketext()) is an important<br/>improvement for GUI applications, which may have short menu items that<br/>are ambiguous to translators without its menu context. If you do not<br/>know what is &quot;context&quot;, refer to:<br/><br/>11.2.5 Using contexts for solving ambiguities<br/>http://www.gnu.org/software/gettext/manual/gettext.html#Contexts<br/><br/> Effectively, failure_handler_auto() differs from its<br/>Locale::Maketext parent method from now on.<br/><br/> Please tell me if there is any problem. Your feedbacks are welcome. <br/>Thank you.<br/><br/>--<br/>Best regards,<br/>imacat ^_*&#39; &lt;imacat@mail.imacat.idv.tw&gt;<br/>PGP Key: http://www.imacat.idv.tw/me/pgpkey.asc<br/><br/>&lt;&lt;Woman&#39;s Voice&gt;&gt; News: http://www.wov.idv.tw/<br/>Tavern IMACAT&#39;s: http://www.imacat.idv.tw/<br/>TLUG List Manager: http://lists.linux.org.tw/cgi-bin/mailman/listinfo/tlug<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2008/02/msg205.html Mon, 25 Feb 2008 11:40:38 +0000 Re: Problem processing UTF-8 strings from email by Neil Gunton Thurn, Martin wrote:<br/>&gt; I believe that format is RFC 2047 Mime Header Encoding, try the<br/>&gt; Encode::MIME::Header module to decode it <br/>&gt; http://perldoc.perl.org/Encode/MIME/Header.html <br/><br/>Yes, that does the trick. 
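
Editorial sketch of the suggestion quoted above, assuming a reasonably recent Encode distribution, which ships Encode::MIME::Header and registers the `MIME-Header` encoding name. The sample header is the one from the original question in this thread. The variable names and the entity-conversion step are illustrative assumptions, added only because the question asked about HTML character entities.

```perl
use strict;
use warnings;
use Encode qw(decode);

# RFC 2047 encoded-words, as they appear in a raw mail header
my $raw = '=?UTF-8?B?5qGQ5LmhLCBUb25neGlhbmc6IEJlaW5nIGEgJ2hhbg==?= =?UTF-8?B?dHUn?=';

# 'MIME-Header' is provided by Encode::MIME::Header;
# the result is a Perl character string, not UTF-8 bytes.
my $subject = decode('MIME-Header', $raw);

# One possible way to make the text safe for any page encoding:
# replace every non-ASCII character with a numeric character reference.
(my $html = $subject) =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;

print "$html\n";
```

If the page itself is served as UTF-8, the entity step can be skipped and `$subject` encoded to UTF-8 on output instead.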
Thanks very much!<br/><br/>/Neil<br/> http://www.nntp.perl.org/group/perl.i18n/2008/01/msg204.html Mon, 14 Jan 2008 06:08:00 +0000 RE: Problem processing UTF-8 strings from email by Thurn, Martin I believe that format is RFC 2047 Mime Header Encoding, try the<br/>Encode::MIME::Header module to decode it <br/>http://perldoc.perl.org/Encode/MIME/Header.html <br/><br/><br/> - - Martin <br/><br/>-----Original Message-----<br/>From: Neil Gunton [mailto:neil@nilspace.com] <br/>Sent: Saturday, January 12, 2008 19:18<br/>To: perl-i18n@perl.org<br/>Subject: Problem processing UTF-8 strings from email<br/><br/>Hi all,<br/><br/>I am somewhat experienced with Perl in general, but absolutely no <br/>experience dealing with UTF-8. I have a community journals website which<br/><br/>allows updates from users via email. I&#39;m having trouble with emails that<br/><br/>contain Chinese characters encoded (I think) as UTF-8. The strings look <br/>like this:<br/><br/>=?UTF-8?B?5qGQ5LmhLCBUb25neGlhbmc6IEJlaW5nIGEgJ2hhbg==?=<br/>=?UTF-8?B?dHUn?=<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2008/01/msg203.html Mon, 14 Jan 2008 05:24:32 +0000 Problem processing UTF-8 strings from email by Neil Gunton Hi all,<br/><br/>I am somewhat experienced with Perl in general, but absolutely no <br/>experience dealing with UTF-8. I have a community journals website which <br/>allows updates from users via email. I&#39;m having trouble with emails that <br/>contain Chinese characters encoded (I think) as UTF-8. The strings look <br/>like this:<br/><br/>=?UTF-8?B?5qGQ5LmhLCBUb25neGlhbmc6IEJlaW5nIGEgJ2hhbg==?= =?UTF-8?B?dHUn?=<br/><br/>When I read this text from a file, using my perl script, and then save <br/>it into MySQL, it comes out on the website looking literally like the <br/>above. 
I can&#39;t seem to get perl to &quot;do&quot; anything with it in terms of <br/>conversions to a format that looks like chinese characters when <br/>displayed on the Web.<br/><br/>Does anybody have any clues as to how to convert strings like this into <br/>something more usable - e.g. HTML character entities?<br/><br/>I&#39;m using stock perl 5.8.8 from Debian Etch.<br/><br/>Thanks!<br/><br/>/Neil<br/> http://www.nntp.perl.org/group/perl.i18n/2008/01/msg202.html Sat, 12 Jan 2008 16:17:42 +0000 Re: [perl-i18n] Perl locale information sources for server apps,and the CLDR by Guido Flohr John ORourke wrote:<br/>&gt; I suppose the *nix vendors keep their POSIX locale data up to date but I <br/>&gt; can&#39;t seem to find any info on how that process happens - who collects <br/>&gt; locale data, etc.<br/><br/>For the GNU libc locale definitions see <br/>http://mail.nl.linux.org/linux-utf8/2005-02/msg00033.html<br/><br/>Regards,<br/>Guido<br/>-- <br/>Imperia AG, Development<br/>Leyboldstr. 10 - D-50354 H&uuml;rth - http://www.imperia.net/<br/> http://www.nntp.perl.org/group/perl.i18n/2007/07/msg201.html Mon, 02 Jul 2007 23:59:14 +0000 Re: [perl-i18n] Perl locale information sources for server apps,and the CLDR by John ORourke Guido Flohr wrote:<br/>&gt; The most important point of POSIX::setlocale() is that it changes <br/>&gt; behaviors of existing functions. Error messages ($!) are <br/>&gt; automatically localized, provided that the system supports the <br/>&gt; selected locale. 
When you output floating point numbers, the correct <br/>&gt; floating point format will be chosen.<br/><br/>This is ok, but in a server application the user is not going to see an <br/>Operating System message, and in my case I am exclusively using Unicode, <br/>so the default Unicode collation works fine a lot of the time.<br/><br/>&gt; 3) If you need more locale specific data, simply integrate them in <br/>&gt; your application&#39;s message catalogs for the specific locale, see <br/>&gt; Locale::Maketext or Locale::TextDomain for details.<br/><br/>This is exactly the problem I wish to address - as far as I can tell <br/>there is no consistent, complete, well maintained source of locale data <br/>currently available - Locale::TextDomain/Maketext and the other <br/>message-catalogue-style modules are all for storing locale-specific <br/>messages, which does not do anything for this problem.<br/><br/>I suppose the *nix vendors keep their POSIX locale data up to date but I <br/>can&#39;t seem to find any info on how that process happens - who collects <br/>locale data, etc.<br/><br/>cheers<br/>John<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2007/07/msg200.html Mon, 02 Jul 2007 18:03:09 +0000 Re: [perl-i18n] Perl locale information sources for server apps,and the CLDR by Guido Flohr Hi,<br/><br/>John ORourke wrote:<br/>&gt; I looked into the following:<br/>&gt; <br/>&gt; - POSIX - great idea, but info seems incomplete (eg. quote style, <br/>&gt; units of weight), and updates depend on your *nix distro. Also dates <br/>&gt; back to desktop days - the locale is set for a whole process, not ideal <br/>&gt; in a web server environment. Speed of locale switching seems OK though.<br/><br/>The most important point of POSIX::setlocale() is that it changes <br/>behaviors of existing functions. Error messages ($!) are automatically <br/>localized, provided that the system supports the selected locale. 
When <br/>you output floating point numbers, the correct floating point format <br/>will be chosen.<br/><br/>Your CLDR idea is maybe not the worst, but then actually Perl should be <br/>changed internally to use CLDR data, and ignore the OS hints. However, <br/>for some topics like error messages this is not really possible.<br/><br/>Usually, the following approach works quite well for web applications or <br/>server applications in general:<br/><br/>1) On startup try to guess a locale setting and change the locale using <br/>POSIX::setlocale().<br/><br/>2) Let Perl (resp. the underlying libc) do the work for the categories <br/>where this is already possible (OS error messages, decimal number <br/>formats, collating, and so on).<br/><br/>3) If you need more locale specific data, simply integrate them in your <br/>application&#39;s message catalogs for the specific locale, see <br/>Locale::Maketext or Locale::TextDomain for details.<br/><br/>If 3) doesn&#39;t seem clean to you: Generate the message catalogs, resp. <br/>the relevant part of the files from CLDR data.<br/><br/>Cheers,<br/>Guido<br/>-- <br/>Imperia AG, Development<br/>Leyboldstr. 10 - D-50354 H&uuml;rth - http://www.imperia.net/<br/> http://www.nntp.perl.org/group/perl.i18n/2007/07/msg199.html Mon, 02 Jul 2007 08:53:20 +0000 [perl-i18n] Perl locale information sources for server apps, and the CLDR by John ORourke Hi folks, sorry this is a bit long!<br/><br/>[ summary: just got into i18n and unicode, not happy with sources of <br/>locale information and wondering about use of CLDR, considering writing <br/>a Perl module for it ]<br/><br/>I&#39;ve just successfully ported my large web application <br/>(yet-another-template-system) to support end-to-end unicode and full i18n.<br/><br/>However I can&#39;t bring myself to use any of the existing <br/>locale-information modules - nothing feels &#39;comfortable&#39; from a <br/>commercial and technical point of view. 
The basic issue is that we need <br/>the application itself to run in a &#39;neutral&#39; locale - ie. running perl <br/>without &#39;use locale&#39;, but each web request needs access to locale <br/>information specific to the request.<br/><br/>What modules are people using out there and how happy are you?<br/><br/>The things I look at are:<br/> - update frequency (world events like expansion of the EU need to be <br/>implemented soon after the event)<br/> - reliability (most CPAN modules are &quot;use at your own risk&quot; of course)<br/> - consistency (some differences between each locale system)<br/> - localised names (country and currency names in native languages)<br/><br/>I looked into the following:<br/><br/> - POSIX - great idea, but info seems incomplete (eg. quote style, <br/>units of weight), and updates depend on your *nix distro. Also dates <br/>back to desktop days - the locale is set for a whole process, not ideal <br/>in a web server environment. Speed of locale switching seems OK though.<br/><br/> - Locale::Object - nice interface, uses a SQLite data file but looks <br/>like maintenance is &quot;as and when&quot;, and the database schema is <br/>undocumented. The &#39;make_sane&#39; idea is great - given a country, it <br/>finds a suitable language and currency if current values are invalid.<br/><br/> - Locale::Constants - again update method is unknown but nice, <br/>simple interface<br/><br/> - DateTime::Locale - looks like Dave Rolsky had the same problems <br/>and wrote his own locale files as separate modules, so DateTime has its <br/>own Locale and Timezone data, which may or may not match your system <br/>locales. Even so this is a nice module, especially the Storable hooks.<br/><br/>I looked around for sources of this data, and found the Unicode CLDR, <br/>but I was surprised to see no CPAN modules for it! 
The files have just <br/>about all data you could want, and the format is properly defined XML, <br/>and updates are organised, regular, and vetted. See <br/>http://unicode.org/cldr/process.html<br/><br/>I am thinking about creating the following:<br/> 1 - a Perl module which allows CLDR data files to be queried in many <br/>ways<br/> 2 - a script to update the system&#39;s POSIX locale data from the CLDR <br/>so all your other apps can benefit.<br/> 3 - a web service with free and commercial options, which provides <br/>the latest CLDR data - the Perl module could be told to use this or <br/>local files<br/><br/>Any comments before I set off down this road? Anybody want to contribute?<br/><br/>cheers<br/>John O&#39;Rourke<br/>Versatilia Ltd<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2007/07/msg198.html Mon, 02 Jul 2007 08:26:52 +0000 Re: Problems with Perl Asian encodings? by Ciaran Hamilton Hi,<br/><br/>Samuel L. Bayer wrote:<br/>&gt; So the outcome was that there&#39;s a mode in GNU recode which will drop <br/>&gt; these illegal first bytes. So the question is: is the same thing <br/>&gt; possible in Perl Encode? The documentation for some of the FB_ variables <br/>&gt; is tempting, but pretty opaque.<br/><br/>Yes, the way to do it is by using Encode::FB_QUIET. Basically, here&#39;s <br/>how you would do it... if $text is the text you want to decode into <br/>UTF-8, then this should do the trick:<br/><br/>-----<br/>use Encode;<br/><br/>my $textcopy = $text; # keep the original: decode() below consumes $text<br/>my $encoding = &quot;gb2312&quot;;<br/><br/># With a CHECK argument, decode() removes what it has processed from $text<br/># and stops at the first invalid byte, leaving the remainder in $text.<br/>my $decoded = decode($encoding, $text, Encode::FB_QUIET);<br/><br/>while ($text ne &quot;&quot;) { # loop while we&#39;ve still got bad characters to deal with<br/> ### my $badbyte = substr($text, 0, 1); # $badbyte now contains the invalid byte<br/> ### my $hex = sprintf(&quot;%02X&quot;, ord($badbyte)); # two hex digits, zero-padded<br/> ### print STDERR &quot;Invalid character \\x$hex in input - dropping.\n&quot;;<br/> $text = substr($text, 1); # skip over the bad character<br/> $decoded .= decode($encoding, $text, Encode::FB_QUIET);<br/>}<br/><br/>print &quot;Output: $decoded\n&quot;;<br/>-----<br/><br/>The code as given will ignore every bad character and print no <br/>warnings; if you want warnings, uncomment the lines marked with ###. It <br/>depends what you want your code to do. :D<br/><br/>Hope this helps!<br/><br/> - Ciaran.<br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg197.html Thu, 17 May 2007 02:29:49 +0000 Re: Problems with Perl Asian encodings? by Samuel L. Bayer Samuel L. Bayer wrote:<br/><br/>&gt; Has anyone else done such a comparison of GNU recode and Perl Encode? <br/>&gt; I&#39;d very much prefer to move to Perl, not simply for efficiency but <br/>&gt; because, unlike GNU recode, it appears to be actively maintained; <br/>&gt; however, the error rate is just too high, especially considering that <br/>&gt; the GNU recode output looks clean, and our users have not complained <br/>&gt; about it.<br/><br/>Hi again all -<br/><br/>Last week, I sent out a query about Asian encodings and Perl Encode vs. <br/>GNU recode. Martin Thurn graciously helped me debug this problem, and I <br/>can now summarize as follows, quoting Martin:<br/><br/>&quot; In the sample data you sent, in the original GB2312, right after the<br/>word &quot;diode&quot;, there is an octal \244 and octal \112. Octal \244 =<br/>decimal 164 which is not a legal first-byte in GB2312.<br/> Recode apparently dropped the \244 and left the \112 as-is, a capital<br/>J.<br/> Encode apparently converted the \244 to a default UTF-8 &quot;unknown<br/>character&quot; and left the \112 as-is, a capital J.&quot;<br/><br/>So the outcome was that there&#39;s a mode in GNU recode which will drop <br/>these illegal first bytes. So the question is: is the same thing <br/>possible in Perl Encode? 
The documentation for some of the FB_ variables <br/>is tempting, but pretty opaque.<br/><br/>Again, I&#39;m using Perl 5.8.7, with the versions of Encode that come with <br/>that distribution.<br/><br/>Thanks so much in advance -<br/><br/>Sam Bayer<br/>The MITRE Corporation<br/>sam@mitre.org<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg196.html Mon, 14 May 2007 09:16:13 +0000 Re: Character Encoding (UTF-8) in PERL by Damyan Ivanov -=| Oliver K&ouml;nig, Sun, 13 May 2007 17:52:54 +0200 |=-<br/>&gt; On Sunday 13 May 2007 17:14:43 you wrote:<br/>&gt; &gt; Krzysztof Krzyżaniak wrote [Thu, May 10, 2007 at 09:48:46AM +0200]:<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; or (from man DBD::mysql):<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; mysql_enable_utf8<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; This attribute determines whether DBD::mysql should assume strings<br/>&gt; &gt; &gt; stored in the database are utf8. This feature defaults to off.<br/>&gt; &gt; &gt;<br/>&gt; &gt; &gt; When set, a data retrieved from a textual column type (char,<br/>&gt; &gt; &gt; varchar, etc) will have the UTF-8 flag turned on if necessary.<br/><br/>&gt; However the command dbh-&gt;do(SHOW VARIABLES LIKE &quot;character_set_%&quot;);<br/>&gt; executed in a PERL script returns:<br/>&gt; character_set_client &nbsp; &nbsp; &nbsp;latin1<br/>&gt; character_set_connection &nbsp;latin1<br/>&gt; character_set_database &nbsp; &nbsp;utf8<br/>&gt; character_set_filesystem &nbsp;binary<br/>&gt; character_set_results &nbsp; &nbsp; latin1<br/>&gt; character_set_server &nbsp; &nbsp; &nbsp;utf8<br/>&gt; character_set_system &nbsp; &nbsp; &nbsp;utf8<br/>&gt; character_sets_dir &nbsp; &nbsp; &nbsp; &nbsp;/usr/share/mysql/charsets/<br/>&gt; <br/>&gt; Everybody is blaming mysql but it is obvious to me that PERL is the<br/>&gt; problem. 
How do I configure PERL to use UTF-8 as default???<br/><br/>Perhaps changing the above default in DBD::mysql would do the<br/>trick?<br/><br/>-- <br/>dam JabberID: dam@jabber.minus273.org<br/><br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg195.html Sun, 13 May 2007 20:08:56 +0000 Re: Character Encoding (UTF-8) in PERL by Gunnar Wolf Oliver K&ouml;nig wrote [Sun, May 13, 2007 at 05:52:54PM +0200]:<br/>&gt; &gt; I think that, as Etch is by default installed with UTF on everywhere,<br/>&gt; &gt; we should propose changing this default, so that MySQL connections are<br/>&gt; &gt; also UTF by default. This could enter Etch 4.0r1 (or further). What do<br/>&gt; &gt; you think?<br/>&gt; &gt;<br/>&gt; My mysql settings are fine. The problem is PERL. <br/>&gt; <br/>&gt; mysql&gt; SHOW VARIABLES LIKE &quot;character_set_%&quot;;<br/>&gt; +--------------------------+----------------------------+<br/>&gt; | Variable_name &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| Value &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|<br/>&gt; +--------------------------+----------------------------+<br/>&gt; | character_set_client &nbsp; &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>&gt; | character_set_connection | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>&gt; | character_set_database &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>&gt; | character_set_filesystem | binary &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>&gt; | character_set_results &nbsp; &nbsp;| utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>&gt; | character_set_server &nbsp; &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>&gt; | character_set_system &nbsp; &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp; &nbsp; &nbsp; |<br/>&gt; | character_sets_dir &nbsp; &nbsp; &nbsp; | /usr/share/mysql/charsets/ |<br/>&gt; +--------------------------+----------------------------+<br/>&gt; <br/>&gt; However the command dbh-&gt;do(SHOW VARIABLES LIKE &quot;character_set_%&quot;); excuted <br/>&gt; in a PERL script returns:<br/>&gt; character_set_client &nbsp; &nbsp; &nbsp;latin1<br/>&gt; character_set_connection &nbsp;latin1<br/>&gt; character_set_database &nbsp; &nbsp;utf8<br/>&gt; character_set_filesystem &nbsp;binary<br/>&gt; character_set_results &nbsp; &nbsp; latin1<br/>&gt; character_set_server &nbsp; &nbsp; &nbsp;utf8<br/>&gt; character_set_system &nbsp; &nbsp; &nbsp;utf8<br/>&gt; character_sets_dir &nbsp; &nbsp; &nbsp; &nbsp;/usr/share/mysql/charsets/<br/>&gt; <br/>&gt; Everybody is blaming mysql but it is obvious to me that PERL is the problem. <br/>&gt; How do I configure PERL tu use UTF-8 as default???<br/><br/>Exactly - The problem is not Perl the language, but the connection<br/>settings expressed by the glue between Perl and MySQL, the DBD::mysql<br/>module - Just as you configure your MySQL defaults in<br/>/etc/mysql/my.cnf and ~/.my.cnf (and use some default values),<br/>DBD::mysql is configured partially (I guess) at build time, and part<br/>can be specified at invocation time. For example, under the &#39;connect&#39;<br/>statement from the DBD::mysql manpage, I found:<br/><br/> $dsn = &quot;DBI:mysql:test;mysql_read_default_file=/home/joe/my.cnf&quot;;<br/><br/>Later on, in the &#39;DATABASE HANDLES&#39; section, I found:<br/><br/> mysql_enable_utf8<br/> This attribute determines whether DBD::mysql should assume<br/> strings stored in the database are utf8. 
This feature<br/> defaults to off.<br/><br/>And what I&#39;m suggesting is to change this default, if there is no<br/>reason not to do so.<br/><br/>Now, seeing this is the documented behaviour, I no longer think<br/>this should be changed for Etch (users might be relying on the<br/>current behaviour), but rather for Lenny.<br/><br/>Greetings,<br/><br/>-- <br/>Gunnar Wolf - gwolf@gwolf.org - (+52-55)5623-0154 / 1451-2244<br/>PGP key 1024D/8BB527AF 2001-10-23<br/>Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF<br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg194.html Sun, 13 May 2007 18:28:59 +0000 Re: Character Encoding (UTF-8) in PERL by Oliver König On Sunday 13 May 2007 17:14:43 you wrote:<br/>&gt; Krzysztof Krzyżaniak wrote [Thu, May 10, 2007 at 09:48:46AM +0200]:<br/>&gt; &gt; after connecting, run the query &quot;SET NAMES utf8&quot;<br/>&gt; &gt; http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html<br/>&gt; &gt;<br/>&gt; &gt; or (from man DBD::mysql):<br/>&gt; &gt;<br/>&gt; &gt; mysql_enable_utf8<br/>&gt; &gt;<br/>&gt; &gt; This attribute determines whether DBD::mysql should assume strings<br/>&gt; &gt; stored in the database are utf8. This feature defaults to off.<br/>&gt; &gt;<br/>&gt; &gt; When set, a data retrieved from a textual column type (char, varchar,<br/>&gt; &gt; etc) will have the UTF-8 flag turned on if necessary. This enables<br/>&gt; &gt; character semantics on that string. You will also need to ensure that<br/>&gt; &gt; your database / table / column is configured to use UTF8. See Chapter<br/>&gt; &gt; 10 of the mysql manual for details.<br/>&gt; &gt;<br/>&gt; &gt; Additionally, turning on this flag tells MySQL that incoming data should<br/>&gt; &gt; be treated as UTF-8. This will only take effect if used as part of the<br/>&gt; &gt; call to connect(). 
If you turn the flag on after connecting, you will<br/>&gt; &gt; need to issue the command &quot;SET NAMES utf8&quot; to get the same effect.<br/>&gt; &gt;<br/>&gt; &gt; This option is experimental and may change in future versions.<br/>&gt;<br/>&gt; Ummm... This last line makes me not too confident about what I&#39;m going<br/>&gt; to propose, but anyway...<br/>&gt;<br/>&gt; I think that, as Etch is by default installed with UTF on everywhere,<br/>&gt; we should propose changing this default, so that MySQL connections are<br/>&gt; also UTF by default. This could enter Etch 4.0r1 (or further). What do<br/>&gt; you think?<br/>&gt;<br/>My mysql settings are fine. The problem is PERL. <br/><br/>mysql&gt; SHOW VARIABLES LIKE &quot;character_set_%&quot;;<br/>+--------------------------+----------------------------+<br/>| Variable_name &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;| Value &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;|<br/>+--------------------------+----------------------------+<br/>| character_set_client &nbsp; &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_set_connection | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_set_database &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_set_filesystem | binary &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_set_results &nbsp; &nbsp;| utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_set_server &nbsp; &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_set_system &nbsp; &nbsp; | utf8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br/>| character_sets_dir &nbsp; &nbsp; &nbsp; | /usr/share/mysql/charsets/ 
|<br/>+--------------------------+----------------------------+<br/><br/>However the command dbh-&gt;do(SHOW VARIABLES LIKE &quot;character_set_%&quot;); executed <br/>in a PERL script returns:<br/>character_set_client &nbsp; &nbsp; &nbsp;latin1<br/>character_set_connection &nbsp;latin1<br/>character_set_database &nbsp; &nbsp;utf8<br/>character_set_filesystem &nbsp;binary<br/>character_set_results &nbsp; &nbsp; latin1<br/>character_set_server &nbsp; &nbsp; &nbsp;utf8<br/>character_set_system &nbsp; &nbsp; &nbsp;utf8<br/>character_sets_dir &nbsp; &nbsp; &nbsp; &nbsp;/usr/share/mysql/charsets/<br/><br/>Everybody is blaming mysql but it is obvious to me that PERL is the problem. <br/>How do I configure PERL to use UTF-8 as default???<br/><br/>-- <br/>Oliver K&ouml;nig<br/><br/>Windfinder.com<br/>Knorrstr. 24 Hinterhaus<br/>24106 Kiel<br/>Germany<br/>phone +49 431-8008643<br/>VoIP +49 431-5569222<br/>fax +49 431-8008644<br/>Mobile +49 177-4933362<br/>oliver@windfinder.com<br/>www.windfinder.com<br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg193.html Sun, 13 May 2007 08:54:17 +0000 Problems with Perl Asian encodings? by Samuel L. Bayer All -<br/><br/>I&#39;m having a problem that perhaps someone here can cast some light on. <br/>For a very long time, a project I work on has been using GNU recode 3.6 <br/>to transcode a wide range of encodings into UTF-8, including some of the <br/>more common Korean, Japanese and Chinese encodings (e.g., SJIS, gb2312, <br/>EUC-KR). For efficiency reasons, we&#39;ve been looking at moving to the <br/>Perl Encode module, which we already use for windows-1252 because of <br/>corruption issues with Arabic with GNU recode.<br/><br/>I compared a batch of about 120,000 documents for which I had both the <br/>original and the output of GNU recode, and discovered a (relatively <br/>small) number of differences, say about 4K documents. In almost all of <br/>those cases, Perl Encode appears to be inferior. 
The vast majority of <br/>the differing documents are gb2312; however, many of the other Asian <br/>encodings have sporadic problems. When I examine the documents for <br/>differences, I typically find that Perl Encode has introduced some stray <br/>&quot;unknown&quot; characters at various points in the document, while the GNU <br/>recode version is clean.<br/><br/>Has anyone else done such a comparison of GNU recode and Perl Encode? <br/>I&#39;d very much prefer to move to Perl, not simply for efficiency but <br/>because, unlike GNU recode, it appears to be actively maintained; <br/>however, the error rate is just too high, especially considering that <br/>the GNU recode output looks clean, and our users have not complained <br/>about it.<br/><br/>Any comments or advice would be welcome. I&#39;m using Perl 5.8.7 (I know, <br/>it&#39;s not the latest version, but it&#39;s part of a very stable <br/>configuration that the project doesn&#39;t want to vary).<br/><br/>Thanks in advance -<br/>Sam Bayer<br/>The MITRE Corporation<br/>sam@mitre.org<br/><br/>P.S. My familiarity with encoding issues is not extensive, and one thing <br/>that occurred to me was that there may be an encoding name conflict <br/>between GNU recode and Perl Encode which was leading to the differences <br/>I was seeing. However, in the first two cases I examined, no encoding <br/>known to Perl Encode for the given languages (Chinese and Japanese) <br/>yielded the same (clean) output as GNU recode.<br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg192.html Thu, 10 May 2007 12:31:37 +0000 Re: Character Encoding (UTF-8) in PERL by James Kiser Solving it on the database side would be the optimal solution. 
If it<br/>isn&#39;t a possibility, for whatever reason, take a look at<br/>http://perldoc.perl.org/utf8.html.<br/><br/>Thanks,<br/>James<br/><br/>On 5/10/07, Christian Kuelker &lt;christian.kuelker@cipworx.org&gt; wrote:<br/>&gt; -----BEGIN PGP SIGNED MESSAGE-----<br/>&gt; Hash: SHA1<br/>&gt;<br/>&gt; Hi,<br/>&gt;<br/>&gt; Oliver K&ouml;nig schrieb:<br/>&gt; &gt; Sarge<br/>&gt; &gt; ====<br/>&gt; &gt; mysql 4.0.24<br/>&gt; &gt;<br/>&gt; &gt; Etch<br/>&gt; &gt; ====<br/>&gt; &gt; mysql 5.0.32<br/>&gt; &gt;<br/>&gt; &gt; In mysql everything looks fine, too:<br/>&gt; I doubt ..<br/>&gt;<br/>&gt; &gt; mysql&gt; SHOW VARIABLES LIKE &quot;character_set_%&quot;;<br/>&gt; &gt; +--------------------------+----------------------------+<br/>&gt; &gt; | Variable_name | Value |<br/>&gt; &gt; +--------------------------+----------------------------+<br/>&gt; &gt; | character_set_client | utf8 |<br/>&gt; &gt; | character_set_connection | utf8 |<br/>&gt; &gt; | character_set_database | utf8 |<br/>&gt; &gt; | character_set_filesystem | binary |<br/>&gt; &gt; | character_set_results | utf8 |<br/>&gt; &gt; | character_set_server | utf8 |<br/>&gt; &gt; | character_set_system | utf8 |<br/>&gt; &gt; | character_sets_dir | /usr/share/mysql/charsets/ |<br/>&gt; &gt; +--------------------------+----------------------------+<br/>&gt; &gt;<br/>&gt; &gt; However a PERL script with dbh-&gt;do(SHOW VARIABLES LIKE &quot;character_set_%&quot;);<br/>&gt; &gt; returns:<br/>&gt; &gt; character_set_client latin1<br/>&gt; &gt; character_set_connection latin1<br/>&gt; &gt; character_set_database utf8<br/>&gt; &gt; character_set_filesystem binary<br/>&gt; &gt; character_set_results latin1<br/>&gt; &gt; character_set_server utf8<br/>&gt; &gt; character_set_system utf8<br/>&gt; &gt; character_sets_dir /usr/share/mysql/charsets/<br/>&gt; &gt;<br/>&gt; &gt; How can we tell PERL to use UTF-8 as default encoding?<br/>&gt;<br/>&gt; As you show in your query, the results are in latin1. 
I would guess<br/>&gt; it is the upgrade of mysql which is your problem. So try to solve the<br/>&gt; problem in mysql not Perl.<br/>&gt;<br/>&gt; Cheers<br/>&gt; C.<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt;<br/>&gt; -----BEGIN PGP SIGNATURE-----<br/>&gt; Version: GnuPG v1.4.6 (GNU/Linux)<br/>&gt; Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org<br/>&gt;<br/>&gt; iD8DBQFGQvO3rKb2iXSP9HYRAtdhAKCnJHGJRe/+SbvVUyDJVBNegXg6rQCfTfn2<br/>&gt; opttpYX19epCeWfJMsGf0ZY=<br/>&gt; =LB2m<br/>&gt; -----END PGP SIGNATURE-----<br/>&gt;<br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg191.html Thu, 10 May 2007 10:21:42 +0000
Re: Character Encoding (UTF-8) in PERL by Christian Kuelker -----BEGIN PGP SIGNED MESSAGE-----<br/>Hash: SHA1<br/><br/>Hi,<br/><br/>Oliver K&ouml;nig schrieb:<br/>&gt; Sarge<br/>&gt; ====<br/>&gt; mysql 4.0.24 <br/>&gt; <br/>&gt; Etch<br/>&gt; ====<br/>&gt; mysql 5.0.32<br/>&gt; <br/>&gt; In mysql everything looks fine, too:<br/>I doubt ..<br/><br/>&gt; mysql&gt; SHOW VARIABLES LIKE &quot;character_set_%&quot;;<br/>&gt; +--------------------------+----------------------------+<br/>&gt; | Variable_name | Value |<br/>&gt; +--------------------------+----------------------------+<br/>&gt; | character_set_client | utf8 |<br/>&gt; | character_set_connection | utf8 |<br/>&gt; | character_set_database | utf8 |<br/>&gt; | character_set_filesystem | binary |<br/>&gt; | character_set_results | utf8 |<br/>&gt; | character_set_server | utf8 |<br/>&gt; | character_set_system | utf8 |<br/>&gt; | character_sets_dir | /usr/share/mysql/charsets/ |<br/>&gt; +--------------------------+----------------------------+<br/>&gt; <br/>&gt; However a PERL script with dbh-&gt;do(SHOW VARIABLES LIKE &quot;character_set_%&quot;); <br/>&gt; returns:<br/>&gt; character_set_client latin1<br/>&gt; character_set_connection latin1<br/>&gt; character_set_database utf8<br/>&gt; character_set_filesystem binary<br/>&gt; 
character_set_results latin1<br/>&gt; character_set_server utf8<br/>&gt; character_set_system utf8<br/>&gt; character_sets_dir /usr/share/mysql/charsets/<br/>&gt; <br/>&gt; How can we tell PERL to use UTF-8 as default encoding?<br/><br/>As you show in your query, the results are in latin1. I would guess<br/>it is the upgrade of mysql which is your problem. So try to solve the<br/>problem in mysql not Perl.<br/><br/>Cheers<br/>C.<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>-----BEGIN PGP SIGNATURE-----<br/>Version: GnuPG v1.4.6 (GNU/Linux)<br/>Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org<br/><br/>iD8DBQFGQvO3rKb2iXSP9HYRAtdhAKCnJHGJRe/+SbvVUyDJVBNegXg6rQCfTfn2<br/>opttpYX19epCeWfJMsGf0ZY=<br/>=LB2m<br/>-----END PGP SIGNATURE-----<br/> http://www.nntp.perl.org/group/perl.i18n/2007/05/msg188.html Thu, 10 May 2007 03:28:39 +0000