develooper Front page | perl.i18n | Postings from November 2011

GB2312 Encoding and File Names

--[ UxBoD ]--
November 17, 2011 10:20
Hello all,

I do hope I am in the right place for some help! I am working on a project that requires email attachments to be extracted to the file system. All was working great until one of our kind testers tried attachments named in traditional and simplified Chinese, at which point I ended up with files named ?????.txt.

I am using the module MIME::Parser to extract the files, and after some great help from the developer I have realized that one needs to override a method in MIME::Parser::Filer so that the correct file names are generated.

One of the attachments in the test email is shown below:

360新闻监测-12-01-Chi Simp.txt

I have tried to use MIME::EncWords and MIME::Charset to extract the correct name from the MIME entity using:

my $fname = decode_mimewords($head->recommended_filename);

but this still does not work :( so I tried to compare what the file name looks like with LANG set with and without UTF8:

With LANG en_GB.UTF8

360新闻监测-12-01-Chi Simp.txt

With LANG en_GB

360�?��?��??��?-12-01-Chi Simp.txt

Now this is what happens when I extract the file with my new method:

With LANG en_GB

360���ż���-12-01-Chi Simp.txt

With LANG en_GB.UTF8

360???ż???-12-01-Chi Simp.txt

The MIME file name appears as =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=
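
For illustration, here is a minimal sketch of decoding that encoded word by hand with core modules only (Encode and MIME::Base64), assuming it carries the standard RFC 2047 "=?" prefix. The helper name decode_filename_word is hypothetical and is not part of MIME::Parser or MIME::EncWords. The sketch also rests on an assumption: although the header is labelled gb2312, the Base64 payload contains trail bytes such as 0x84, 0x4F and 0x79, which fall outside the EUC-CN range that Encode appears to use for "gb2312", so the bytes are treated as GBK instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper (not a MIME::Parser API): decode a single
# B-encoded RFC 2047 word into a Perl character string.
sub decode_filename_word {
    my ($word) = @_;

    # Expected shape: =?charset?B?payload?=
    my ($charset, $payload) = $word =~ /^=\?([^?]+)\?[Bb]\?([^?]*)\?=$/
        or return $word;    # not an encoded word: hand it back unchanged

    # Assumption: the mailer wrote "gb2312" but really produced GBK bytes.
    $charset = 'gbk' if lc($charset) eq 'gb2312';

    my $bytes = decode_base64($payload);
    return decode($charset, $bytes);
}

my $raw   = '=?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=';
my $fname = decode_filename_word($raw);

# Turn the character string back into UTF-8 bytes before using it
# as a file name on a UTF-8 file system.
my $on_disk = encode('UTF-8', $fname);
print "$on_disk\n";
```

Whether UTF-8 is the right on-disk encoding depends on the file system and locale in use, so that last step is also an assumption rather than a general rule.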

This is not my area of expertise, so I am reaching out to you for some help. How can one extract the file name from an email and have it reflect its real Chinese name? Hope this makes sense!
Thanks, Phil
