develooper Front page | perl.i18n | Postings from November 2011

Re: GB2312 Encoding and File Names

From: --[ UxBoD ]--
Date: November 21, 2011 10:32
Subject: Re: GB2312 Encoding and File Names
Message ID: 0254b387-ab31-4a43-98e4-32608620f06a@office.splatnix.net

With some help from the PerlMonks board I have decoded the file name correctly, but when I dump it, it does not match the physical file name as stored in the file system, i.e.

MIME Header : =?gb2312?B?RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw?=
Decoded     : DPM2007exchange電郵與郵箱修復.zip
$VAR1 = "DPM2007exchange\x{96fb}\x{90f5}\x{8207}\x{90f5}\x{7bb1}\x{4fee}\x{5fa9}.zip";

So when I try to compare it to what is read from a directory listing, the two cannot be matched :( How do I get the decoded name into the form it is meant to have, as shown above?
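
A minimal sketch of one way the comparison could be made to work. The dumped value is a string of characters, while readdir returns raw bytes, so one side has to be converted before the two are compared. This assumes the file system stores names as UTF-8, which the thread does not confirm; $fs_encoding and the temporary directory are hypothetical stand-ins for the real extraction directory.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempdir);

# The decoded name exactly as shown in the dump: characters, not bytes.
my $name = "DPM2007exchange\x{96fb}\x{90f5}\x{8207}\x{90f5}\x{7bb1}\x{4fee}\x{5fa9}.zip";

# ASSUMPTION: file names on disk are UTF-8 encoded bytes.
my $fs_encoding = 'UTF-8';

# Hypothetical scratch directory standing in for the extraction directory.
my $dir = tempdir(CLEANUP => 1);

# Encode the characters to bytes before handing the name to the OS.
my $on_disk = encode($fs_encoding, $name);
open(my $fh, '>', "$dir/$on_disk") or die "cannot create file: $!";
close($fh);

# readdir returns bytes; decode each entry before comparing with $name.
opendir(my $dh, $dir) or die "cannot open $dir: $!";
my @found = grep { decode($fs_encoding, $_) eq $name } readdir($dh);
closedir($dh);

print @found ? "matched\n" : "no match\n";
```

The same idea works in the other direction: encode the decoded name once and compare it byte for byte against each directory entry.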
--
Thanks, Phil

----- Original Message -----
> Just a follow-up for some help on this problem. I appear to be able
> to decode Simplified Chinese okay, but Traditional Chinese is somewhat
> more difficult. I have the file name MIME entity:
>
> =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=
>
> which should decode to:
>
> DPM2007exchange電郵與郵箱修復.zip
>
> but when I try to decode that name in Perl it comes out as:
>
> DPM2007exchange���]�c�]箱修��.zip
>
> I have installed the Encode::HanExtra module, but even with that it is
> still not showing correctly. Am I missing some other type of module?
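
A hedged sketch of why the Traditional name may come out garbled. It uses the DPM2007exchange encoded-word that appears elsewhere in this thread. Several byte pairs in its payload have a trail byte below 0xA1, which puts them outside GB2312 proper but inside GBK, a superset that Perl's core Encode ships as "gbk". That the sending mail client labelled GBK data as gb2312 is an assumption, not something the thread states.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Encoded-word from the thread; the header declares the charset as gb2312.
my $header = '=?gb2312?B?RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw?=';

# Pull out the declared charset and the Base64 payload.
my ($charset, $payload) = $header =~ /^=\?([^?]+)\?B\?([^?]+)\?=$/i
    or die "not a Base64 encoded-word\n";
my $bytes = decode_base64($payload);

# Strict GB2312 (Encode calls it "euc-cn") cannot map some byte pairs here,
# so they are replaced with U+FFFD.
my $as_gb2312 = decode('euc-cn', $bytes);

# ASSUMPTION: the payload is really GBK despite the gb2312 label.
my $as_gbk = decode('gbk', $bytes);

# Count replacement characters in each result.
my $bad_gb2312 = () = $as_gb2312 =~ /\x{FFFD}/g;
my $bad_gbk    = () = $as_gbk    =~ /\x{FFFD}/g;

printf "declared %s: %d replacement character(s)\n", $charset, $bad_gb2312;
printf "read as gbk: %d replacement character(s)\n", $bad_gbk;
```

If the GBK reading is right, a decoder that maps the gb2312 label to GBK before decoding would be enough, with no extra module needed for this particular name.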
> --
> Thanks, Phil
>
> ----- Original Message -----
> > Hello all,
> >
> > I do hope I am in the right place for some help! I am working on a
> > project that requires email attachments to be extracted to the file
> > system. All was working great until one of our kind testers tried
> > with normal and simplified Chinese, at which point I ended up with
> > files named ?????.txt.
> >
> > I am using the MIME::Parser module to extract the files, and after
> > some great help from the developer I have realized that one needs to
> > override a method in MIME::Parser::Filer so that the correct file
> > names are generated.
> >
> > One of the attachments in the test email is shown below:
> >
> > 360新闻监测-12-01-Chi Simp.txt
> >
> > I have tried to use MIME::EncWords and MIME::Charset to extract the
> > correct name from the MIME entity using:
> >
> > my $fname = decode_mimewords($head->recommended_filename);
> >
> > but this still does not work :( so I tried to compare what the file
> > name looks like with LANG set with and without UTF8:
> >
> > With LANG en_GB.UTF8
> >
> > 360新闻监测-12-01-Chi Simp.txt
> >
> > With LANG en_GB
> >
> > 360�?��?��??��?-12-01-Chi Simp.txt
> >
> > Now this is what happens when I extract the file with my new
> > method:
> >
> > With LANG en_GB
> >
> > 360���ż���-12-01-Chi Simp.txt
> >
> > With LANG en_GB.UTF8
> >
> > 360???ż???-12-01-Chi Simp.txt
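
A small sketch of one plausible reason the same name looks different from run to run. The characters are identical each time, but the bytes written out depend on the encoding chosen, and the terminal then interprets those bytes according to LANG. Which of these cases matches what the Filer override actually wrote is not established in the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The Simplified Chinese attachment name from the thread, as characters.
my $name = "360\x{65b0}\x{95fb}\x{76d1}\x{6d4b}-12-01-Chi Simp.txt";

# Three byte representations of the same characters.
my $utf8   = encode('UTF-8', $name);       # what a UTF-8 locale expects
my $gb     = encode('euc-cn', $name);      # GB2312 bytes, as sent in the mail
my $latin1 = encode('iso-8859-1', $name);  # unmappable characters become '?'

printf "%-10s %d bytes\n", 'UTF-8',  length($utf8);
printf "%-10s %d bytes\n", 'GB2312', length($gb);
printf "%-10s %s\n", 'ISO-8859-1', $latin1;
```

Writing GB2312 bytes to a UTF-8 terminal, or UTF-8 bytes to a non-UTF-8 one, produces mojibake, while encoding to a charset with no Chinese characters yields question marks.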
> >
> > The MIME file name appears as
> > =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=
> >
> > This is not my area of expertise, so I am reaching out to you for some
> > help. How can one extract the file name from an email and have it
> > reflect its real Chinese name? I hope this makes sense!
> > --
> > Thanks, Phil
> >
>
