Re: utf-8 code: \xAE not map

From: Mike Rylander
Date: March 28, 2007 10:42
Subject: Re: utf-8 code: \xAE not map
Message ID: b918cf3d0703281042h45c3315ep2d6c64420ba9e7ae@mail.gmail.com

On 3/28/07, Jackie Shieh <jshieh@umich.edu> wrote:
>
> Mike,
>
> Attached is the questionable marc record.
> The error message came from the command line.
>
> % marcdump 50987256.mrc
>
> I believe we have upgraded to the most recent version
> (v.2) from CPAN. What is the current MARC::Charset module
> we should have?

The newest version on CPAN is 0.96, released just a couple of weeks ago.

http://search.cpan.org/~mikery/MARC-Charset-0.96/lib/MARC/Charset.pm

>
> After sending my query, I kept coming across more
> records on the same Charset not map issue.
> Thanks for your help!
>

That record is /definitely/ UTF-8 encoded, which means there's no need
to use MARC::Charset for it.  If there is a mix of records that are
UTF-8 and MARC8 encoded, you can add

MARC::Charset->ignore_errors(1);
MARC::Charset->assume_encoding('UTF-8');

to the top of your script to fall back to UTF-8 if an encoding error
is encountered.  This, of course, assumes that the non-MARC8 encoding
actually is UTF-8.
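
If you want to check that assumption before relying on the fallback,
here is a rough sketch using only the core Encode module.  The helper
names looks_like_utf8 and leader_says_unicode are made up for this
example and are not part of MARC::Charset or MARC::Record.  The first
does a strict UTF-8 decode and reports whether it succeeded; a lone
\xAE byte fails that decode, which is the kind of error you are
seeing.  The second looks at leader position 09, where 'a' means
UCS/Unicode and a blank means MARC-8.  The sample leaders are invented
for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $bytes is a well-formed UTF-8 byte string.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps the
# input buffer untouched.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: true if leader position 09 is 'a' (UCS/Unicode).
# A blank in that position means the record claims MARC-8.
sub leader_says_unicode {
    my ($leader) = @_;
    return substr($leader, 9, 1) eq 'a' ? 1 : 0;
}

# "\xC3\xA9" is a valid two-byte UTF-8 sequence; a bare "\xAE" is not.
print looks_like_utf8("caf\xC3\xA9") ? "utf-8\n" : "not utf-8\n";
print looks_like_utf8("caf\xAE")     ? "utf-8\n" : "not utf-8\n";

# Invented sample leaders, differing only in position 09.
print leader_says_unicode("00714cam a2200205 a 4500") ? "unicode\n" : "marc-8\n";
print leader_says_unicode("00714cam  2200205 a 4500") ? "unicode\n" : "marc-8\n";
```

Keep in mind that a record can pass the byte check and still be
MARC-8 if it happens to contain only ASCII, and the leader flag is
only as reliable as whoever produced the record, so treat both as
hints rather than proof.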

Let us know if that helps!

> --Jackie
>
> On Wed, 28 Mar 2007, Mike Rylander wrote:
>
> > On 3/28/07, Jackie Shieh <jshieh@umich.edu> wrote:
> >>
> >> I am looking at a set of 7000+ records-- 514th rec is a record
> >> that contains transliteration for Amharic (Ethiopian) on corporate
> >> body. MARC::Record does not have a map to it. See attached
> >> screen shot.
> >>
> >> utf8 "\xAE" does not map to Unicode at
> >> /usr/lib/perl5/5.8.6/i386-linux-thread-multi/Encode.pm line 166.
> >>
> >> Have you come across something like this?  How did you get
> >> around it?!  Thanks for your help!
> >
> > Looking at the code map for MARC8, it seems this record is in fact
> > MARC8 encoded.  We need to confirm what code of yours is using Perl's
> > Encode module,  but my guess is that it's a very old MARC::Charset
> > module.  Can you show us a simplified example script that exhibits
> > this behavior?
> >
> > TIA
> >
> > --
> > Mike Rylander
> > mrylander@gmail.com
> > GPLS -- PINES Development
> > Database Developer
> > http://open-ils.org
> >
>


-- 
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
