From: Doran, Michael D
Hi Chris,
> I'll try that version.
I sure hope you meant upgrading to Perl 5.8.2 (or higher) rather than downgrading to MARC::Record 1.39_02. ;-)
This is just my unasked-for two cents, but I wouldn't stint on anything that will make the processing of Unicode-encoded text easier. Last December seemed to mark a tipping point for Unicode, both on the Internet:
"Just last December [2007] there was an interesting milestone
on the web. For the first time, we found that Unicode was the
most frequent encoding found on web pages, overtaking both
ASCII and Western European encodings" [1]
...as well as for its use in MARC records:
"To facilitate the movement of records between MARC-8 and Unicode
environments, it was recommended for an initial period that the use of
Unicode be restricted to a repertoire identical in extent to the MARC-8
repertoire. [...] however, such a restriction is no longer appropriate.
The full UCS repertoire, as currently defined at the Unicode web site,
is valid for encoding MARC 21 records subject only to the constraints
described [in the current MARC 21 Specifications]." [2]
-- Michael
[1] The Official Google Blog: "Moving to Unicode 5.1"
http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
[2] MARC 21 Specifications: Unicode Encoding Environment
(revised December 2007)
http://www.loc.gov/marc/specifications/speccharucs.html
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# doran@uta.edu
# http://rocky.uta.edu/doran/
> -----Original Message-----
> From: Christopher Morgan [mailto:morgan@acm.org]
> Sent: Tuesday, July 08, 2008 2:12 PM
> To: 'Bryan Baldus'; perl4lib@perl.org
> Subject: RE: Problem installing MARC::Record 2.0.0 under perl 5.8.0
>
> Bryan,
>
> Thanks very much. I'll try that version.
>
> - Chris
>
>
> -----Original Message-----
> From: Bryan Baldus [mailto:bryan.baldus@quality-books.com]
> Sent: Tuesday, July 08, 2008 2:31 PM
> To: Christopher Morgan; perl4lib@perl.org
> Subject: RE: Problem installing MARC::Record 2.0.0 under perl 5.8.0
>
> On Tuesday, July 08, 2008 12:35 PM, Christopher Morgan wrote:
> >I am in the process of rebuilding my web site after a phishing site
> >break-in (yikes!). The site is fine now, and secure, but for some
> >reason I can't get MARC::Record-2.0.0 to install. I get an error
> >message saying that perl 5.8.2 is required, but that I only have perl
> >5.8.0. (And indeed I do have perl 5.8.0.) But I'm pretty sure this
> >version of MARC::Record *did* install under perl 5.8.0 the last time
> >I tried.
>
> MARC::Record 1.39_02 appears to be the latest version on CPAN that would
> work on 5.8.0. MARC::Record 2.x is incompatible with pre-5.8.2 versions of
> Perl due to Unicode-related changes. The change was announced in a
> Perl4Lib message "MARC::Record v2.0 RC1", sent Fri 5/20/2005 2:35 PM, by
> Ed Summers. [1]
>
> [1] <http://www.nntp.perl.org/group/perl.perl4lib/2005/05/msg2070.html>
>
> I hope this helps,
>
> Bryan Baldus
> bryan.baldus@quality-books.com
> eijabb@cpan.org
> http://home.inwave.com/eija
From: Christopher Morgan
Bryan,
Thanks very much. I'll try that version.
- Chris
From: Bryan Baldus
On Tuesday, July 08, 2008 12:35 PM, Christopher Morgan wrote:
>I am in the process of rebuilding my web site after a phishing site break-in (yikes!). The site is fine now, and secure, but for some reason I can't get MARC::Record-2.0.0 to install. I get an error message saying that perl 5.8.2 is required, but that I only have perl 5.8.0. (And indeed I do have perl 5.8.0.) But I'm pretty sure this version of MARC::Record *did* install under perl 5.8.0 the last time I tried.
MARC::Record 1.39_02 appears to be the latest version on CPAN that would work on 5.8.0. MARC::Record 2.x is incompatible with pre-5.8.2 versions of Perl due to Unicode-related changes. The change was announced in a Perl4Lib message "MARC::Record v2.0 RC1", sent Fri 5/20/2005 2:35 PM, by Ed Summers. [1]
[1] <http://www.nntp.perl.org/group/perl.perl4lib/2005/05/msg2070.html>
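Bryan's version constraint can be checked before attempting an install. A minimal sketch, assuming the floor is exactly perl 5.8.2 as stated above; the helper name `supports_marc_record_2` is hypothetical and not part of MARC::Record. Perl's `$]` variable holds the running version as a decimal, so 5.8.0 is 5.008000 and 5.8.2 is 5.008002.

```perl
use strict;
use warnings;

# MARC::Record 2.x needs perl 5.8.2 or later, i.e. 5.008002 in $] form.
# supports_marc_record_2() is a hypothetical helper for illustration.
sub supports_marc_record_2 {
    my ($version) = @_;
    $version = $] unless defined $version;    # default: the running perl
    return $version >= 5.008002 ? 1 : 0;
}

if (supports_marc_record_2()) {
    print "perl $] can install MARC::Record 2.x\n";
} else {
    print "perl $] is too old; MARC::Record 1.39_02 is the last version for it\n";
}
```

A script can enforce the same floor itself with `use 5.008002;`, which makes perl refuse to run it on anything older.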
I hope this helps,
Bryan Baldus
bryan.baldus@quality-books.com
eijabb@cpan.org
http://home.inwave.com/eija
From: Christopher Morgan
I am in the process of rebuilding my web site after a phishing site break-in
(yikes!). The site is fine now, and secure, but for some reason I can't get
MARC::Record-2.0.0 to install. I get an error message saying that perl 5.8.2
is required, but that I only have perl 5.8.0. (And indeed I do have perl
5.8.0.) But I'm pretty sure this version of MARC::Record *did* install under
perl 5.8.0 the last time I tried.
I cheated by changing line 2 in the Makefile.PL file to read "require
perl-5.8.0" instead of "5.8.2". It installed, but it only passed about 20%
of the tests during make test. Am I asking for trouble here? Will it work,
or should I try installing an earlier version? (If so, which earlier
version, and where should I get it?) Also, I saw a patch somewhere that you
could use if you're installing into systems that use Perl 5.00xxx or earlier
(or something to that effect).
Any thoughts from anyone on this?
Many thanks!
- Chris
From: Emmanuel Di Pretoro
Hi,
Is anybody already involved in cleaning up a MARC file? By this I mean:
- merging multiple records into a single record;
- or keeping one record and deleting the others.
Could you describe your methodology, as well as the algorithms you used?
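For the second option (keep one record, delete the others), most of the work is choosing a match key. A minimal sketch, assuming each record has already been reduced to a title and an author string (for example from 245 $a and 100 $a); the hash layout and the normalization rules are illustrative assumptions, not an established matching algorithm.

```perl
use strict;
use warnings;

# Build a crude match key: lowercase, strip punctuation, collapse whitespace.
sub match_key {
    my ($title, $author) = @_;
    my $key = lc join ' ', grep { defined } $title, $author;
    $key =~ s/[[:punct:]]+//g;
    $key =~ s/\s+/ /g;
    $key =~ s/^ | $//g;
    return $key;
}

# Keep the first record seen for each match key; later duplicates are dropped.
sub dedupe {
    my @records = @_;
    my %seen;
    return grep { !$seen{ match_key($_->{title}, $_->{author}) }++ } @records;
}

my @kept = dedupe(
    { title => 'Tacos.',   author => 'Smith, J.' },
    { title => 'tacos',    author => 'Smith J'   },
    { title => 'Burritos', author => 'Smith, J.' },
);
print scalar(@kept), " records kept\n";    # prints "2 records kept"
```

Merging (the first option) would instead group all records per key and combine their fields, which needs field-by-field rules this sketch does not attempt.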
Thanks in advance.
Regards,
Emmanuel Di Pretoro
From: md
I have the raw data files of the former Hennepin County Library
catalog and authority files.
This is the innovative, unique catalog created
by Sandy Berman (1970s-2002).
I would like to import the data into a MySQL database. I assume
this can be done with Perl, but I don't know if an existing parser
would work or if a custom program would be needed.
I have no programming skills. There must be someone
here who knows and values Berman's work and is ready,
willing and able to devote their knowledge
and skills to making it accessible once again.
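If the Hennepin files are standard MARC (ISO 2709) exchange files, which the post does not say, an existing parser such as MARC::Record's MARC::Batch should read them, and each field could then be inserted into MySQL via DBI. To show what such a parser does, here is a dependency-free sketch that splits one raw record into (tag, data) rows suitable for loading into a table; the function name and row layout are hypothetical.

```perl
use strict;
use warnings;

# Split one raw ISO 2709 record into its leader and [tag, data] pairs.
# Assumes a well-formed record read as bytes (binmode on the filehandle).
sub parse_iso2709 {
    my ($raw) = @_;
    my $leader = substr($raw, 0, 24);
    my $base   = substr($leader, 12, 5) + 0;      # start of the data area
    my $dir    = substr($raw, 24, $base - 25);    # directory minus its terminator
    my @fields;
    for (my $i = 0; $i + 12 <= length $dir; $i += 12) {
        # Each 12-byte directory entry: tag (3), length (4), start offset (5).
        my ($tag, $len, $start) = unpack 'a3 a4 a5', substr($dir, $i, 12);
        # The stated length includes the trailing field terminator (0x1E).
        push @fields, [ $tag, substr($raw, $base + $start, $len - 1) ];
    }
    return ($leader, @fields);
}

# Usage: records end with the record terminator (0x1D).
# local $/ = "\x1D";
# while (my $raw = <$fh>) { my ($leader, @rows) = parse_iso2709($raw); ... }
```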
Please contact me with questions on or off list.
Thank You!
Madeline Douglass
mdougla@pclink.com
http://www.sanfordberman.org
From: Christopher Morgan
Harrison,
That's useful information. Yes, I'll only be doing lookups, which simplifies
things quite a bit. Given what you said about the Movable Type software, I
assume DB_File would be a good way to keep track of website user names,
passwords, cookies, and the like?
- Chris
-----Original Message-----
From: vagrantscholar@gmail.com [mailto:vagrantscholar@gmail.com] On Behalf
Of Harrison Dekker
Sent: Friday, June 20, 2008 2:19 PM
To: Christopher Morgan
Subject: Re: Practicality of using DB_File on a Perl-based book site?
Chris,
I'm no expert, but it seems to me that there should be less overhead using
Berkeley DB compared to a relational DB, assuming that all you're doing is
lookups. If you've got a bunch of post processing going on involving
multiple large retrieval sets then you'll probably lose that edge, but
that's only because your perl code would be doing the work that a more
optimized SQL engine could be doing. SQL doesn't give you any improvement,
however, when all you're doing is a key/value type lookup.
Movable Type blog software uses BDB, at least it did in the past, and as far
as I know it's quite reliable/scalable. I use BDB for one web-service-type
application, and I do have to throttle my requests if I'm sending them in
batch, but the DB isn't the bottleneck; it's Apache or the PHP XML functions
I use.
-Harrison
On Fri, Jun 20, 2008 at 10:42 AM, Christopher Morgan <morgan@acm.org> wrote:
>
> I'm designing a web site that will display MARC authority files
> onscreen. I use a Perl hash that's tied to a (read-only) Berkeley DB
> file via DB_File, and it works nicely. How practical is this approach if
> there's going to be moderate traffic on a site?
>
> My DB_File database is about 200MB, but of course Perl brings only small
> pieces of the database into memory at any one time. Would the site bog
> down if people were accessing records at the rate of, say, every few
> seconds? Should I consider MySQL instead? I'd prefer to stick to
> DB_File, since it's so easy and elegant -- and I can easily create
> complex data structures.
>
> What if one of my data files was significantly bigger (say, a GB or
> two of MARC book records)? I don't have a feel for the pros and cons
> of the various approaches to accessing large databases using Perl, but
> tied hashes are pretty fast! In any case, I know I'll have to lock the
> file during each read, via "flock" or the like. I haven't tried
> implementing the latter yet.
>
> Does anyone have any ideas about this? Are there other Perl forums I
> should investigate regarding this topic?
>
> Many thanks!
>
> - Chris Morgan
>
>
--
Harrison Dekker -- Coordinator of Data Services -- UC Berkeley Libraries
510-642-8095 :: GTalk:vagrantscholar :: AIM:hdekker :: Meebo:ucbdekker
http://sunsite.berkeley.edu/wikis/datalab/
------------------------
Q: Why is this email 5 sentences or less?
A: http://five.sentenc.es
From: Christopher Morgan
I'm designing a web site that will display MARC authority files onscreen. I
use a Perl hash that's tied to a (read-only) Berkeley DB file via DB_File, and
it works nicely. How practical is this approach if there's going to be
moderate traffic on a site?
My DB_File database is about 200MB, but of course Perl brings only small
pieces of the database into memory at any one time. Would the site bog down
if people were accessing records at the rate of, say, every few seconds?
Should I consider MySQL instead? I'd prefer to stick to DB_File, since it's
so easy and elegant -- and I can easily create complex data structures.
What if one of my data files was significantly bigger (say, a GB or two of
MARC book records)? I don't have a feel for the pros and cons of the various
approaches to accessing large databases using Perl, but tied hashes are
pretty fast! In any case, I know I'll have to lock the file during each
read, via "flock" or the like. I haven't tried implementing the latter yet.
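Locking around each read might look like the sketch below. It takes the lock on a separate lock file, because the DB_File documentation warns that locking the database's own file descriptor is unreliable, and it stores each record as a Storable-frozen structure so nested data survives. It uses the core SDBM_File module only so the example runs anywhere; SDBM limits each key/value pair to roughly 1 KB, so for a 200MB authority file DB_File (`tie %h, 'DB_File', $file, O_RDONLY, 0644, $DB_HASH`) would take its place. The function names and the `.lock` suffix are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDONLY O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(nfreeze thaw);

# Write records (key => hashref) under an exclusive lock on "$dbfile.lock".
sub build_db {
    my ($dbfile, %records) = @_;
    open my $lock, '>>', "$dbfile.lock" or die "Cannot open lock file: $!";
    flock($lock, LOCK_EX) or die "Cannot lock: $!";
    tie my %db, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0644
        or die "Cannot tie $dbfile: $!";
    $db{$_} = nfreeze($records{$_}) for keys %records;
    untie %db;
    close $lock;    # closing the handle releases the lock
}

# Read one record under a shared lock; many readers can hold it at once.
sub lookup {
    my ($dbfile, $key) = @_;
    open my $lock, '>>', "$dbfile.lock" or die "Cannot open lock file: $!";
    flock($lock, LOCK_SH) or die "Cannot lock: $!";
    tie my %db, 'SDBM_File', $dbfile, O_RDONLY, 0644
        or die "Cannot tie $dbfile: $!";
    my $frozen = $db{$key};
    untie %db;
    close $lock;
    return defined $frozen ? thaw($frozen) : undef;
}
```

For example, after `build_db($file, 'n50-7168' => { name => 'Benchley, Robert' })`, a call to `lookup($file, 'n50-7168')->{name}` returns the stored name. Tying and untying on every request keeps the lock short; a long-running process might instead keep the tie open and lock only around each access.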
Does anyone have any ideas about this? Are there other Perl forums I should
investigate regarding this topic?
Many thanks!
- Chris Morgan
From: Mike Rylander
On Wed, Jun 18, 2008 at 1:12 PM, Christopher Morgan <morgan@acm.org> wrote:
> Mike,
>
> I tried both of your suggested fixes (changing Name to LocalName, and
> running the updated patch), but no luck. I still get no error messages in
> the error log, but the program silently fails to print a report. If I
> manually remove the "mx:" namespace strings from all the tags, I can process
> the files with no problem. (So one quick fix would be to simply run these
> records through a quick search and replace routine.)
>
> The problem name authority files are all available on the web
> from OCLC's experimental name authority service, at
> http://alcme.oclc.org/eprintsUK/index.html
>
> You enter an author name (I entered "Robert Benchley"). Then I clicked on
> the first link at http://errol.oclc.org/laf/n50-7168.html
>
> Finally, I clicked on the second link ("XML Record") to get this link:
> http://errol.oclc.org/laf/n50-7168.MarcXML All of these have the "mx:"
> namespace notation in their tags.
Thanks. I will see if I can fix this on my installation, but since
the LocalName (only) change did not work for you I have suspicions
about the particular XML parser that's being chosen for the SAX part
on your system. The pure-perl parser (in some versions) did not
support namespaces well, and expat can be quirky as well.
I'll let you know what I find, and thanks for testing.
--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker@esilibrary.com
| web: http://www.esilibrary.com
From: Christopher Morgan
Mike,
I tried both of your suggested fixes (changing Name to LocalName, and
running the updated patch), but no luck. I still get no error messages in
the error log, but the program silently fails to print a report. If I
manually remove the "mx:" namespace strings from all the tags, I can process
the files with no problem. (So one quick fix would be to simply run these
records through a quick search and replace routine.)
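The quick search-and-replace workaround might look like this. It is a crude, regex-based sketch that assumes the prefix is literally `mx:` as in the OCLC records; it rewrites only element tags and leaves the `xmlns:mx` declaration alone, so the output stays well-formed. One caveat: in the OCLC sample the default namespace is XHTML, so after stripping, the elements nominally belong to that namespace; this presumably works only because the unpatched MARC::File::SAX matches on element names. The function name is hypothetical.

```perl
use strict;
use warnings;

# Drop the "mx:" prefix from opening and closing element tags only.
# Attributes such as xmlns:mx="..." are left untouched.
sub strip_mx_prefix {
    my ($xml) = @_;
    $xml =~ s{<(/?)mx:}{<$1}g;
    return $xml;
}

my $in = '<mx:record xmlns:mx="http://www.loc.gov/MARC21/slim">'
       . '<mx:controlfield tag="001">oca00042708</mx:controlfield></mx:record>';
print strip_mx_prefix($in), "\n";
```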
The problem name authority files are all available on the web
from OCLC's experimental name authority service, at
http://alcme.oclc.org/eprintsUK/index.html
You enter an author name (I entered "Robert Benchley"). Then I clicked on
the first link at http://errol.oclc.org/laf/n50-7168.html
Finally, I clicked on the second link ("XML Record") to get this link:
http://errol.oclc.org/laf/n50-7168.MarcXML All of these have the "mx:"
namespace notation in their tags.
- Chris
From: Mike Rylander
On Tue, Jun 10, 2008 at 1:18 PM, Christopher Morgan <morgan@acm.org> wrote:
> Mike,
>
> Sorry. Since my last post, I did find out how to use the UNIX patch command,
> and applied your patch to SAX.pm. My script still doesn't work, and there
> are no error messages. My earlier script (which worked on the subject
> authority file) now does not work, so I'm wondering if something in the
> patch may be causing this. I have a backup of the SAX.pm file in any case.
>
Well, it turns out I left something out of the patch I sent before.
In the end_element sub, the second line should be
my $name = $element->{ LocalName };
instead of
my $name = $element->{ Name };
If you would, you can just edit the installed version of the patched
SAX.pm to test.
The next thing to try would be to remove the namespace test, but leave
the LocalName changes in place. Anecdotal evidence suggests that some
of the more popular XML parsing engines, or at least the Perl bindings
for them, have problems with namespaces. I've attached a (complete,
arg!) patch that implements just the LocalName changes and would be
applied to the original version of SAX.pm.
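The fix keys on `LocalName` rather than `Name`. In a PerlSAX2 element hash, `Name` is the qualified name (`mx:record`), while `LocalName` (`record`) and `NamespaceURI` carry the parts a namespace-aware handler should compare. A minimal sketch of that comparison, not the actual patch; the accepted namespace list is an assumption, and it includes the `http://www.loc.gov/MARC21` URI used by the LOC subject file quoted in this thread alongside the standard MARCXML namespace.

```perl
use strict;
use warnings;

# Namespaces treated as MARC (assumption; '' covers documents with no namespace).
my %MARC_NS = map { $_ => 1 }
    ('http://www.loc.gov/MARC21/slim', 'http://www.loc.gov/MARC21', '');

# Given a PerlSAX2 element hash (as passed to start_element/end_element),
# return the unprefixed MARC element name, or nothing if it is not MARC.
sub marc_element_name {
    my ($element) = @_;
    my $ns = defined $element->{NamespaceURI} ? $element->{NamespaceURI} : '';
    return unless $MARC_NS{$ns};
    return $element->{LocalName};
}

# The same element as seen with and without the "mx:" prefix.
for my $el (
    { Name => 'mx:leader', Prefix => 'mx', LocalName => 'leader',
      NamespaceURI => 'http://www.loc.gov/MARC21/slim' },
    { Name => 'leader', Prefix => '', LocalName => 'leader',
      NamespaceURI => 'http://www.loc.gov/MARC21' },
) {
    printf "%-10s -> %s\n", $el->{Name}, marc_element_name($el);
}
```

If previously working unprefixed files stop parsing after a namespace-checking patch, comparing their `xmlns` value against the accepted list may be a reasonable first check.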
If you don't have time to test all this, that's fine, but in that case would
you be willing to send a couple of your problem records?
Thanks Christopher,
--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker@esilibrary.com
| web: http://www.esilibrary.com
From: Christopher Morgan
Mike,
Many thanks. My apologies, but I've never applied a Perl patch before, so
I'm not sure of the correct procedure. I did locate the SAX.pm file.
- Chris
-----Original Message-----
From: Mike Rylander [mailto:mrylander@gmail.com]
Sent: Tuesday, June 10, 2008 11:57 AM
To: Christopher Morgan
Cc: jtgorman@uiuc.edu; perl4lib@perl.org
Subject: Re: Can't parse MARC Authority XML files with mx: prefixes in their
tags
On Mon, Jun 9, 2008 at 5:39 PM, Christopher Morgan <morgan@acm.org> wrote:
> Jonathan,
>
> Many thanks. I get no errors on the command line or in the error log
> when I run the script. The file just executes with no output. If you
> have the time to run it, I've included the script below, and have
> attached the name authority record it tries to process:
The problem is that the SAX parser is looking for the element Name instead
of LocalName. I've attached a patch that tests both LocalName and
NamespaceURI. If you could apply this to your version of MARC/File/SAX.pm
and give it a test, and it works for you, I'll commit it to the CVS repo.
--miker
From: Christopher Morgan
Mike,
Sorry. Since my last post, I did find out how to use the UNIX patch command,
and applied your patch to SAX.pm. My script still doesn't work, and there
are no error messages. My earlier script (which worked on the subject
authority file) now does not work, so I'm wondering if something in the
patch may be causing this. I have a backup of the SAX.pm file in any case.
- Chris
From: Mike Rylander
On Mon, Jun 9, 2008 at 5:39 PM, Christopher Morgan <morgan@acm.org> wrote:
> Jonathan,
>
> Many thanks. I get no errors on the command line or in the error log when I
> run the script. The file just executes with no output. If you have the time
> to run it, I've included the script below, and have attached the name
> authority record it tries to process:
The problem is that the SAX parser is looking for the element Name
instead of LocalName. I've attached a patch that tests both LocalName
and NamespaceURI. If you could apply this to your version of
MARC/File/SAX.pm and give it a test, and it works for you, I'll commit
it to the CVS repo.
--miker
--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker@esilibrary.com
| web: http://www.esilibrary.com
From: Christopher Morgan
Jonathan,
Many thanks. I get no errors on the command line or in the error log when I
run the script. The file just executes with no output. If you have the time
to run it, I've included the script below, and have attached the name
authority record it tries to process:
#! /usr/bin/perl
use strict;
use MARC::Record;
use MARC::Batch;
use MARC::File::XML;
use constant MAX => 20;
MARC::File::XML->default_record_format('UNIMARCAUTH');
my $batch = MARC::Batch->new( 'XML', 'name_authority_file');
while (my $record = $batch->next()) {
for my $field ($record->field("100")){
my $name= $field->subfield('a');
print "$name", "\n";
}
}
I think you're right about the LOC files -- they probably got the extra
spaces by accident. That's easy enough to fix.
As far as the name authorities go, if I can't get MARC::File::XML to process
them, I can always use XML::TokeParser. Not as elegant, but it would get the
job done.
- Chris
-----Original Message-----
From: Jonathan Gorman [mailto:jtgorman@uiuc.edu]
Sent: Monday, June 09, 2008 4:43 PM
To: Christopher Morgan; perl4lib@perl.org
Subject: Re: Can't parse MARC Authority XML files with mx: prefixes in their
tags
>However, I'm having trouble parsing the name authority records online
>at http://alcme.oclc.org/eprintsUK/index.html
[snipped code examples]
>
>There are "mx:" prefixes in all the tags. What format is this? Is there
>any way I can get MARC::File::XML to parse these files?
The prefixes are the namespace. The parser should be able to handle this,
but I don't honestly know if it does it correctly. The second namespace in
there might also be a problem. It might help us if you included some
information about what is not working (what error you are getting, etc.).
I don't have the time right now to run my own test, but actual error
messages might provide some clue.
>A related question: When I first tried to process the subject authority
>files from the LOC (in my first example, above), the program complained
>that the "Leader must be 24 bytes long".
Right, that comes from the MARC specification: the leader is 24 bytes.
>XML files are five years old. I wonder if the XML spec has changed since
>then?)
I doubt it; again, it doesn't really have anything to do with the XML spec
but with the underlying XML record. More likely it is some error in creating
the files. I can't give any more info, though, sorry.
Jon Gorman
From: Jonathan Gorman
>However, I'm having trouble parsing the name authority records online at
>http://alcme.oclc.org/eprintsUK/index.html
[snipped code examples]
>
>There are "mx:" prefixes in all the tags. What format is this? Is there any
>way I can get MARC::File::XML to parse these files?
The prefixes are the namespace. The parser should be able to handle this, but I don't honestly know if it does it correctly. The second namespace in there might also be a problem. It might help us if you included some information about what is not working (what error you are getting, etc.). I don't have the time right now to run my own test, but actual error messages might provide some clue.
>A related question: When I first tried to process the subject authority
>files from the LOC (in my first example, above), the program complained that
>the "Leader must be 24 bytes long".
Right, that comes from the MARC specification: the leader is 24 bytes.
>XML files are five years old. I wonder if the XML spec has changed since
>then?)
I doubt it; again, it doesn't really have anything to do with the XML spec but with the underlying XML record. More likely it is some error in creating the files. I can't give any more info, though, sorry.
Jon Gorman
From: Christopher Morgan
I have been successfully using MARC::File::XML to process MARC subject
authority files from the LOC, such as this sample record:
<?xml version="1.0" encoding="UTF-8" ?>
<collection xmlns="http://www.loc.gov/MARC21"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/MARC21
http://www.loc.gov/standards/marcxml/schema/MARC21.xsd">
<record type="Bibliographic">
<leader>00495cz 2200169n 4500</leader>
<controlfield tag="001">sh 00000014 </controlfield>
<controlfield tag="003">DLC </controlfield>
<controlfield tag="005">20000508151507.0 </controlfield>
<controlfield tag="008">000321i| anannbabn |a ana
</controlfield>
<datafield tag="010" ind1="" ind2="">
<subfield code="a">sh 00000014 </subfield>
</datafield>
<datafield tag="040" ind1="" ind2="">
<subfield code="a">DLC</subfield>
<subfield code="b">eng</subfield>
<subfield code="c">DLC </subfield>
</datafield>
<datafield tag="150" ind1="" ind2="">
<subfield code="a">Tacos </subfield>
</datafield>
</record>
The following script prints subfield "a" of tag 150:
use strict;
use warnings;
use MARC::Batch;
use MARC::File::XML;
MARC::File::XML->default_record_format('UNIMARCAUTH');
my $batch = MARC::Batch->new( 'XML', '../filename');
while (my $record = $batch->next()) {
for my $field ($record->field("150")){
my $name= $field->subfield('a');
print "$name", "\n";
}
}
However, I'm having trouble parsing the name authority records online at
http://alcme.oclc.org/eprintsUK/index.html
Here is part of one of these records (from
http://errol.oclc.org/laf/n50-7168.MarcXML):
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<mx:record xmlns:mx="http://www.loc.gov/MARC21/slim"
xmlns="http://www.w3.org/TR/xhtml1/strict"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<mx:leader>00000cz 2200000n 0000</mx:leader>
<mx:controlfield tag="001">oca00042708</mx:controlfield>
. . . . .
. . . . .
etc.
There are "mx:" prefixes in all the tags. What format is this? Is there any
way I can get MARC::File::XML to parse these files?
A related question: When I first tried to process the subject authority
files from the LOC (in my first example, above), the program complained that
the "Leader must be 24 bytes long". All the leader tags in the authority
files I got from the LOC have five trailing blank spaces at the end. I
manually removed the spaces to get the test files to work. I can always
preprocess the files to take out the trailing spaces, but I wonder if
there's a way around this with MARC::File::XML. (These LOC subject authority
XML files are five years old. I wonder if the XML spec has changed since
then?)
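Rather than editing the files by hand, the trailing blanks could be stripped before the XML reaches MARC::File::XML. A minimal regex sketch, assuming the padding appears only after a complete 24-character leader (as described for the LOC files) and that `leader` elements may or may not carry a namespace prefix; the function name is hypothetical.

```perl
use strict;
use warnings;

# Trim <leader> contents to 24 characters, but only when everything past
# position 24 is whitespace; anything else is left for the parser to flag.
sub fix_leaders {
    my ($xml) = @_;
    $xml =~ s{(<(?:\w+:)?leader>)([^<]*)(</(?:\w+:)?leader>)}{
        my ($open, $leader, $close) = ($1, $2, $3);
        $leader = substr($leader, 0, 24)
            if length($leader) > 24 && substr($leader, 24) =~ /\A\s+\z/;
        $open . $leader . $close;
    }ge;
    return $xml;
}

my $ldr = '00495cz   2200169n  4500';    # 24 characters
print fix_leaders("<leader>$ldr     </leader>"), "\n";
# prints <leader>00495cz   2200169n  4500</leader>
```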
Many thanks for any help!
- Chris Morgan
From: Saiful Amin
#!/usr/bin/perl
#
# Name: ccf2marc.pl
# Author: Saiful Amin <saiful@edutech.com>
# Date: May 2008
# Version: 0.4
# Description: Takes the CDS/ISIS as input and gives valid MARC21 data as output.
#
use strict;
use warnings;
#use diagnostics;
use Biblio::Isis;
use MARC::Record;
# Usage Instructions
die "\nUSAGE: $0 Output_file\n" unless defined $ARGV[0];
# Open the ISIS Database
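my $isis = Biblio::Isis->new(
isisdb => 'C:/WINISIS/DATA/sample/',
#include_deleted => 0,
debug => 0
);
open (OUTFILE, '>', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
binmode OUTFILE;    # write raw MARC bytes (no CRLF translation on Windows)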
my $num = 0;
###################################################
for (my $mfn = 1; $mfn <= $isis->count; $mfn++) {
my $marc = MARC::Record->new();
my $record = $isis->to_hash($mfn);
my $first_author_a = $$record{300}[0]{a};
my $first_author_b = $$record{300}[0]{b};
my $first_author_e = $$record{300}[0]{f};
my $corp_author_a = $$record{310}[0]{a};
my $corp_author_b = $$record{310}[0]{b};
my $corp_author_c = $$record{310}[0]{d};
my $corp_author_cc = $$record{310}[0]{e};
my $field_245a = $$record{200}[0]{a};
my $field_245c = $$record{200}[0]{b};
my $conf_author_a = $$record{320}[0]{a};
my $conf_author_cc = $$record{320}[0]{e};
my $conf_author_c = $$record{320}[0]{g};
my $conf_author_d = $$record{320}[0]{h};
my $conf_author_n = $$record{320}[0]{j};
$num++;
###################################################
# Create the Leader and map with relevant codes
###################################################
# Prepare the Leader
my $leader = '00054nam a22002891a 4500';    # leader/08 is a blank; '#' is only MARC documentation shorthand for it
$marc->leader($leader);
###################################################
# Create the fixed field tags (007/008)
###################################################
my $data_008 = '080528| r| ||111|eng||||';
my $tag_008 = MARC::Field->new('008', $data_008);
$marc->append_fields($tag_008);
#######################################################
# Create the first author (Main Entry/Added Entry)
#######################################################
# First author in tag_100
if ($first_author_a) {
my $first_author = "$first_author_a";
$first_author .= ", $first_author_b" if $first_author_b;
my $main_author = '';
if ($corp_author_a || $conf_author_a) {
$main_author = MARC::Field->new('700', 1,'', 'a' => '');
} else {
$main_author = MARC::Field->new('100', 1,'', 'a' => '');
}
$main_author->update('a' => $first_author);
$main_author->update('e' => $first_author_e) if $first_author_e;
$marc->append_fields($main_author);
}
#######################
## The Title Section ##
#######################
my $title = $field_245a;
my $state_of_resp = '';
if ($field_245c) {
$state_of_resp = $field_245c;
}
#print "$mfn\n" if !defined $title;
$title = "No title found" if !defined $title;
# Create Title field
my $tag_245 = MARC::Field->new('245',1,0,
'a' => "$title"
);
$tag_245->update('c' => $state_of_resp) if $state_of_resp;
$marc->append_fields($tag_245);
###################################################
# Write output to OUTFILE
###################################################
print OUTFILE $marc->as_usmarc();
print STDOUT "Printed record number $mfn\n";
}
close (OUTFILE);
From: Dobrica Pavlinusic
On Tue, May 27, 2008 at 01:03:22PM +0530, Saiful Amin wrote:
> Hi Dobrica,
>
> Thanks for quick reply.
>
> Could you please send me small sample of CDS/ISIS deleted records to take a
> > look?
>
>
> You can download the sample database of 50 records, in which MFN 18 and 19
> are logically deleted, from the following link:
> http://122.166.0.252/sample.zip
I can't reproduce your problem. When I try to dump your records using
dump_isisdb.pl included in the Biblio::Isis distribution (with options to start
at record 17, and dump just 4 records) I get:
$ ./scripts/dump_isisdb.pl -o 17 -l 4 data/sample/BOOKS. | grep ^0
0 17
0 20
which means that by default it dumped just records 17 and 20, skipping 18
and 19. If I add the -v option, which turns include_deleted on, I get:
/Biblio-Isis$ ./scripts/dump_isisdb.pl -o 17 -l 4 -v data/sample/BOOKS. | grep ^0
0 17
0 18
0 19
0 20
as I would expect. Adding -d also shows that Biblio::Isis correctly finds
that MFN 18 and 19 are logically deleted.
I would love to help you with this, but I'm puzzled.
--
Dobrica Pavlinusic 2share!2flame dpavlin@rot13.org
Unix addict. Internet consultant. http://www.rot13.org/~dpavlin
From: Saiful Amin
Hi Dobrica,
Thanks for the quick reply.
> Could you please send me a small sample of CDS/ISIS deleted records to take a
> look?
You can download the sample database of 50 records, in which MFN 18 and 19
are logically deleted, from the following link:
http://122.166.0.252/sample.zip
Currently, I'm using a clumsy method to purge these records (as suggested
by a CDS/ISIS user): export the records, re-initialize the database, and
import the records back. It would be nice if we could just ignore the
logically deleted records.
Thanks again.
Regards,
Saiful
From: Dobrica Pavlinusic
On Tue, May 27, 2008 at 11:10:40AM +0530, Saiful Amin wrote:
> Hi,
>
> I'm doing a crosswalk of CCF records (stored in CDS/ISIS) into MARC21 to
> import them into a modern ILS. I'm using Biblio::ISIS and MARC::Record for
> this purpose.
>
> If I understand correctly, CDS/ISIS only logically deletes a record and
> doesn't delete it permanently. Biblio::ISIS is not ignoring those logically
> deleted records. I've tried setting the 'include_deleted' ("Don't skip
> logically deleted records in ISIS") to 0, but it doesn't work.
>
> Any ideas?
Could you please send me a small sample of CDS/ISIS deleted records to take a
look?
> I want to take this opportunity to thank the authors of both modules
> (Dobrica Pavlinusic and Andy Lester) for writing such amazing modules. I've
> been using them with great results for a few years now.
You are welcome. While we are at it, I must say that you are one of the few
users of Biblio::Isis that I know of :-)
--
Dobrica Pavlinusic 2share!2flame dpavlin@rot13.org
Unix addict. Internet consultant. http://www.rot13.org/~dpavlin
From: Saiful Amin
Hi,
I'm doing a crosswalk of CCF records (stored in CDS/ISIS) into MARC21 to
import them into a modern ILS. I'm using Biblio::ISIS and MARC::Record for
this purpose.
If I understand correctly, CDS/ISIS only logically deletes a record and
doesn't delete it permanently. Biblio::ISIS is not ignoring those logically
deleted records. I've tried setting the 'include_deleted' ("Don't skip
logically deleted records in ISIS") to 0, but it doesn't work.
Any ideas?
I want to take this opportunity to thank the authors of both modules
(Dobrica Pavlinusic and Andy Lester) for writing such amazing modules. I've
been using them with great results for a few years now.
Best regards,
Saiful
--
Saiful Amin
Project Lead
Edutech India Pvt Ltd
Bangalore, India.
+91 9343826438
From: Bryan Baldus
I have updated MARC::Errorchecks in CPAN, releasing version 1.14, and
have updated MARC::Lint in CVS on SourceForge. Changes for each are
listed below.
MARC::Errorchecks changes:
Version 1.14: Updated Oct. 21, 2007, Jan. 21, 2008, May 20, 2008.
Released May 25, 2008.
-Updated %ldrbytes with leader/19 per Update no. 8, Oct. 2007. Check
for validity of leader/19 not yet implemented.
-Updated _check_book_bytes with code '2' ('Offprints') for
008/24-27, per Update no. 8, Oct. 2007.
-Updated check_245ind1vs1xx($record) with TODO item and comments
-Updated check_bk008_vs_300($record) to allow "leaves of plates" (as
opposed to "leaves", when no p. or v. is present), "leaf", and
"column"(s).
-Updated test in Errorchecks.t to remove check for LCCN starting
with year greater than the current year. The test used 2008, which is
no longer later than the current year. A test may be implemented in the
future that will be less likely to break with the passage of time.
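The two open items above, a leader/19 validity check and an LCCN year test that does not go stale, could look roughly like this. A minimal sketch, not MARC::Errorchecks code: it assumes the leader/19 values defined for Bibliographic records in Update No. 8 (blank, a, b, c) and a normalized 10-digit LCCN whose first four digits are the year. Both function names are hypothetical.

```perl
use strict;
use warnings;

# Leader/19 (multipart resource record level): blank, a, b, or c.
# Returns a list of error messages; an empty list means no problem found.
sub check_leader19 {
    my ($leader) = @_;
    return ('Leader is shorter than 20 characters') if length($leader) < 20;
    my $byte = substr($leader, 19, 1);
    return $byte =~ /\A[ abc]\z/ ? () : ("Leader/19 has invalid value '$byte'");
}

# Compare an LCCN's year with the current year at run time, so the check
# does not break as time passes. Only 10-digit LCCNs are examined.
sub lccn_year_ok {
    my ($lccn, $this_year) = @_;
    $this_year = (localtime)[5] + 1900 unless defined $this_year;
    return 1 unless $lccn =~ /\A(\d{4})\d{6}\z/;
    return $1 <= $this_year ? 1 : 0;
}
```

Passing the year explicitly, as the second argument allows, keeps a test suite deterministic while the default still tracks the clock.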
MARC::Lint changes:
- Updated _check_article with the exception 'A to '
- Updated Lint::DATA section with Update No. 8 (Oct. 2007)
############
Please let me know of any problems, suggestions, etc.
Thank you,
Bryan Baldus
bryan.baldus@quality-books.com
eijabb@cpan.org
http://home.inwave.com/eija
From: David Kaufman
Hi Michael,
"Doran, Michael D" <doran@uta.edu> wrote:
> I'm trying to strip out combining diacritics from some form input using
> this code:
> [...]
> $sans_diacritics =~ s/\p{M}*//g;
I do it like this:
use Encode;
use Unicode::Normalize qw(normalize);
my $ascii = encode('ascii', normalize('KD', $utf8), sub { '' });  # callback's return value replaces each non-ASCII char
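The two approaches in this exchange behave differently on non-Latin text, which a side-by-side sketch makes visible. Removing combining marks after canonical decomposition keeps other letters, while compatibility decomposition plus ASCII encoding also drops anything without an ASCII equivalent. Both functions assume the input is already a decoded Perl character string (for form input, for example via `Encode::decode('UTF-8', ...)`); the function names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFD NFKD);

# Decompose, then delete combining marks (\p{M}); other letters survive.
sub strip_marks {
    my ($text) = @_;
    (my $out = NFD($text)) =~ s/\p{M}+//g;
    return $out;
}

# Compatibility-decompose, then drop every character with no ASCII form.
sub to_ascii {
    my ($text) = @_;
    return encode('ascii', NFKD($text), sub { '' });
}

binmode STDOUT, ':encoding(UTF-8)';
for my $s ("Dvo\x{159}\x{e1}k", "\x{3AC}", "\x{FB01}le") {
    print strip_marks($s), ' | ', to_ascii($s), "\n";
}
```

Using `\p{M}+` rather than `\p{M}*` avoids matching the empty string at every position; the result is the same, with less work.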