text encoding option for Iso2709 files #24

VeniVidiVici · 2018-07-03T17:32:23Z

Hi,
When parsing Iso2709 records, toString is called with utf8 hard-coded. We are sometimes supplied files with strings in latin1.

I have been unable to find a way to reliably detect the string encoding when parsing, so this patch makes it an optional parameter when creating the stream. There may be a better way to handle this, but this seems to be working fine.

Chris
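
To illustrate the problem described above, here is a minimal sketch of decoding a byte range with a configurable encoding. `decodeField` is a hypothetical helper written for this example; it is not the code from this patch and not part of the marcjs API. It relies only on Node's built-in `Buffer#toString(encoding, start, end)`.

```javascript
// Hypothetical helper, not part of marcjs: decode one field from a raw
// record buffer using a caller-supplied encoding (defaults to utf8).
function decodeField(record, start, length, encoding = 'utf8') {
  // Buffer#toString(encoding, start, end) decodes the byte range [start, end).
  return record.toString(encoding, start, start + length);
}

// "Café" encoded as Latin-1: the é is the single byte 0xE9.
const latin1Bytes = Buffer.from([0x43, 0x61, 0x66, 0xe9]);

// Decoding with the hard-coded default treats 0xE9 as invalid UTF-8.
const asUtf8 = decodeField(latin1Bytes, 0, 4);
// Passing the encoding explicitly recovers the intended text.
const asLatin1 = decodeField(latin1Bytes, 0, 4, 'latin1');

console.log(asLatin1);          // Café
console.log(asUtf8 === 'Café'); // false
```

With the default encoding, the lone 0xE9 byte is replaced by U+FFFD, which is the kind of corruption an encoding option would avoid.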

fredericd · 2018-07-04T06:59:19Z

Hi,

Thanks for your feedback. Your code looks good. What is your use case? Could you document how you use the marcjs library with your new API? To integrate your code, I'd need tests and at least one ISO2709 file in Latin1.

I'm reluctant to extend this library to deal with non-utf8 ISO2709 files. Those kinds of files belong to the past... and could be handled and transformed into utf8 with other tools.

VeniVidiVici · 2018-07-06T08:50:17Z

Hi,
We supply apps and other services to libraries. This involves importing their catalog for search indexing, which is where we use marcjs. So we often have no control over the format of the files we are given. Also, the expertise is not always available on the other side to understand what character encoding is or how to change it.

The data we are importing does not belong to us, but I will ask a friendly contact at one of the libraries if they can supply me with a few records to share (most imports are more than a GB in size).

I have in the past tried to transform the data between reading the file and passing it to marcjs but never got it working. This was more than a year ago, so I don't remember the specifics. Maybe with the recent updates this could now be possible again.
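
One possible reason a blanket transcode before parsing fails (an assumption, not something confirmed in this thread): an ISO2709 record stores its record length and field offsets as byte counts, and converting Latin-1 text to UTF-8 changes the number of bytes for any character above 0x7F. The sketch below shows only the size change, using Node's built-in `Buffer`; the variable names are illustrative.

```javascript
// "Café" in Latin-1 is 4 bytes; the é is the single byte 0xE9.
const latin1 = Buffer.from([0x43, 0x61, 0x66, 0xe9]);

// Decode as Latin-1, then re-encode the resulting string as UTF-8.
const utf8 = Buffer.from(latin1.toString('latin1'), 'utf8');

// In UTF-8 the é takes two bytes (0xC3 0xA9), so the data grows by one byte.
console.log(latin1.length, utf8.length); // 4 5
```

If the byte counts stored in the record are not rewritten to match, a parser reading those counts would slice fields at the wrong positions, which is why decoding each field with the right encoding inside the parser is the simpler route.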

fredericd · 2021-01-04T20:25:34Z

Do you have a sample file? I'm currently working on the module and may add your code.

VeniVidiVici added 3 commits July 3, 2018 17:51

add encoding paramater to parse

1ca0997

ok, really add an encoding option this time

667204a

missed a spot

9b082db

VeniVidiVici and others added 6 commits August 5, 2021 12:10

Merge branch 'master' of https://github.com/fredericd/marcjs into fre…

0cb415f

…dericd-master

add support for changing string ecoding format

c84ba83

Merge branch 'fredericd-master'

0691aea

Update marc parser library to ignore self closing XML tags

3fd3a80

Update test/test-xml.js

0c955e3

Merge pull request #3 from communico/feature/COM-10974

07555b1

Update Marcxml parser library to ignore self closing XML tags

4 participants
