

@VeniVidiVici

Hi,
When parsing ISO2709 records, toString is called with utf8 hard-coded. We are sometimes supplied with files containing Latin-1 strings.

I have been unable to find a way to reliably detect the string encoding when parsing, so this patch makes the encoding an optional parameter when creating the stream. There may be a better way to handle this, but it seems to work fine.

Chris
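A minimal sketch of the idea described above, written in JavaScript since marcjs runs on Node.js. The function name parseIso2709 and its encoding argument are hypothetical: this is neither the marcjs API nor the code in this patch. It also assumes the MARC 21 directory layout (entry map "4500": 4-digit field lengths, 5-digit start positions). Decoding each field's byte range on its own matters because ISO2709 lengths and offsets count bytes, not characters.

```javascript
// Hypothetical sketch, not the marcjs implementation or the patch itself:
// parse one ISO2709 record from a Buffer, decoding field data with a
// caller-supplied encoding instead of a hard-coded "utf8".
const FIELD_TERMINATOR = 0x1e;

function parseIso2709(record, encoding = "utf8") {
  // Leader: the first 24 bytes; positions 12-16 hold the base address of data.
  const leader = record.toString("ascii", 0, 24);
  const baseAddress = parseInt(leader.slice(12, 17), 10);

  // Directory: 12-byte entries (3-byte tag, 4-byte length, 5-byte start)
  // that follow the leader and end with a field terminator.
  const fields = [];
  for (let pos = 24; pos < record.length && record[pos] !== FIELD_TERMINATOR; pos += 12) {
    const entry = record.toString("ascii", pos, pos + 12);
    const tag = entry.slice(0, 3);
    const length = parseInt(entry.slice(3, 7), 10);
    const start = baseAddress + parseInt(entry.slice(7, 12), 10);
    // Lengths and offsets count bytes, so decode each field's byte range
    // separately, leaving out the trailing field terminator.
    fields.push([tag, record.toString(encoding, start, start + length - 1)]);
  }
  return { leader, fields };
}

// Usage: a one-field record whose data ("Café") is stored as Latin-1.
const record = Buffer.concat([
  Buffer.from("00043nam  2200037   4500" + "245000500000", "ascii"),
  Buffer.from([FIELD_TERMINATOR]),
  Buffer.from("Café", "latin1"),
  Buffer.from([FIELD_TERMINATOR, 0x1d]),
]);
// With "latin1", field 245 decodes to "Café".
console.log(parseIso2709(record, "latin1").fields);
// With the utf8 default, the lone 0xE9 byte becomes U+FFFD.
console.log(parseIso2709(record).fields);
```

Passing the encoding down to every Buffer.toString call keeps the byte offsets valid whichever encoding the caller picks. The trade-off is that the caller must know the file's encoding up front, since nothing here tries to detect it.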

@fredericd
Owner

Hi,

Thanks for your feedback. Your code looks good. What is your use case? Could you document how you use the marcjs library with your new API? To integrate your code, I'd need tests and at least one ISO2709 file in Latin-1.

I'm reluctant to extend this library to deal with non-UTF-8 ISO2709 files. Those kinds of files belong to the past and could be converted to UTF-8 with other tools.

@VeniVidiVici
Author

Hi,
We supply apps and other services to libraries. This involves importing their catalogs for search indexing, which is where we use marcjs. We often have no control over the format of the files we are given, and the people on the other side do not always have the expertise to understand what a character encoding is or how to change it.

The data we are importing does not belong to us, but I will ask a friendly contact and one of the libraries whether they can supply me with a few records to share (most imports are more than 1 GB in size).

In the past I tried to transform the data between reading the file and passing it to marcjs, but I never got it working. That was more than a year ago, so I don't remember the specifics. Maybe with the recent updates this is now possible.
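One plausible reason such a transform fails (an assumption, since the specifics aren't recorded here): ISO2709 record lengths and directory offsets count bytes, and UTF-8 encodes each non-ASCII Latin-1 character in two bytes, so transcoding the raw stream leaves every length and offset stale once an accented character appears. The sketch below re-encodes one record field by field and recomputes the leader and directory. recodeRecord is a hypothetical helper, not part of marcjs, and it assumes MARC 21 conventions (entry map "4500", leader position 09 set to "a" to flag Unicode).

```javascript
// Hypothetical helper, not part of marcjs: re-encode one ISO2709 record
// (default Latin-1 to UTF-8), recomputing byte counts in leader and directory.
const FIELD_TERMINATOR = 0x1e;
const RECORD_TERMINATOR = 0x1d;

function recodeRecord(record, from = "latin1", to = "utf8") {
  const leader = record.toString("ascii", 0, 24);
  const base = parseInt(leader.slice(12, 17), 10);

  // Read the directory and re-encode each field's data.
  const fields = [];
  for (let pos = 24; pos < record.length && record[pos] !== FIELD_TERMINATOR; pos += 12) {
    const entry = record.toString("ascii", pos, pos + 12);
    const len = parseInt(entry.slice(3, 7), 10);
    const start = base + parseInt(entry.slice(7, 12), 10);
    const text = record.toString(from, start, start + len - 1); // drop terminator
    fields.push({ tag: entry.slice(0, 3), data: Buffer.from(text, to) });
  }

  // Rebuild the directory with byte lengths measured in the target encoding.
  let offset = 0;
  const directory = fields.map(({ tag, data }) => {
    const len = data.length + 1; // + field terminator
    const entry = tag + String(len).padStart(4, "0") + String(offset).padStart(5, "0");
    offset += len;
    return entry;
  });

  const newBase = 24 + directory.length * 12 + 1; // leader + directory + terminator
  const total = newBase + offset + 1; // + record terminator
  // MARC 21 assumption: leader position 09 = "a" means UCS/Unicode coding.
  const newLeader =
    String(total).padStart(5, "0") + leader.slice(5, 9) + "a" +
    leader.slice(10, 12) + String(newBase).padStart(5, "0") + leader.slice(17);

  return Buffer.concat([
    Buffer.from(newLeader + directory.join(""), "ascii"),
    Buffer.from([FIELD_TERMINATOR]),
    ...fields.flatMap(({ data }) => [data, Buffer.from([FIELD_TERMINATOR])]),
    Buffer.from([RECORD_TERMINATOR]),
  ]);
}

// Usage: a one-field Latin-1 record ("Café" is 4 bytes in Latin-1, 5 in UTF-8).
const latin1Record = Buffer.concat([
  Buffer.from("00043nam  2200037   4500" + "245000500000", "ascii"),
  Buffer.from([FIELD_TERMINATOR]),
  Buffer.from("Café", "latin1"),
  Buffer.from([FIELD_TERMINATOR, RECORD_TERMINATOR]),
]);
// Naive transcoding: the buffer grows by one byte, the leader still says 00043.
const naive = Buffer.from(latin1Record.toString("latin1"), "utf8");
console.log(naive.length, naive.toString("ascii", 0, 5)); // 44 00043
const fixed = recodeRecord(latin1Record);
console.log(fixed.length, fixed.toString("ascii", 0, 5)); // 44 00044
```

For a multi-gigabyte file, records could be split on the 0x1D record terminator and passed through a helper like this one at a time before parsing. The sketch does not check that a re-encoded field still fits within the 9999-byte limit of a 4-digit length.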

@fredericd
Owner

Do you have a sample file? I'm currently working on the module and may add your code.



4 participants