This repository was archived by the owner on Sep 9, 2025. It is now read-only.

Struggling to export updated CDB after adding new concepts in MedCAT v1 (UMLS Dutch v1.10) #261

@LeJudith

Description

I am working on annotation of UMLS concepts for Dutch pathology reports. I am using MedCAT v1 together with the pretrained UMLS Dutch v1.10 modelpack (UMC Utrecht).

Steps I took:

  • Created a project using only the modelpack (not vocab.dat and cdb.dat separately).
  • I could add annotations by highlighting a text span and using Add term.
  • I could not use the Add concept button; it fails with the error below.

Error when using Add concept:

    ^^^^^^^^^^^^^^^^^^^^^
  File "/home/api/api/views.py", line 440, in add_concept
    ensure_concept_searchable(cui, cat.cdb, project.concept_db)
  File "/home/api/api/solr_utils.py", line 172, in ensure_concept_searchable
    collection = f'{cdb_model.name}_id_{cdb_model.id}'
                    ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'name'

My guess: this happens because no CDB object is explicitly set on the project when only the modelpack is loaded, so project.concept_db is None when it reaches ensure_concept_searchable.
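To make the guess concrete, here is a minimal sketch of the failing line with an explicit check added. `ConceptDBRecord` and `solr_collection_name` are hypothetical stand-ins that I made up for illustration, not MedCATtrainer code; only the f-string format is copied from the traceback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConceptDBRecord:
    """Hypothetical stand-in for the object passed as project.concept_db."""
    id: int
    name: str


def solr_collection_name(cdb_model: Optional[ConceptDBRecord]) -> str:
    """Build the collection name the way the traceback line does, but fail clearly on None."""
    if cdb_model is None:
        # Assumed situation: the project was created from a modelpack only,
        # so no separate concept DB record is attached to it.
        raise ValueError("project has no concept DB set")
    return f"{cdb_model.name}_id_{cdb_model.id}"


if __name__ == "__main__":
    print(solr_collection_name(ConceptDBRecord(id=3, name="umls_dutch")))
    try:
        solr_collection_name(None)
    except ValueError as err:
        print(f"ValueError: {err}")
```

Without the check, passing None reproduces the AttributeError shown above, since None has no attribute name.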

  • Created a new project using only CDB + VOCAB (not modelpack).
  • This time I was able to add annotations and also add new concepts.
  • Now I want to export the annotations and the updated CDB.
  • Downloading the annotations works, but downloading the CDB does not.

The save_models button is not available on the dashboard for this project (likely because I did not load a full modelpack, only CDB + VOCAB).

I tried to manually copy the CDB file from the container:

docker cp <api-container-name>:/home/api/media/<path-to-your-cdb.dat> ./cdb.dat

But when I tried to create a new modelpack with this downloaded CDB, I got this error:

    504 if not isinstance(cdb, CDB):
    505     raise ValueError(f"The path '{path}' is not a CDB!")

File /opt/conda/envs/medcat/lib/python3.10/site-packages/medcat/storage/serialisers.py:354, in deserialise(folder_path, ignore_folders_prefix, ignore_folders_suffix, **init_kwargs)
    339 """Deserialise contents of a folder.
    340
    341 Extra init keyword arguments can be provided if needed.
   (...)
    351     Serialisable: The deserialised object.
    352 """
    353 # if manually serialised, do manually deserialisation
--> 354 man_cls_path = Serialiser.get_manually_serialised_path(folder_path)
    355 if man_cls_path:
    356     return Serialiser.deserialise_manually(folder_path, man_cls_path,
...
---> 77 with open(file_path) as f:
     78     contents = f.read()
     79     matched = MANUAL_SERIALISED_RE.match(contents)

FileNotFoundError: [Errno 2] No such file or directory: './models/cdb_Mfmetyc.dat/.serialised_by'
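The FileNotFoundError suggests that the loader treats the path as a folder and looks for a `.serialised_by` file inside it, while the file I copied out of the container is a single `.dat` file. That may point to a mismatch between the MedCAT version that wrote the CDB and the one reading it, but I have not confirmed this. A minimal sketch in plain Python (not a MedCAT API) to check which layout a path has; `describe_cdb_path` is a hypothetical helper name, and the marker file name is taken from the traceback:

```python
import os
import tempfile

MARKER = ".serialised_by"  # file name taken from the traceback above


def describe_cdb_path(path: str) -> str:
    """Heuristically classify a CDB path as folder-style or single-file."""
    if os.path.isdir(path):
        if os.path.isfile(os.path.join(path, MARKER)):
            return "folder with marker"
        return "folder without marker"
    if os.path.isfile(path):
        return "single file"
    return "missing"


if __name__ == "__main__":
    # Demo on throwaway paths; replace with the real CDB path when checking.
    with tempfile.TemporaryDirectory() as tmp:
        single = os.path.join(tmp, "cdb.dat")
        open(single, "wb").close()
        print(single, "->", describe_cdb_path(single))
```

If the copied CDB is reported as a single file, the loader in the traceback cannot find `.serialised_by` under it, which matches the error I see.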

My Questions

  • Is it correct that Add concept does not work when only loading a modelpack?
  • What is the correct way to export an updated CDB (with new concepts) when the project was created using only cdb.dat + vocab.dat (not a modelpack)?

Hope you can help me! Thanks so much in advance!
