
Fix crash in get_acron_issueid_fname_without_extension when FILE_REGEX doesn't match #323

Draft
Copilot wants to merge 2 commits into master from copilot/fix-load-languages-error

Copilot AI commented Mar 11, 2026

What does this PR do?

Fixes two chained bugs in get_acron_issueid_fname_without_extension that crashed the load_languages script, along with two related problems in the same error handling:

  • AttributeError: FILE_REGEX.search() returns None for paths such as ./argos/aceeed/n21/2545-8299-ACEEED-21-137.xml, and .group() is then called on None
  • UnboundLocalError: the except block references file_id, which was never assigned (the error happens before the assignment)
  • logger.error received (file_path, file_id) as a single tuple instead of as separate arguments
  • Bare except: replaced with except Exception:

The script now logs the error and continues processing the remaining documents.
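For reviewers who want a picture of the guarded pattern without opening the diff, here is a minimal sketch. The body is hypothetical: the PR text does not show the real implementation, so the lowercasing and the split into the last three path segments are assumptions chosen only to reproduce the two results from the manual test below. What it is meant to illustrate are the four fixes listed above: check the match before calling .group(), bind file_id before the try block, pass logger.error separate arguments, and catch Exception instead of using a bare except.

```python
import logging
import os
import re

# Logger name taken from the log output quoted in the issue.
logger = logging.getLogger('processing.load_languages')

# Regex as quoted in the PR (the unescaped dots match any character).
FILE_REGEX = re.compile(r'serial.*.htm|.*.xml|.*.pdf')


def get_acron_issueid_fname_without_extension(file_path):
    """Sketch only: return [acron, issue_id, fname] or None if unparsable.

    The lowercasing and path splitting are assumptions, not the real body.
    """
    _file_path = file_path.lower()
    match = FILE_REGEX.search(_file_path)
    if match is None:
        # Guard: calling .group() on None raised the original AttributeError.
        logger.error('Fail to parse file_path %s: FILE_REGEX did not match',
                     file_path)
        return None

    # file_id is bound before the try block, so the handler can use it safely.
    file_id = match.group()
    try:
        acron, issue_id, fname = file_id.split('/')[-3:]
        fname, _ext = os.path.splitext(fname)
        return [acron, issue_id, fname]
    except Exception:  # previously a bare "except:"
        # Separate arguments for the two %s placeholders, not one tuple.
        logger.error('Fail to parse file_path %s for %s', file_path, file_id)
        return None
```

A path with fewer than three segments, such as 'only.xml', matches the regex but fails the unpacking; in this sketch that case is also logged and returns None rather than raising.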

Where could the review start?

processing/load_languages.py, function get_acron_issueid_fname_without_extension (around line 93).

How could this be tested manually?

from processing.load_languages import get_acron_issueid_fname_without_extension

# Before: crashed with UnboundLocalError
# Now: returns None and logs the error
result = get_acron_issueid_fname_without_extension('./argos/aceeed/n21/2545-8299-ACEEED-21-137')
assert result is None

# Valid paths still work
result = get_acron_issueid_fname_without_extension('delta/v32n2/1678-460X-delta-32-02-00543.xml')
assert result == ['delta', 'v32n2', '1678-460x-delta-32-02-00543']

Unit test added: test_get_acron_issueid_fname_without_extension_no_match.
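The test body is not shown in the PR, so the following is only a hypothetical shape for such a test: it uses unittest's assertLogs to check both that None is returned and that an ERROR record mentioning the path is emitted. The function here is a stand-in (an assumption) so the example runs on its own; a real test would import get_acron_issueid_fname_without_extension from processing.load_languages instead.

```python
import logging
import re
import unittest

logger = logging.getLogger('processing.load_languages')

FILE_REGEX = re.compile(r'serial.*.htm|.*.xml|.*.pdf')


def get_acron_issueid_fname_without_extension(file_path):
    # Hypothetical stand-in: only the no-match guard matters for this test.
    match = FILE_REGEX.search(file_path.lower())
    if match is None:
        logger.error('Fail to parse file_path %s', file_path)
        return None
    return match.group()


class GetAcronIssueidFnameTest(unittest.TestCase):
    def test_get_acron_issueid_fname_without_extension_no_match(self):
        path = './argos/aceeed/n21/2545-8299-ACEEED-21-137'
        # assertLogs fails the test unless at least one ERROR record is emitted.
        with self.assertLogs('processing.load_languages', level='ERROR') as cm:
            result = get_acron_issueid_fname_without_extension(path)
        self.assertIsNone(result)
        self.assertIn(path, cm.output[0])
```

Saved to a file, this can be run with python -m unittest.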

Is there any background context you would like to give?

The regex FILE_REGEX = re.compile(r'serial.*.htm|.*.xml|.*.pdf') does not match every possible file path. When the match fails, the function should return None gracefully; the caller (fulltexts) already handles this case at lines 315-317.
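As a hypothetical illustration of the caller side, and of the issue's request to report the problems found at the end of the run, the loop below skips paths that fail to parse and collects them for a single summary log. load_all and parse_path are illustrative names, not functions from the repository, and the PR text does not say whether an end-of-run summary was implemented.

```python
import logging
import re

logger = logging.getLogger('processing.load_languages')

FILE_REGEX = re.compile(r'serial.*.htm|.*.xml|.*.pdf')


def parse_path(file_path):
    """Hypothetical stand-in for get_acron_issueid_fname_without_extension."""
    match = FILE_REGEX.search(file_path.lower())
    if match is None:
        logger.error('Fail to parse file_path %s', file_path)
        return None
    # Drop the extension, then keep the last three path segments.
    parts = match.group().rsplit('.', 1)[0].split('/')
    return parts[-3:] if len(parts) >= 3 else None


def load_all(file_paths):
    """Process every path, skip failures instead of aborting, report at the end."""
    loaded, failures = [], []
    for path in file_paths:
        file_id = parse_path(path)
        if file_id is None:
            # Same idea as the None check in the caller: skip and keep going.
            failures.append(path)
            continue
        loaded.append(file_id)
    if failures:
        logger.warning('%d path(s) could not be parsed: %s',
                       len(failures), ', '.join(failures))
    return loaded, failures
```

Collecting failures in a list, rather than only logging each one as it occurs, is what makes a single end-of-run report possible.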

Screenshots

N/A

Which tickets are relevant?

References

N/A

Original prompt

This section details the original issue you should resolve

<issue_title>Error in load_languages</issue_title>
<issue_description>### Problem description

Error:

create index [[('aid', 1)], {'background': True}] (articles)
create index [[('version', 1)], {'background': True}] (articles)
create index [[('code', 1), ('collection', 1)], {'unique': True, 'background': True}] (articles)
create index [[('collection', 1), ('processing_date', 1)], {'background': True}] (articles)
06:42:24 - processing.load_languages - INFO - Loading languages for www.scielo.org.ar
06:42:24 - processing.load_languages - INFO - Using mode all_records True
06:42:24 - processing.load_languages - INFO - Loading static_pdf_files.txt from server www.scielo.org.ar
06:42:27 - processing.load_languages - INFO - Loading static_html_files.txt from server www.scielo.org.ar
06:42:28 - processing.load_languages - INFO - Loading static_xml_files.txt from server www.scielo.org.ar
/usr/local/lib/python3.10/site-packages/pymongo/collection.py:1696: UserWarning: use an explicit session with no_cursor_timeout=True otherwise the cursor may still timeout after 30 minutes, for more info see https://docs.mongodb.com/v4.4/reference/method/cursor.noCursorTimeout/#session-idle-timeout-overrides-nocursortimeout
  return Cursor(self, *args, **kwargs)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/processing/load_languages.py", line 95, in get_acron_issueid_fname_without_extension
    file_id = FILE_REGEX.search(_file_path).group()
AttributeError: 'NoneType' object has no attribute 'group'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/bin/articlemeta_loadlanguages", line 7, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/processing/load_languages.py", line 497, in main
    run(collections, articlemeta_db, args.all_records, args.domain)
  File "/usr/local/lib/python3.10/site-packages/processing/load_languages.py", line 409, in run
    fulltexts = static_catalogs.fulltexts(document)
  File "/usr/local/lib/python3.10/site-packages/processing/load_languages.py", line 313, in fulltexts
    file_id = self._file_id(document.file_code(fullpath=True))
  File "/usr/local/lib/python3.10/site-packages/processing/load_languages.py", line 239, in _file_id
    return get_acron_issueid_fname_without_extension(file_path)
  File "/usr/local/lib/python3.10/site-packages/processing/load_languages.py", line 109, in get_acron_issueid_fname_without_extension
    u'Fail to parse file_path %s for %s', (file_path, file_id))
UnboundLocalError: local variable 'file_id' referenced before assignment

when consuming the list

./argos/aceeed/n21/2545-8299-ACEEED-21-137.xml
./argos/aceeed/n21/2545-8299-ACEEED-21-15.xml
./argos/aceeed/n21/2545-8299-ACEEED-21-169.xml
./argos/aceeed/n21/2545-8299-ACEEED-21-29.xml
./argos/aceeed/n21/2545-8299-ACEEED-21-71.xml
./argos/aceeed/n21/2545-8299-ACEEED-21-97.xml
./argos/aceeed/n21/2545-8299-ACEEED-21-207.xml

Therefore, fix the script so that when the pattern does not match, it logs the exception but does not stop execution. At the end, report the problems that were found.

</issue_description>

Comments on the Issue (you are @copilot in this section)



…ithout_extension

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix error in load_languages process" to "Fix crash in get_acron_issueid_fname_without_extension when FILE_REGEX doesn't match" on Mar 11, 2026


Development

Successfully merging this pull request may close these issues.

Error in load_languages

2 participants