feat: add reusable CSV/XLS export utility and filtered download routes to journal page classes#1417
feat: add reusable CSV/XLS export utility and filtered download routes to journal page classes#1417
Conversation
…nal page classes - Create core/home/utils/export_journals.py with reusable get_scielo_journals_data, generate_csv_response, and generate_xls_response functions - Add RoutablePageMixin and download-csv/download-xls routes to ListPageJournal, ListPageJournalByPublisher, and ListPageJournalByCategory - Refactor existing standalone download views to use the utility functions - Update download_button_csv_xls.html template to use routablepageurl tags - Add tests for generate_csv_response and generate_xls_response - Update existing tests to import from new utility location Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com> Agent-Logs-Url: https://github.com/scieloorg/core/sessions/6c34ec03-b2a2-4ba6-b03f-32ba5018f3cc
- Create JournalDownloadMixin to eliminate code duplication across three page classes - Improve error messages in generate_csv_response and generate_xls_response to specify the format Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com> Agent-Logs-Url: https://github.com/scieloorg/core/sessions/6c34ec03-b2a2-4ba6-b03f-32ba5018f3cc
core/home/models.py
Outdated
| class JournalDownloadMixin: | ||
| @re_path(r"^download-csv/$", name="download_csv") | ||
| def download_csv(self, request): | ||
| journals_data = get_scielo_journals_data() |
There was a problem hiding this comment.
@copilot deve considerar que os itens foram filtrados usando os argumentos de request
There was a problem hiding this comment.
Fixed in 05eed2e. The download endpoints now respect request filters:
get_scielo_journals_data(filters=None)accepts optional Q filtersJournalDownloadMixinuses a template methodget_export_filters(request)— each page class overrides it to build the same filters used inget_contextListPageJournalByPublisher.get_export_filtersincludes publisher name search (matching the page's own filter logic)- Download template forwards current query params via
request.GET.urlencode
…ith_letter, tab) - get_scielo_journals_data() now accepts optional Q filters parameter - JournalDownloadMixin uses template method get_export_filters(request) - Each page class overrides get_export_filters with its own filter logic - Download template forwards current query parameters to download URLs - Added tests for filtered data retrieval Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com> Agent-Logs-Url: https://github.com/scieloorg/core/sessions/426fb4c6-9c21-4ab8-a3a9-21eaa4055951
There was a problem hiding this comment.
Pull request overview
This PR extracts the journals CSV/XLS export logic into a shared utility module and adds filtered download endpoints directly to journal listing page classes via a reusable Wagtail routable mixin, preserving current-page filters in the exported data.
Changes:
- Added
core/home/utils/export_journals.pywith helpers to fetch journal data and generate CSV/XLS download responses. - Added
JournalDownloadMixin+ routable download routes to journal list page classes, using per-page filter construction (get_export_filters). - Updated the download button include to call the page’s routable download endpoints and forward current querystring filters; refactored legacy standalone views to use the utility and added tests.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| core/home/utils/export_journals.py | New reusable utility to fetch/export journal data as CSV/XLS |
| core/home/models.py | Adds routable download endpoints + per-page export filtering hooks |
| core/home/views.py | Refactors legacy global download endpoints to reuse utility |
| core/templates/home/include/download_button_csv_xls.html | Switches links to routable endpoints and forwards current filters |
| core/home/tests.py | Adds/updates tests for export utility and filtered data retrieval |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def get_export_filters(self, request): | ||
| search_term = request.GET.get("search_term", "") | ||
| starts_with_letter = request.GET.get("start_with_letter", "") | ||
| active_or_discontinued = list(request.GET.get("tab", "")) | ||
| return default_journal_filter( | ||
| search_term, starts_with_letter, active_or_discontinued | ||
| ) |
There was a problem hiding this comment.
list(request.GET.get("tab", "")) turns a tab value like "active" into a list of characters, which will make Q(status__in=active_or_discontinued) effectively never match. Use request.GET.getlist("tab") (if the UI can send multiple values) or normalize a single tab value into a one-element list (e.g., [tab] if tab else []) before passing to default_journal_filter.
| def get_export_filters(self, request): | ||
| search_term = request.GET.get("search_term", "") | ||
| starts_with_letter = request.GET.get("start_with_letter", "") | ||
| active_or_discontinued = list(request.GET.get("tab", "")) | ||
| filters = Q(status__in=SCIELO_STATUS_CHOICES) | ||
| if search_term: | ||
| filters &= Q(journal__title__icontains=search_term) | Q( | ||
| journal__owner_history__institution__institution__institution_identification__name__icontains=search_term | ||
| ) | ||
| if starts_with_letter: | ||
| filters &= Q(journal__title__istartswith=starts_with_letter) | ||
| if active_or_discontinued: | ||
| filters &= Q(status__in=active_or_discontinued) | ||
| return filters |
There was a problem hiding this comment.
Same issue as above: list(request.GET.get("tab", "")) produces a character list and breaks the status__in filter. Switch to request.GET.getlist("tab") or wrap the single tab string into a list; otherwise the export for the “tab” filter will be incorrect.
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| HEADERS = ["journals", "scielo_url", "publisher"] |
There was a problem hiding this comment.
The exported column headers are journals/publisher, but the internal data keys are title/owner. This mismatch makes the utility harder to reuse correctly and increases the chance of future mistakes. Consider aligning the dict keys to the export contract (e.g., use journals and publisher everywhere) and then read those same keys in both generate_csv_response and generate_xls_response.
| { | ||
| "title": title, | ||
| "scielo_url": scielo_url, | ||
| "owner": owner, | ||
| } |
There was a problem hiding this comment.
The exported column headers are journals/publisher, but the internal data keys are title/owner. This mismatch makes the utility harder to reuse correctly and increases the chance of future mistakes. Consider aligning the dict keys to the export contract (e.g., use journals and publisher everywhere) and then read those same keys in both generate_csv_response and generate_xls_response.
| writer.writerow( | ||
| [journal.get("title"), journal.get("scielo_url"), journal.get("owner")] | ||
| ) |
There was a problem hiding this comment.
The exported column headers are journals/publisher, but the internal data keys are title/owner. This mismatch makes the utility harder to reuse correctly and increases the chance of future mistakes. Consider aligning the dict keys to the export contract (e.g., use journals and publisher everywhere) and then read those same keys in both generate_csv_response and generate_xls_response.
| "collection__domain", | ||
| "journal__owner_history__institution__institution__institution_identification__name", | ||
| "issn_scielo", | ||
| ) |
There was a problem hiding this comment.
Export rows can be duplicated if the joins (notably journal__owner_history...) produce multiple rows per SciELOJournal. This will lead to repeated lines in CSV/XLS exports. A concrete mitigation is applying distinct() to the values() queryset (e.g., qs.values(...).distinct()), or otherwise selecting a single “current” owner record before exporting.
| ) | |
| ).distinct() |
| <a class="dropdown-item" href="{% routablepageurl page 'download_xls' %}{% if request.GET.urlencode %}?{{ request.GET.urlencode }}{% endif %}">XLS</a> | ||
| </li> | ||
| <li> | ||
| <a class="dropdown-item" href="{% url 'download_csv_journals_page_scielo_org' %}">CSV</a> | ||
| <a class="dropdown-item" href="{% routablepageurl page 'download_csv' %}{% if request.GET.urlencode %}?{{ request.GET.urlencode }}{% endif %}">CSV</a> |
There was a problem hiding this comment.
These links now depend on a page variable being present in the template context. If this include is ever rendered outside a Wagtail Page context (or the calling template doesn’t pass page explicitly), it will error at render time. If this include is intended to be reusable, consider adding a safe fallback to the legacy global URLs when page is missing, or require/ensure page is explicitly provided by the caller.
| except Exception as e: | ||
| logger.error(f"Error fetching scielo journals data: {e}") | ||
| return [] |
There was a problem hiding this comment.
Logging the exception message without a traceback makes diagnosing production issues harder. Prefer logger.exception("Error fetching scielo journals data") (or logger.error(..., exc_info=True)) so the traceback is captured.
| @re_path(r"^download-csv/$", name="download_csv") | ||
| def download_csv(self, request): | ||
| filters = self.get_export_filters(request) | ||
| journals_data = get_scielo_journals_data(filters) | ||
| return generate_csv_response(journals_data) |
There was a problem hiding this comment.
The key new behavior is “export respects the same filters as the listing page” via get_export_filters(request) + routable endpoints. There are good unit tests for the export utility, but there’s no coverage shown for the routable download routes applying real request query params (e.g., search_term, start_with_letter, tab) end-to-end. Add an integration-style test that requests the page’s download-csv/ (and/or download-xls/) with query parameters and asserts the response contains only the filtered journal(s).
O que esse PR faz?
Extrai a lógica de exportação CSV/XLS de periódicos para um módulo utilitário reutilizável e adiciona endpoints de download diretamente nas três classes de página (
ListPageJournal,ListPageJournalByPublisher,ListPageJournalByCategory) viaRoutablePageMixine umJournalDownloadMixincompartilhado. Os downloads respeitam os filtros aplicados pelo usuário na página (search_term, start_with_letter, tab), exportando apenas os periódicos filtrados.Mudanças principais:
core/home/utils/export_journals.py(novo):get_scielo_journals_data(filters=None),generate_csv_response(),generate_xls_response()— funções reutilizáveis com colunasjournals,scielo_url,publisher.get_scielo_journals_dataaceita filtros Q opcionais para restringir o querysetcore/home/models.py:JournalDownloadMixincom rotasdownload-csv/edownload-xls/, usando template methodget_export_filters(request)que cada page class sobrescreve com sua lógica de filtro específica (incluindo busca por nome de publisher emListPageJournalByPublisher)core/home/views.py: views standalone refatoradas para usar o utilitário (mantidas para backward compatibility das URLs globais)download_button_csv_xls.html: usa{% routablepageurl page 'download_xls' %}comrequest.GET.urlencodepara encaminhar os parâmetros de filtro atuais às URLs de downloadcore/home/tests.py: testes paragenerate_csv_response,generate_xls_responsee filtragem de dados viaget_scielo_journals_dataOnde a revisão poderia começar?
core/home/utils/export_journals.py→core/home/models.py(classeJournalDownloadMixineget_export_filtersnas page classes) →core/templates/home/include/download_button_csv_xls.htmlComo este poderia ser testado manualmente?
journals,scielo_url,publisher/download-csv-journals-page-scielo-org/,/download-xls-journals-page-scielo-org/) continuam funcionandoAlgum cenário de contexto que queira dar?
A implementação anterior tinha lógica de exportação duplicada em view functions standalone e não considerava os filtros do request. O
JournalDownloadMixincentraliza os métodos de download e é herdado pelas três classes, eliminando duplicação. Cada page class sobrescreveget_export_filters(request)para aplicar os mesmos filtros usados na listagem da página ao export. O template de download encaminha os query parameters atuais (request.GET.urlencode) para que os filtros sejam preservados no download. As URLs globais emconfig/urls.pyforam mantidas para não quebrar links existentes.Screenshots
N/A — funcionalidade de download de arquivo, sem mudança visual além das URLs dos links.
Quais são tickets relevantes?
Desenvolver função para download de arquivo CSV com dados de periódicos.
Referências
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.