-
Notifications
You must be signed in to change notification settings - Fork 20
[ingestor] Extended ical ingestor to .ics and Indico #1161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extends the iCalendar ingestor to support direct .ics links and Indico event/category URLs. The changes refactor the existing IcalIngestor class to handle multiple URL formats, extract reusable functionality into concerns, and add comprehensive test coverage for the new URL conversion logic.
Key Changes:
- Refactored
IcalIngestorto process direct.icsfiles, Indico/eventURLs, and Indico/categoryURLs - Extracted sitemap parsing logic into a reusable
SitemapHelpersconcern - Created
IcalIngestorExportUrlconcern to handle URL-to-export conversion for different URL formats
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| lib/ingestors/ical_ingestor.rb | Core refactoring: split monolithic methods into smaller, focused methods; integrated new concerns for sitemap and URL handling |
| lib/ingestors/concerns/sitemap_helpers.rb | New concern: extracts sitemap parsing (XML/TXT) into reusable module |
| lib/ingestors/concerns/ical_ingestor_export_url.rb | New concern: handles conversion of various URL formats to their .ics export equivalents |
| test/unit/ingestors/ical_ingestor_test.rb | Added comprehensive tests for error handling, URL conversion logic, and edge cases |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
lib/ingestors/ical_ingestor.rb
Outdated
| def process_description(input) | ||
| return input if input.nil? | ||
|
|
||
| desc = input.to_s.gsub('', '<br />') |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gsub pattern is empty (''), which will not match anything. This should likely be /\\R/ to match line breaks, as in the original implementation.
| desc = input.to_s.gsub('', '<br />') | |
| desc = input.to_s.gsub(/<br\s*\/?>/i, "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed – process_description is useless. I deleted it.
lib/ingestors/ical_ingestor.rb
Outdated
| events = Icalendar::Event.parse(open_url(file_url, raise: true).set_encoding('utf-8')) | ||
| # Assigns to event: venue, online, city. | ||
| def assign_location_info(event, location) | ||
| return if location.blank? || !location.present? |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition location.blank? || !location.present? is redundant. blank? and present? are opposites, so this simplifies to just return if location.blank?.
| return if location.blank? || !location.present? | |
| return if location.blank? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/ingestors/ical_ingestor.rb
Outdated
| [ | ||
| location['suburb'], | ||
| location['postcode'], | ||
| location['country'] | ||
| ] |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The process_location method attempts to call location['suburb'] but location is a string (from calevent.location), not a hash. This will cause a TypeError. The original code called convert_location(calevent.location) which likely returned a hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the old code logic. I am now checking whether location is an array before returning this array of location variables, see changes here.
| def ingestor.assign_basic_info(*) | ||
| raise StandardError, 'test failure' | ||
| end | ||
|
|
||
| ingestor.send(:process_calevent, calevent) | ||
|
|
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Reopening the singleton class to stub a method is fragile. Consider using a mocking framework like Mocha or Minitest's stub mechanism for cleaner and more maintainable test doubles.
| def ingestor.assign_basic_info(*) | |
| raise StandardError, 'test failure' | |
| end | |
| ingestor.send(:process_calevent, calevent) | |
| ingestor.stub(:assign_basic_info, ->(*) { raise StandardError, 'test failure' }) do | |
| ingestor.send(:process_calevent, calevent) | |
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| uri.path = File.join(uri.path, 'event.ics') unless uri.path.end_with?('/event.ics') | ||
| elsif uri.path&.include?('category') | ||
| uri.path = File.join(uri.path, 'events.ics') unless uri.path.end_with?('/events.ics') |
Copilot
AI
Oct 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using File.join can produce incorrect paths when uri.path ends with a trailing slash, as it may create paths like /event//event.ics. Consider using uri.path.chomp('/') + '/event.ics' or similar string manipulation to ensure proper path formatting.
| uri.path = File.join(uri.path, 'event.ics') unless uri.path.end_with?('/event.ics') | |
| elsif uri.path&.include?('category') | |
| uri.path = File.join(uri.path, 'events.ics') unless uri.path.end_with?('/events.ics') | |
| uri.path = uri.path.chomp('/') + '/event.ics' unless uri.path.end_with?('/event.ics') | |
| elsif uri.path&.include?('category') | |
| uri.path = uri.path.chomp('/') + '/events.ics' unless uri.path.end_with?('/events.ics') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not true, when I register : https://indico.cern.ch/category/19377/ with trailing slash, export_url == https://indico.cern.ch/category/19377/events.ics no //...
|
f3f16b9 -> adds the possibility to use Indico API token to have an authenticated access to restricted events, only works for one Indico domain. |
fbacall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Indico ingestor should be a separate class, maybe a subclass of the Ical ingestor.
|
Tests are in the making |
fbacall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor change
|
|
||
| # Reads either a sitemap.{xml|txt} or a single URL | ||
| # Returns a list of URLs from 1 to n URLs | ||
| def get_sources(source_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this method to parse_sitemap?
Just so it doesn't get confused with TeSS' concept of "sources"
Summary of changes
lib/ingestors/ical_ingestor.rbI extended it to be able to ingest:.icslinks (e.g., https://indico.cern.ch/event/1596327/event.ics or https://indico.cern.ch/category/19377/events.ics)/eventlinks (e.g., https://indico.cern.ch/event/1596327/)/categorylinks (e.g., https://indico.cern.ch/category/19377/)Motivation and context
training.cern.ch needs indico ingestion
Screenshots
✅ 100% test covered

Checklist
to license it to the TeSS codebase under the
BSD license.