Skip to content

Adds support for yearly files#11

Closed
ribalba wants to merge 2 commits intomainfrom
adds-yearly-import
Closed

Adds support for yearly files#11
ribalba wants to merge 2 commits intomainfrom
adds-yearly-import

Conversation

@ribalba
Copy link
Copy Markdown
Member

@ribalba ribalba commented Mar 19, 2026

A first idea of what we talked about today.

I am not to happy about the way the code is structured now as there are always 2 checks but I don't see any other option if we want to keep the api clean and agnostic.

We will have to update the GMT simulation frontend not to check all the yearly providers when looking for the best location.

@ribalba ribalba requested a review from ArneTR March 19, 2026 17:57
Copy link
Copy Markdown
Member

@ArneTR ArneTR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found nothing major in the review but have annotated some remarks and questions.

I gotta say I found this PR very hard to review. It feels very much like an AI generated PR which includes so much encapsulation into mini functions and hard to read boilerplate (e.g. records_by_year: dict[int, list[dict]] = {} instead of records_by_year = {}) that I cannot comprehend the interplay of the functions as a whole.

This might be related bc I did not write the initial code. However I would argue to maybe make less boilerplated code

Comment thread elephant/app.py Outdated
zone: str,
carbon_intensity: float,
timestamp: datetime | None = None,
estimation: bool = False,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this variable estimated. Makes it clearer to me that it is a bool and not the type of estimation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Refactored

Comment thread elephant/app.py
@@ -167,7 +190,12 @@ def _to_iso(dt: datetime) -> str:
return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the Z really so common? Would UTC not be better?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do this because EM uses this format
https://app.electricitymaps.com/developer-hub/api/reference#carbon-intensity-past

and I wanted to be compatible with the Electricity maps API

Comment thread migrations/02_add_carbon_yearly.sql Outdated
region TEXT NOT NULL,
carbon_intensity DOUBLE PRECISION NOT NULL,
provider TEXT NOT NULL,
estimation BOOLEAN NOT NULL DEFAULT TRUE,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can carbon yearly ever not be an estimation?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from the other table and always set it to true. I refactored this to now set the field when the data is read from the DB

Comment thread elephant/database.py
def fetch_latest(conn: Connection, region: str) -> dict[str, dict]:
def fetch_latest(conn: Connection, region: str) -> list[dict]:
"""Return the most recent row for each provider at a region."""
with conn.cursor(row_factory=dict_row) as cur:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to have a dict_row factory here IMHO. could be normal list lookup

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But like this I get dicts that I can use? Otherwist I would need to build the data structure in an extra step? What is the advantage of this?

Comment thread elephant/database.py Outdated

SELECT DISTINCT region
FROM carbon_yearly
WHERE region IS NOT NULL
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the table definition the column is not null in both tables.

When is it ever null?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. removed

Comment thread elephant/database.py Outdated
Comment thread elephant/database.py

def _align_to_quarter_hour(dt: datetime) -> datetime:
"""Round a datetime up to the next 15-minute boundary."""
if dt.second == 0 and dt.microsecond == 0 and dt.minute % 15 == 0:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That check feels absurdely unlikely to hit. Better not have in the first place and just do the calculation always?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, doesn't always happen when someone selects the full hour?

Comment thread elephant/yearly_dataset.py Outdated
return json.loads(match.group(1))


def iter_yearly_dataset_records(data_dir: Path = YEARLY_DATA_DIR) -> Iterator[dict]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this an iterator with all its embodied overhead if the only thing needed is the output and not a replacement in exisiting loop flows?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to return a list

Comment thread elephant/database.py
@ArneTR
Copy link
Copy Markdown
Member

ArneTR commented Mar 20, 2026

My user experience:

I had quite some trouble getting it to work, but I do like the final view. In favor of bringing that in.

Here my journey.

I ran python3 -m elephant.import_yearly but still I see nothing in the frontend.

Screenshot 2026-03-20 at 10 12 13 AM

Then I decided to completely trash the container and the DB with docker compose down -v and rebuild it docker compose up --build -d

This worked. but now I can see the categories, but no data shows ...
I found out this is realted bc we have no 2026 data ... but should the 2025 data not be used in 2026 if we have no better option?

Maybe we can just duplicate all 2025 data to 2026 for now? otherwise the feature has no real use I feel ...

Further quirks I found:

  • Typo in label
Screenshot 2026-03-20 at 10 17 37 AM - When selecting a new provider that has no data the old chart stays active making it unclear if something changed or data is just identical. The small notification at the bottom is not visible on my 13" screen without scrolling. I would argue for the chart to be removed on change always. In the example screenshot I am moving from germany to DJ which has no data for 2026 Screenshot 2026-03-20 at 10 21 06 AM

@ribalba
Copy link
Copy Markdown
Member Author

ribalba commented Mar 30, 2026

  • I moved the error box to the top
  • I added more information on the ranges
  • Fixed the typo
  • I am not a big fan of using the 2025 numbers for 2026 as this is just plainly wrong. The idea is that you can model things in the past and not make up random numbers. Let's call

You will need to recreate the DB again as I changed quite a few things

@ArneTR
Copy link
Copy Markdown
Member

ArneTR commented Apr 3, 2026

I have looked at it but cannot review it in full at the moment.

Somehow I feel that the yearly integration in this form provides quite a bad user experience

  • I am totally swamped with a lot of selection options where most of them do not matter to me usually
  • It is very hard to find the actual "live" data that is present between the noise of all yearly entries
  • When opening the elephant i see no data as the the first entry AE (yearly) has no 2026 data. This is quite a bad first experience
  • Now providers are merged. When I go to DE I see that it supports yearly and hourly. However when I take bigger times (like 2023-2026) I only get part of the live data clipped at the last 96 hours and nothing from 2023 ... somehow the mixing is not gapless?
Screenshot 2026-04-03 at 10 30 50 AM

One could make the argument though that the Elephant UI is more of a debug interface and for that I think it does a good job. Can only be used from someone who knows it quirks.

It should hover more intuitively integrate into GMT. I wanted to try that, but the integration is still broken :)

I am still experiencing the expand_to_sampling_rate() missing 1 required positional argument: 'df' error as reported in the PR green-coding-solutions/green-metrics-tool#1520

@ribalba
Copy link
Copy Markdown
Member Author

ribalba commented Apr 14, 2026

As discussed I will close this PR as I don't think we can really "save" anything here. It will always be an hack. The added gain from this is nothing compared to the complexity.

@ribalba ribalba closed this Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants