-
Notifications
You must be signed in to change notification settings - Fork 61
Groundwork for DB Version 3.1 and Unit Checking #124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- adding timing to post-processing
- adding timing to post-processing
- adding timing to post-processing
- adding timing to post-processing
- adding timing to post-processing
- adding timing to post-processing
- adding timing to post-processing
- adding timing to post-processing
- queues are now swapping data vs. models - table_writer.py broken up into data pulling portion/writing portion
…s at 20 threads ea.
…s at 20 threads ea.
…s at 20 threads ea.
- commented some functions - removed some erroneous "sizeof" measures
- was hanging while putting in shutdown signals
- works with Utopia - logging cleaned up
- works with Utopia - logging cleaned up
Added additional QA steps during data transfer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (5)
temoa/temoa_model/unit_checking/table_checker.py (1)
72-74: Prefer fullmatch for character validation.Use re.fullmatch instead of search with anchors for clarity.
- valid_chars = re.search(ACCEPTABLE_CHARACTERS, expr) + valid_chars = re.fullmatch(ACCEPTABLE_CHARACTERS, expr)temoa/temoa_model/unit_checking/screener.py (1)
34-34: Top-level import of PROJECT_ROOT breaks library import.Move
from definitions import PROJECT_ROOTinto the__main__block to avoid import errors when consuming as a package.-from definitions import PROJECT_ROOTIn the main block:
if __name__ == '__main__': + from definitions import PROJECT_ROOT db_path = Path(PROJECT_ROOT) / 'data_files/mike_US/US_9R_8D_v3_stability_v3_1.sqlite' screen(db_path, report_path=Path(PROJECT_ROOT) / 'output_files/')temoa/temoa_model/unit_checking/entry_checker.py (1)
83-87: Guard against SQL injection via table name (whitelist).Validate
tableagainst an allowlist before string-interpolating into SQL. The call sites use internal constants; enforce that here.-from temoa.temoa_model.unit_checking.common import ( - UnitsFormat, -) +from temoa.temoa_model.unit_checking.common import ( + UnitsFormat, + tables_with_units, +) @@ def gather_from_table(conn: sqlite3.Connection, table: str) -> dict[str, list[int]]: """gather all unique "units" entries from a table and collect the row indices""" res = defaultdict(list) with conn: cur = conn.cursor() + if table not in tables_with_units: + raise ValueError(f'Unexpected table name: {table}') cur.execute(f'SELECT units FROM {table}')Please confirm no external/user input reaches
table. If it does, we should harden further (quote identifiers or map to known names). Based on learningstemoa/temoa_model/unit_checking/relation_checker.py (2)
170-176: Prefer logging.exception to capture traceback on SQL failure.- except sqlite3.OperationalError: - logger.error('failed to process query: %s when processing table %s', query, table_name) + except sqlite3.OperationalError: + logger.exception('failed to process query: %s when processing table %s', query, table_name)Also consider asserting
table_namefrom an allowlist to mitigate S608. Based on hints
250-256: Prefer logging.exception on query failure.- except sqlite3.OperationalError: - logger.error( + except sqlite3.OperationalError: + logger.exception( 'failed to process query: %s when processing table %s', query, ct.table_name )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/source/Documentation.rst(1 hunks)temoa/extensions/modeling_to_generate_alternatives/mga_sequencer.py(3 hunks)temoa/extensions/myopic/myopic_sequencer.py(2 hunks)temoa/temoa_model/temoa_config.py(3 hunks)temoa/temoa_model/temoa_sequencer.py(4 hunks)temoa/temoa_model/unit_checking/__init__.py(1 hunks)temoa/temoa_model/unit_checking/common.py(1 hunks)temoa/temoa_model/unit_checking/entry_checker.py(1 hunks)temoa/temoa_model/unit_checking/relation_checker.py(1 hunks)temoa/temoa_model/unit_checking/screener.py(1 hunks)temoa/temoa_model/unit_checking/table_checker.py(1 hunks)temoa/temoa_model/unit_checking/temoa_units.txt(1 hunks)tests/test_unit_checker.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
temoa/extensions/modeling_to_generate_alternatives/mga_sequencer.py (1)
temoa/temoa_model/unit_checking/screener.py (1)
screen(54-231)
temoa/temoa_model/unit_checking/table_checker.py (2)
temoa/temoa_model/unit_checking/common.py (1)
consolidate_lines(166-173)temoa/temoa_model/unit_checking/entry_checker.py (3)
validate_units_expression(64-74)validate_units_format(49-61)gather_from_table(77-89)
temoa/temoa_model/unit_checking/screener.py (2)
temoa/temoa_model/unit_checking/relation_checker.py (5)
check_efficiency_table(88-152)make_commodity_lut(53-65)check_inter_table_relations(155-225)check_cost_tables(228-341)make_c2a_lut(68-80)temoa/temoa_model/unit_checking/table_checker.py (1)
check_table(54-142)
temoa/temoa_model/unit_checking/entry_checker.py (1)
temoa/temoa_model/unit_checking/common.py (1)
UnitsFormat(144-146)
temoa/temoa_model/temoa_sequencer.py (1)
temoa/temoa_model/unit_checking/screener.py (1)
screen(54-231)
temoa/temoa_model/unit_checking/relation_checker.py (2)
temoa/temoa_model/unit_checking/common.py (1)
consolidate_lines(166-173)temoa/temoa_model/unit_checking/entry_checker.py (2)
validate_units_format(49-61)validate_units_expression(64-74)
temoa/extensions/myopic/myopic_sequencer.py (1)
temoa/temoa_model/unit_checking/screener.py (1)
screen(54-231)
🪛 LanguageTool
temoa/temoa_model/unit_checking/temoa_units.txt
[grammar] ~1-~1: There might be a mistake here.
Context: ...ew additions needed to for completeness in Temoa model # the units here AUGMENT th...
(QB_NEW_EN)
[grammar] ~4-~4: There might be a mistake here.
Context: ...ion for more info. passenger = [person] seat = [object] vehicle = [object] # se...
(QB_NEW_EN)
[grammar] ~5-~5: There might be a mistake here.
Context: ...o. passenger = [person] seat = [object] vehicle = [object] # see pint's notes a...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...at = [object] vehicle = [object] # see pint's notes about currency and conversion bef...
(QB_NEW_EN)
[grammar] ~11-~11: There might be a mistake here.
Context: ...ition to USD. dollar = [currency] = USD euro = [currency] = EUR # Temoa uses et...
(QB_NEW_EN)
🪛 Ruff (0.14.0)
temoa/temoa_model/unit_checking/table_checker.py
141-141: Avoid specifying long messages outside the exception class
(TRY003)
temoa/temoa_model/unit_checking/screener.py
67-67: Avoid specifying long messages outside the exception class
(TRY003)
temoa/temoa_model/unit_checking/entry_checker.py
72-72: Consider moving this statement to an else block
(TRY300)
83-83: Possible SQL injection vector through string-based query construction
(S608)
temoa/temoa_model/unit_checking/relation_checker.py
163-165: Possible SQL injection vector through string-based query construction
(S608)
168-168: Possible SQL injection vector through string-based query construction
(S608)
173-173: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
246-249: Avoid specifying long messages outside the exception class
(TRY003)
250-250: Possible SQL injection vector through string-based query construction
(S608)
254-256: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/test_unit_checker.py
59-59: Unused function argument: expected
(ARG001)
🔇 Additional comments (1)
temoa/temoa_model/unit_checking/common.py (1)
166-173: LGTM: helpful line consolidation utility.Clear and constrained output length; used consistently.
| from mypy.checkexpr import defaultdict | ||
| from mypy.message_registry import NamedTuple | ||
| from pint.registry import Unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong imports: using mypy internals breaks runtime.
Import from stdlib modules.
-from mypy.checkexpr import defaultdict
-from mypy.message_registry import NamedTuple
+from collections import defaultdict
+from typing import NamedTuple📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from mypy.checkexpr import defaultdict | |
| from mypy.message_registry import NamedTuple | |
| from pint.registry import Unit | |
| from collections import defaultdict | |
| from typing import NamedTuple | |
| from pint.registry import Unit |
🤖 Prompt for AI Agents
In temoa/temoa_model/unit_checking/relation_checker.py around lines 34–36, the
code imports mypy internals and pint.registry which breaks runtime; replace
those with standard-library and public-package imports: use "from collections
import defaultdict", "from typing import NamedTuple", and import Unit from
pint's public API (e.g. "from pint.unit import Unit") so the module uses stable,
supported symbols at runtime.
| ref = {expr: converted_units[0] / converted_units[1]} | ||
| res.update(ref) | ||
| else: | ||
| raise ValueError('Unknown units format: %s', format_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect ValueError formatting; message not interpolated.
Use f-string; also satisfies TRY003 hint.
- raise ValueError('Unknown units format: %s', format_type)
+ raise ValueError(f'Unknown units format: {format_type}')📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| raise ValueError('Unknown units format: %s', format_type) | |
| raise ValueError(f'Unknown units format: {format_type}') |
🧰 Tools
🪛 Ruff (0.14.0)
141-141: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In temoa/temoa_model/unit_checking/table_checker.py around line 141, the
ValueError is raised using comma-separated arguments which doesn't interpolate
the format_type; change it to raise ValueError(f"Unknown units format:
{format_type}") using an f-string (satisfies TRY003) so the message includes the
actual format_type value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
temoa/temoa_model/unit_checking/relation_checker.py (3)
36-36: Wrong Unit import (Pint public API).Import Unit from pint.unit, not pint.registry. Otherwise, runtime ImportError risk.
Apply:
-from pint.registry import Unit +from pint.unit import Unit
116-129: Avoid KeyError when commodity missing; improve diagnostics.Use .get() and handle unknown commodities in messages.
- invalid_input_flag = input_units != comm_units[ic] - invalid_output_flag = output_units != comm_units[oc] + expected_in = comm_units.get(ic) + expected_out = comm_units.get(oc) + invalid_input_flag = expected_in is None or input_units != expected_in + invalid_output_flag = expected_out is None or output_units != expected_out @@ - if invalid_input_flag: - msg += f'\n Invalid input units: {comm_units[ic]}' - if invalid_output_flag: - msg += f'\n Invalid output units: {comm_units[oc]}' + if invalid_input_flag: + msg += f'\n Invalid input units: {expected_in if expected_in is not None else "<unknown commodity>"}' + if invalid_output_flag: + msg += f'\n Invalid output units: {expected_out if expected_out is not None else "<unknown commodity>"}'
260-336: Cost table: wrong identifier used for lookups and messages.When commodity_reference is set, the first selected column is the commodity value, not the column name; also the loop var is misnamed. Fix lookups and labels; use the row value consistently. Adjust C2A tech key accordingly.
- for idx, (tech, raw_units_expression) in enumerate(rows, start=1): + for idx, (ref, raw_units_expression) in enumerate(rows, start=1): @@ - if ct.commodity_reference: - commodity_units = commodity_lut.get(ct.commodity_reference) + if ct.commodity_reference: + # `ref` is the commodity value from the selected column + commodity_units = commodity_lut.get(ref) if not commodity_units: - label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ct.commodity_reference}) ' + label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ref}) ' table_grouped_errors[label].append(idx) continue else: - tech_io = tech_lut.get(tech) + tech_io = tech_lut.get(ref) if tech_io: commodity_units = tech_io.output_units else: - label = f'{ct.table_name}: Unprocessed row (unknown tech: {tech}) ' + label = f'{ct.table_name}: Unprocessed row (unknown tech: {ref}) ' table_grouped_errors[label].append(idx) continue @@ - if ct.capacity_based: - c2a_units = c2a_lut.get(tech, ureg.dimensionless) # default is dimensionless + if ct.capacity_based: + c2a_units = c2a_lut.get(ref, ureg.dimensionless) # default is dimensionless # apply to convert measure_units *= c2a_units * ureg.year @@ - if not matched: - tech_reference = ct.commodity_reference if ct.commodity_reference else tech + if not matched: + tech_reference = ref label = ( f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/commodity {tech_reference}: {raw_units_expression}' f'\n Commodity units: {commodity_units}, Discovered (after conversions): {measure_units}' f'\n Conversions: c2a units: {c2a_units*ureg.year if c2a_units else "N/A"}{", `per period` removed" if ct.period_based else ""}\n ' )
🧹 Nitpick comments (6)
temoa/temoa_model/unit_checking/relation_checker.py (6)
88-91: Fix type hint for commodity units.These are pint Units, not strings.
-def check_efficiency_table( - conn: sqlite3.Connection, comm_units: dict[str, str] -) -> tuple[dict[str, IOUnits], list[str]]: +def check_efficiency_table( + conn: sqlite3.Connection, comm_units: dict[str, Unit] +) -> tuple[dict[str, IOUnits], list[str]]:
163-169: Dynamic SQL identifiers: validate/whitelist to mitigate injection (S608).table_name and identifiers should be validated against a known set (schema constants) before interpolation. Add an allowlist check.
Based on hints
Example helper (add once, then use before building the query):
def assert_known_identifier(name: str, allowed: set[str]) -> str: if name not in allowed: raise ValueError(f'Unknown identifier: {name}') return nameEnsure table_name comes from an allowlist (e.g., schema constants) and not user input, then keep the f-string.
173-175: Prefer logger.exception to preserve traceback.Use exception() in except blocks.
Based on hints
- logger.error('failed to process query: %s when processing table %s', query, table_name) + logger.exception('Failed to process query: %s when processing table %s', query, table_name)
250-250: Dynamic SQL identifiers: validate/whitelist (S608).ct.table_name and ct.commodity_reference must be validated (allowlist) before interpolation.
Based on hints
Validate identifiers from CostTableData construction (known schema), then use the f-string safely. If user-controlled, reject early.
254-256: Preserve traceback on DB errors.Switch to logger.exception for better debugging.
Based on hints
- logger.error( + logger.exception( 'failed to process query: %s when processing table %s', query, ct.table_name )
75-80: Variable naming nit: use tech for C2A LUT clarity.The first column is tech; rename local var for readability.
- for comm, units in rows: + for tech, units in rows: valid, group = validate_units_format(units, SINGLE_ELEMENT) if valid: valid, units = validate_units_expression(group[0]) - res[comm] = units + res[tech] = units
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
temoa/temoa_model/unit_checking/relation_checker.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
temoa/temoa_model/unit_checking/relation_checker.py (2)
temoa/temoa_model/unit_checking/common.py (1)
consolidate_lines(166-173)temoa/temoa_model/unit_checking/entry_checker.py (2)
validate_units_format(49-61)validate_units_expression(64-74)
🪛 Ruff (0.14.0)
temoa/temoa_model/unit_checking/relation_checker.py
163-165: Possible SQL injection vector through string-based query construction
(S608)
168-168: Possible SQL injection vector through string-based query construction
(S608)
173-173: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
246-249: Avoid specifying long messages outside the exception class
(TRY003)
250-250: Possible SQL injection vector through string-based query construction
(S608)
254-256: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
temoa/utilities/db_migration_v3_to_v3_1.py (1)
193-199: Fix severity label: "WARNING" should be "ERROR".Past review indicated this mismatch label should be corrected. Since the script closes connections and exits, this is a fatal error, not a warning. The same pattern at lines 231-237 correctly uses "ERROR".
Apply this diff for consistency:
- print(f'WARNING: Column mismatch in {table_name}') + print(f'ERROR: Column mismatch in {table_name}')
🧹 Nitpick comments (2)
temoa/utilities/db_migration_v3_to_v3_1.py (2)
178-178: Use positive exit codes for portability.The script exits with
-1on errors. POSIX conventions expect positive exit codes (1-255), and-1may be interpreted as 255 on some systems. For consistency with standard practice, usesys.exit(1)for error conditions.Apply this diff:
- sys.exit(-1) + sys.exit(1)This same pattern appears in lines 199, 237, and should be updated throughout for consistency.
272-272: Prefer list unpacking over concatenation.The static analyzer suggests using unpacking syntax for better readability.
Apply this diff:
- if set(old_cols + ['units']) != set(new_cols): + if set([*old_cols, 'units']) != set(new_cols):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
temoa/utilities/db_migration_v3_to_v3_1.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.0)
temoa/utilities/db_migration_v3_to_v3_1.py
203-203: Possible SQL injection vector through string-based query construction
(S608)
216-216: Possible SQL injection vector through string-based query construction
(S608)
245-245: Possible SQL injection vector through string-based query construction
(S608)
255-255: Possible SQL injection vector through string-based query construction
(S608)
272-272: Consider [*old_cols, 'units'] instead of concatenation
Replace with [*old_cols, 'units']
(RUF005)
283-283: Possible SQL injection vector through string-based query construction
(S608)
295-295: Possible SQL injection vector through string-based query construction
(S608)
🔇 Additional comments (1)
temoa/utilities/db_migration_v3_to_v3_1.py (1)
300-319: LGTM! Solid finalization sequence.The commit, vacuum, FK re-enabling, and FK verification sequence is well-structured. The error handling for FK checks provides clear diagnostics, and connection cleanup is properly done.
…e log setup when importing IamDataFrame, causing failure of Temoa logging.
… in relation checker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (5)
temoa/utilities/db_migration_v3_to_v3_1.py (1)
208-212: Inconsistent severity label: fatal condition labeled as WARNING.Line 209 prints "WARNING" but then calls
exit_on_failure, which is fatal. This is inconsistent with line 246, which correctly labels the same condition as "ERROR" in the modified-transfers block.Apply this diff:
if set(old_cols) != set(new_cols): - print(f'WARNING: Column mismatch in {table_name}') + print(f'ERROR: Column mismatch in {table_name}') print(f'Old columns: {old_cols}') print(f'New columns: {new_cols}') exit_on_failure(f'Column mismatch in {table_name}')temoa/temoa_model/unit_checking/screener.py (1)
81-83: DB version check uses raw metadata; cast to int or it may fail at runtime.MetaData values are strings; current comparisons can raise TypeError or always fail. Cast safely.
Apply:
- major = meta_data.get('DB_MAJOR', 0) - minor = meta_data.get('DB_MINOR', 0) + try: + major = int(meta_data.get('DB_MAJOR', '0')) + minor = int(meta_data.get('DB_MINOR', '0')) + except (TypeError, ValueError): + major, minor = 0, 0temoa/temoa_model/unit_checking/relation_checker.py (3)
116-129: Guard commodity lookups; avoid KeyError and clarify messages.- invalid_input_flag = input_units != comm_units[ic] - invalid_output_flag = output_units != comm_units[oc] + expected_in = comm_units.get(ic) + expected_out = comm_units.get(oc) + invalid_input_flag = expected_in is None or input_units != expected_in + invalid_output_flag = expected_out is None or output_units != expected_out @@ - if invalid_input_flag: - msg += f'\n Invalid input units: {comm_units[ic]}' - if invalid_output_flag: - msg += f'\n Invalid output units: {comm_units[oc]}' + if invalid_input_flag: + msg += f'\n Invalid input units: {expected_in if expected_in is not None else "<unknown commodity>"}' + if invalid_output_flag: + msg += f'\n Invalid output units: {expected_out if expected_out is not None else "<unknown commodity>"}'
256-341: Cost table lookups use the column name, not the row value; causes false errors.Use the selected field’s row value for lookups and C2A conversion.
- query = f'SELECT {ct.commodity_reference if ct.commodity_reference else 'tech'}, units FROM {ct.table_name}' + query = f'SELECT {ct.commodity_reference if ct.commodity_reference else "tech"}, units FROM {ct.table_name}' @@ - for idx, (tech, raw_units_expression) in enumerate(rows, start=1): + for idx, (ref, raw_units_expression) in enumerate(rows, start=1): @@ - if ct.commodity_reference: - commodity_units = commodity_lut.get(ct.commodity_reference) + if ct.commodity_reference: + # `ref` holds the commodity from the selected column + commodity_units = commodity_lut.get(ref) if not commodity_units: - label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ct.commodity_reference}) ' + label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ref}) ' table_grouped_errors[label].append(idx) continue else: - tech_io = tech_lut.get(tech) + tech_io = tech_lut.get(ref) if tech_io: commodity_units = tech_io.output_units else: - label = f'{ct.table_name}: Unprocessed row (unknown tech: {tech}) ' + label = f'{ct.table_name}: Unprocessed row (unknown tech: {ref}) ' table_grouped_errors[label].append(idx) continue @@ - if ct.capacity_based: - c2a_units = c2a_lut.get(tech, ureg.dimensionless) # default is dimensionless + if ct.capacity_based: + c2a_units = c2a_lut.get(ref, ureg.dimensionless) # default is dimensionless # apply to convert measure_units *= c2a_units * ureg.year @@ - if not matched: - tech_reference = ct.commodity_reference if ct.commodity_reference else tech + if not matched: + tech_reference = ref label = ( f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/commodity {tech_reference}: {raw_units_expression}' f'\n Commodity units: {commodity_units}, Discovered (after conversions): {measure_units}' f'\n Conversions: c2a units: {c2a_units*ureg.year if c2a_units else "N/A"}{", `per period` removed" if ct.period_based else ""}\n ' )
36-36: Use Pint’s publicUnittype, not internalpint.registry.
pint.registry.Unitis internal and may change; preferpint.unit.Unit(compatible with Pint ≥0.25).Apply:
-from pint.registry import Unit +from pint.unit import UnitBased on learnings.
🧹 Nitpick comments (10)
requirements.in (1)
46-47: Consider adding a minimum version constraint for pint.Pint 0.25.0 requires Python >= 3.11, which aligns with your Python 3.12 environment. However, adding an explicit minimum version constraint would ensure reproducible builds and prevent issues if the unit-checking code relies on features from newer Pint versions.
Based on learnings
Apply this diff to add a minimum version constraint:
# for database v3.1 -pint +pint >= 0.23Note: Adjust the minimum version (e.g., 0.23, 0.24, or 0.25) based on which Pint features your unit-checking code actually requires.
temoa/utilities/db_migration_v3_to_v3_1.py (3)
150-152: Use standard Unix exit code.Exit code -1 is non-standard; use 1 to indicate a general error per Unix conventions.
Apply this diff:
if new_db_path.exists(): print(f'ERROR: destination database already exists: {new_db_path}. Exiting.') - sys.exit(-1) + sys.exit(1)
155-163: Standardize exit code in helper function.The exit code should be 1 for consistency with Unix conventions.
Apply this diff:
def exit_on_failure(msg): print( 'Transition failed for reason below. This issue is fatal and must be remediated. Exiting.' ) print(msg) con_old.close() con_new.close() new_db_path.unlink() - sys.exit(-1) + sys.exit(1)
284-284: Optional: Prefer unpacking syntax for list concatenation.Ruff suggests using the unpacking operator for combining lists.
Apply this diff:
- if set(old_cols + ['units']) != set(new_cols): + if set([*old_cols, 'units']) != set(new_cols):temoa/temoa_model/unit_checking/screener.py (3)
90-97: Don’t return inside the per-DB loop on version failure; process all DBs.Keep scanning remaining DBs, mark overall failure, and write one report at the end.
- msg = 'Units Check 1 (DB Version): Failed. DB must be v3.1 or greater for units checking' + msg = 'Units Check 1 (DB Version): Failed. DB must be v3.1 or greater for units checking' report_entries.extend((msg, '\n')) logger.warning(msg) - # we are non-viable, write the (very short) report and return - _write_report(report_path, report_entries) - if verbose: - print(msg) - return False + if verbose: + print(msg) + all_clear = False + report_entries.append('\n') + logger.info('Skipping further checks for this DB due to version mismatch.') + continue
34-34: Avoid importing project-only symbols at module import time.Move
definitionsimport into the__main__block to keep this module importable as a library.-from definitions import PROJECT_ROOT @@ -if __name__ == '__main__': - db_path = Path(PROJECT_ROOT) / 'data_files/mike_US/US_9R_8D_v3_stability_v3_1.sqlite' - screen(db_path, report_path=Path(PROJECT_ROOT) / 'output_files/') +if __name__ == '__main__': + from definitions import PROJECT_ROOT + db_path = Path(PROJECT_ROOT) / 'data_files/mike_US/US_9R_8D_v3_stability_v3_1.sqlite' + screen(db_path, report_path=Path(PROJECT_ROOT) / 'output_files/')Also applies to: 245-247
63-65: Remove unusedtable_unitsaccumulator.It’s never read; drop it and the stray string literal.
- table_units = {} - """Table name : {tech | commodity: units}""" @@ - relations, table_errors = check_table(conn, table) - table_units[table] = relations + relations, table_errors = check_table(conn, table)Also applies to: 109-111
temoa/temoa_model/unit_checking/relation_checker.py (3)
174-176: Preferlogger.exceptionto capture stack traces.Improves diagnosability when queries fail.
- except sqlite3.OperationalError: - logger.error('failed to process query: %s when processing table %s', query, table_name) + except sqlite3.OperationalError: + logger.exception('failed to process query: %s when processing table %s', query, table_name)- except sqlite3.OperationalError: - logger.error( - 'failed to process query: %s when processing table %s', query, ct.table_name - ) + except sqlite3.OperationalError: + logger.exception( + 'failed to process query: %s when processing table %s', query, ct.table_name + )Also applies to: 260-262
203-205: Remove f-strings with no placeholders (F541).- if not valid_table_units: - grouped_errors[f'Unprocessed row (invalid units--see earlier tests)'].append(idx) - if not c2a_valid: - grouped_errors[f'Unprocessed row (invalid c2a units--see earlier tests)'].append(idx) + if not valid_table_units: + grouped_errors['Unprocessed row (invalid units--see earlier tests)'].append(idx) + if not c2a_valid: + grouped_errors['Unprocessed row (invalid c2a units--see earlier tests)'].append(idx)
160-170: Defend against SQL injection by whitelisting table names.
table_nameis interpolated into SQL; assert it’s from a trusted list.def check_inter_table_relations( conn: sqlite3.Connection, table_name, tech_lut: dict[str, IOUnits], capacity_based: bool ) -> list[str]: """check the tech and units in the given table vs. baseline (expected) values for the tech""" - grouped_errors = defaultdict(list) + grouped_errors = defaultdict(list) + # Defensive: ensure only known tables are queried + from temoa.temoa_model.unit_checking.common import activity_based_tables, capacity_based_tables + allowed = set(activity_based_tables) | set(capacity_based_tables) + if table_name not in allowed: + return [f'Unexpected table name: {table_name}']If
table_namealready guaranteed by the caller, feel free to skip.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
requirements.in(2 hunks)requirements.txt(14 hunks)temoa/temoa_model/unit_checking/relation_checker.py(1 hunks)temoa/temoa_model/unit_checking/screener.py(1 hunks)temoa/utilities/db_migration_v3_to_v3_1.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
temoa/temoa_model/unit_checking/screener.py (2)
temoa/temoa_model/unit_checking/relation_checker.py (5)
check_efficiency_table(89-153)make_commodity_lut(53-65)check_inter_table_relations(156-231)check_cost_tables(234-345)make_c2a_lut(68-80)temoa/temoa_model/unit_checking/table_checker.py (1)
check_table(54-142)
temoa/temoa_model/unit_checking/relation_checker.py (2)
temoa/temoa_model/unit_checking/common.py (1)
consolidate_lines(166-173)temoa/temoa_model/unit_checking/entry_checker.py (2)
validate_units_format(49-61)validate_units_expression(64-74)
🪛 Ruff (0.14.0)
temoa/temoa_model/unit_checking/screener.py
67-67: Avoid specifying long messages outside the exception class
(TRY003)
temoa/temoa_model/unit_checking/relation_checker.py
164-166: Possible SQL injection vector through string-based query construction
(S608)
169-169: Possible SQL injection vector through string-based query construction
(S608)
174-174: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
203-203: f-string without any placeholders
Remove extraneous f prefix
(F541)
205-205: f-string without any placeholders
Remove extraneous f prefix
(F541)
252-255: Avoid specifying long messages outside the exception class
(TRY003)
256-256: Possible SQL injection vector through string-based query construction
(S608)
260-262: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
temoa/utilities/db_migration_v3_to_v3_1.py
216-216: Possible SQL injection vector through string-based query construction
(S608)
229-229: Possible SQL injection vector through string-based query construction
(S608)
257-257: Possible SQL injection vector through string-based query construction
(S608)
267-267: Possible SQL injection vector through string-based query construction
(S608)
284-284: Consider [*old_cols, 'units'] instead of concatenation
Replace with [*old_cols, 'units']
(RUF005)
295-295: Possible SQL injection vector through string-based query construction
(S608)
307-307: Possible SQL injection vector through string-based query construction
(S608)
🔇 Additional comments (6)
requirements.in (1)
25-25: Version 3.1.0 of pyam-iamc exists, and no known security vulnerabilities were found. The version constraint is valid.temoa/utilities/db_migration_v3_to_v3_1.py (3)
216-216: Static analysis SQL injection warnings are false positives.Ruff flags these lines as potential SQL injection vectors, but the construction is safe because:
table_namecomes exclusively from hardcoded lists (direct_transfer_tables,transfer_with_mod.keys(),add_units_tables)cols_stris built from column names retrieved viaPRAGMA table_info, which reflects the actual schema structure- No user-controlled input enters these queries
The parameterized WHERE clause at line 257 correctly handles user-like data and was addressed in previous reviews.
Also applies to: 229-229, 257-257, 267-267, 295-295, 307-307
246-249: LGTM: Correct error labeling and parameterized query.The error label correctly indicates a fatal condition (line 246), and the parameterized WHERE clause (lines 254-259) properly prevents SQL injection by separating the query structure from values.
Also applies to: 254-259
312-340: LGTM: Comprehensive post-migration validation.The final sequence properly commits the transaction, vacuums the database, re-enables foreign key enforcement, validates FK integrity, and performs a sanity check on the TimePeriod table. The error handling for FK check failures is appropriate.
requirements.txt (2)
2-5: Python version consistency issue resolved.Both
requirements.inandrequirements.txtnow consistently specify Python 3.12. The past mismatch (3.11 vs 3.12) has been corrected. pint==0.25 is present (line 349) and compatible with Python 3.12.
349-357: Verify that your project supports Python 3.11+; pint 0.25's specific breaking changes do not affect this codebase.Pint 0.25 introduces a minimum Python requirement of 3.11+ and fixes the
round()and.to()return types. However, analysis of the codebase shows:
- No
round()or.to()method calls on pint quantities exist in the four unit-checking modules- Quantity operations use only safe APIs: multiplication/division (
*,/),parse_units(), andcompatible_units()- Tests exist for unit validation (created 9/22/25)
Confirm your project's Python version requirement is already 3.11+ to match pint 0.25's minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (5)
temoa/temoa_model/unit_checking/relation_checker.py (4)
90-93: Fixcomm_unitstype annotation to actual contents (Unit).-def check_efficiency_table( - conn: sqlite3.Connection, comm_units: dict[str, str] +def check_efficiency_table( + conn: sqlite3.Connection, comm_units: dict[str, Unit] ) -> tuple[dict[str, IOUnits], list[str]]:
117-131: Avoid KeyError when commodities are missing; improve messages.- invalid_input_flag = input_units != comm_units[ic] - invalid_output_flag = output_units != comm_units[oc] + expected_in = comm_units.get(ic) + expected_out = comm_units.get(oc) + invalid_input_flag = expected_in is None or input_units != expected_in + invalid_output_flag = expected_out is None or output_units != expected_out @@ - if invalid_input_flag: - msg += f'\n Invalid input units: {comm_units[ic]}' - if invalid_output_flag: - msg += f'\n Invalid output units: {comm_units[oc]}' + if invalid_input_flag: + msg += f'\n Invalid input units: {expected_in if expected_in is not None else "<unknown commodity>"}' + if invalid_output_flag: + msg += f'\n Invalid output units: {expected_out if expected_out is not None else "<unknown commodity>"}'
290-375: Cost-table checks use column name instead of row value; fix identifier handling throughout.The first selected column is the row’s identifier (commodity or tech). Use it for lookups, C2A, and messages.
- query = f'SELECT {ct.commodity_reference if ct.commodity_reference else 'tech'}, units FROM {ct.table_name}' + query = f"SELECT {ct.commodity_reference if ct.commodity_reference else 'tech'}, units FROM {ct.table_name}" @@ - for idx, (tech, raw_units_expression) in enumerate(rows, start=1): + for idx, (ref, raw_units_expression) in enumerate(rows, start=1): @@ - if ct.commodity_reference: - commodity_units = commodity_lut.get(ct.commodity_reference) + if ct.commodity_reference: + # `ref` is the commodity value from the selected column + commodity_units = commodity_lut.get(ref) if not commodity_units: - label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ct.commodity_reference}) ' + label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ref}) ' table_grouped_errors[label].append(idx) continue else: - tech_io = tech_lut.get(tech) + tech_io = tech_lut.get(ref) if tech_io: commodity_units = tech_io.output_units else: - label = f'{ct.table_name}: Unprocessed row (unknown tech: {tech}) ' + label = f'{ct.table_name}: Unprocessed row (unknown tech: {ref}) ' table_grouped_errors[label].append(idx) continue @@ - if ct.capacity_based: - c2a_units = c2a_lut.get(tech, ureg.dimensionless) # default is dimensionless + if ct.capacity_based: + c2a_units = c2a_lut.get(ref, ureg.dimensionless) # default is dimensionless # apply to convert measure_units *= c2a_units * ureg.year @@ - if not matched: - tech_reference = ct.commodity_reference if ct.commodity_reference else tech + if not matched: + tech_reference = ref label = ( f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/commodity {tech_reference}: {raw_units_expression}' - f'\n Expecting commodity units: {commodity_units}. Discovered (after conversions applied): {measure_units}' + f'\n Expecting commodity units: {commodity_units}. Discovered (after conversions applied): {measure_units}' f'\n Conversions: c2a units: {c2a_units*ureg.year if c2a_units else "N/A"}{", `per period` removed" if ct.period_based else ""}\n ' )
36-36: ImportUnitfrom Pint public API.
pint.registry.Unitisn’t a stable public path. Usepint.unit.Unit.-from pint.registry import Unit +from pint.unit import Unittemoa/temoa_model/unit_checking/screener.py (1)
83-99: DB version check compares strings to ints; always fails on string values.Cast MetaData values to int before comparison to avoid false failures.
- major = meta_data.get('DB_MAJOR', 0) - minor = meta_data.get('DB_MINOR', 0) - if major == 3 and minor >= 1: + try: + major = int(meta_data.get('DB_MAJOR', '0')) + minor = int(meta_data.get('DB_MINOR', '0')) + except (TypeError, ValueError): + major, minor = 0, 0 + if major == 3 and minor >= 1:
🧹 Nitpick comments (7)
temoa/temoa_model/unit_checking/common.py (1)
155-161: Clarify allowed characters in comments to match the regex.The comment lists only “*” and “_”, but the patterns also allow “^, /, (), digits, spaces”. Update the comment for accuracy.
Also applies to: 169-169
temoa/temoa_model/unit_checking/screener.py (3)
34-34: Avoid importing PROJECT_ROOT at module import; move under main.Prevents hard dependency when importing this module as a library.
-from definitions import PROJECT_ROOT @@ -if __name__ == '__main__': - db_path = Path(PROJECT_ROOT) / 'data_files/mike_US/US_9R_8D_v3_stability_v3_1.sqlite' - screen(db_path, report_path=Path(PROJECT_ROOT) / 'temp/') +if __name__ == '__main__': + from definitions import PROJECT_ROOT # localize to script use + db_path = Path(PROJECT_ROOT) / 'data_files/mike_US/US_9R_8D_v3_stability_v3_1.sqlite' + screen(db_path, report_path=Path(PROJECT_ROOT) / 'temp/')Also applies to: 270-272
65-66: Reset per‑DB state and minor readability nits.
- Reinitialize table_units inside the per‑DB loop; current dict persists across DBs.
- Optional: track a per‑DB errors flag to avoid shadowing “errors”.
- table_units = {} - """Table name : {tech | commodity: units}""" - for db_path in db_paths: + for db_path in db_paths: + table_units = {} # Table name : {tech | commodity: units}Also applies to: 67-73, 101-129
259-267: Ensure report directory exists; support directory paths consistently.If report_path is a directory (possibly not yet created) or a file in a non‑existent directory, writing will fail. Create parents as needed.
def _write_report(report_path: Path, report_entries: list[str]): """Write the report to file""" if not report_path: return - if report_path.is_dir(): - # augment with a default filename - report_path /= 'units_check.txt' + if report_path.suffix == '': + # treat as directory path; create and append default filename + report_path.mkdir(parents=True, exist_ok=True) + report_path = report_path / 'units_check.txt' + else: + report_path.parent.mkdir(parents=True, exist_ok=True) with open(report_path, 'w', encoding='utf-8') as report_file: report_file.writelines(report_entries)temoa/temoa_model/unit_checking/relation_checker.py (3)
183-185: Log the exception with traceback for failed queries.Use
logger.exceptionto capture context.- except sqlite3.OperationalError: - logger.error('failed to process query: %s when processing table %s', query, table_name) + except sqlite3.OperationalError: + logger.exception('failed to process query: %s when processing table %s', query, table_name)
170-179: Dynamic SQL table names are safe-by-construction; add a whitelist assert to satisfy S608.Since
table_name/ct.table_nameoriginate from curated sets, assert membership before execution.# before building the query ALLOWED_TABLES = {'CapacityToActivity', 'commodity', 'efficiency', 'CostInvest', 'CostEmission', 'CostFixed', 'CostVariable'} assert table_name in ALLOWED_TABLES # or for cost_tables: assert ct.table_name in {t.table_name for t in cost_tables}Also applies to: 290-293
251-257: Message wording for COMMODITY relations.For COMMODITY checks, “for output of tech …” is misleading. Consider neutral phrasing.
- f' for output of tech {tech_or_comm}' + f' for reference {tech_or_comm}'
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
temoa/temoa_model/unit_checking/common.py(1 hunks)temoa/temoa_model/unit_checking/relation_checker.py(1 hunks)temoa/temoa_model/unit_checking/screener.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
temoa/temoa_model/unit_checking/screener.py (3)
temoa/temoa_model/unit_checking/common.py (1)
RelationType(143-146)temoa/temoa_model/unit_checking/relation_checker.py (5)
check_efficiency_table(90-154)make_commodity_lut(54-66)check_inter_table_relations(157-265)check_cost_tables(268-379)make_c2a_lut(69-81)temoa/temoa_model/unit_checking/table_checker.py (1)
check_table(54-142)
temoa/temoa_model/unit_checking/relation_checker.py (2)
temoa/temoa_model/unit_checking/common.py (2)
consolidate_lines(172-179)RelationType(143-146)temoa/temoa_model/unit_checking/entry_checker.py (2)
validate_units_format(49-61)validate_units_expression(64-74)
🪛 Ruff (0.14.1)
temoa/temoa_model/unit_checking/screener.py
69-69: Avoid specifying long messages outside the exception class
(TRY003)
temoa/temoa_model/unit_checking/relation_checker.py
170-172: Possible SQL injection vector through string-based query construction
(S608)
175-175: Possible SQL injection vector through string-based query construction
(S608)
177-177: Possible SQL injection vector through string-based query construction
(S608)
179-179: Avoid specifying long messages outside the exception class
(TRY003)
183-183: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
210-210: Avoid specifying long messages outside the exception class
(TRY003)
286-289: Avoid specifying long messages outside the exception class
(TRY003)
290-290: Possible SQL injection vector through string-based query construction
(S608)
294-296: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
| RATIO_ELEMENT = UnitsFormat( | ||
| format=r'^\s*([A-Za-z0-9\*\/\^\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$', | ||
| groups=2, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RATIO regex blocks parentheses in numerator; contradicts comments and likely valid inputs.
Expressions like "(kg*m)/(s)" won’t match because the numerator character class omits "()". Allow parentheses in the numerator too.
RATIO_ELEMENT = UnitsFormat(
- format=r'^\s*([A-Za-z0-9\*\/\^\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$',
+ format=r'^\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$',
groups=2,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RATIO_ELEMENT = UnitsFormat( | |
| format=r'^\s*([A-Za-z0-9\*\/\^\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$', | |
| groups=2, | |
| ) | |
| RATIO_ELEMENT = UnitsFormat( | |
| format=r'^\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$', | |
| groups=2, | |
| ) |
🤖 Prompt for AI Agents
In temoa/temoa_model/unit_checking/common.py around lines 163 to 166, the
RATIO_ELEMENT regex currently excludes parentheses from the numerator character
class which blocks valid inputs like "(kg*m)/(s)"; update the numerator
character class to allow parentheses (i.e., include '\(' and '\)' inside the
first bracketed class in the raw regex string) so the pattern accepts
parenthesized numerators while preserving the rest of the pattern and groups.
| io_units = tech_lut[tech_or_comm] | ||
| if not io_units: | ||
| grouped_errors[ | ||
| f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)' | ||
| ].append(idx) | ||
| continue | ||
| expected_units = io_units.output_units |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyError risk: use .get() for ACTIVITY lookups.
Indexing tech_lut[tech_or_comm] will raise if missing; use .get() as in the CAPACITY case.
- case RelationType.ACTIVITY:
- io_units = tech_lut[tech_or_comm]
- if not io_units:
+ case RelationType.ACTIVITY:
+ io_units = tech_lut.get(tech_or_comm)
+ if not io_units:
grouped_errors[
f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)'
].append(idx)
continue
expected_units = io_units.output_units🤖 Prompt for AI Agents
In temoa/temoa_model/unit_checking/relation_checker.py around lines 200 to 206,
the code directly indexes tech_lut[tech_or_comm] which can raise KeyError for
missing ACTIVITY entries; change the lookup to use tech_lut.get(tech_or_comm)
(like the CAPACITY case), check if the result is falsy, append the same
grouped_errors message and continue, and only access expected_units =
io_units.output_units when io_units is present.
…nit checker is and how to execute with it. Update to 2 config files to include unit checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
README.md (1)
67-76: Improve phrasing to eliminate split infinitive and reduce redundancy.Lines 68–69 contain a split infinitive ("are advised to carefully review") and redundant phrasing ("Users who use this tool"). Static analysis caught the split infinitive concern. The current wording can be tightened for clarity.
Consider this revision:
-Converting legacy db's to Version 3.0 can be done with the included database migration tool. Users who use this tool are advised to carefully review the console outputs during conversion to ensure accuracy and check the -converted database carefully. The migration tool will build an empty new Version 3.0 database and move data from +Converting legacy databases to Version 3.0 can be done with the included database migration tool. We advise reviewing console outputs carefully during conversion to ensure accuracy and checking the +converted database carefully. The migration tool will build an empty new Version 3.0 database and move data from(Minor touches: "db's" → "databases", "Users who use this tool are advised to" → "We advise", moved "carefully" after the verb, and light punctuation adjustments for consistency.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
README.md(3 hunks)data_files/my_configs/config_sample.toml(1 hunks)data_files/my_configs/morris_utopia.toml(1 hunks)docs/source/Documentation.rst(1 hunks)temoa/temoa_model/temoa_sequencer.py(4 hunks)
✅ Files skipped from review due to trivial changes (1)
- data_files/my_configs/morris_utopia.toml
🚧 Files skipped from review as they are similar to previous changes (2)
- temoa/temoa_model/temoa_sequencer.py
- docs/source/Documentation.rst
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~68-~68: Try moving the adverb to make the sentence clearer.
Context: ...tion tool. Users who use this tool are advised to carefully review the console outputs during conversion to ensure acc...
(SPLIT_INFINITIVE)
🔇 Additional comments (3)
data_files/my_configs/config_sample.toml (1)
41-43: Unit check option is well-documented and properly placed.The new
unit_checkconfiguration option is clearly commented and positioned logically within the DATA/MODEL CHECKS section. The explanatory text effectively conveys its purpose and output location.README.md (2)
81-84: Unit Checking documentation is clear and well-integrated.The new section on unit checking functionality is well-written, correctly describes the v3.1 migration requirement, and clearly explains that the v3.1 utility expects v3.0 source databases. This aligns seamlessly with the new configuration option and PR objectives.
98-98: Config table entry for Unit Checking is clear and consistent.The new row accurately describes the unit checking feature and maintains consistency with other entries in the table and the corresponding configuration option in
config_sample.toml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (6)
temoa/temoa_model/unit_checking/common.py (1)
163-166: Allow parentheses in RATIO numerator.Valid inputs like “(kg*m)/(s)” should pass.
RATIO_ELEMENT = UnitsFormat( - format=r'^\s*([A-Za-z0-9\*\/\^\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$', + format=r'^\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\/\s*\(\s*([A-Za-z0-9\*\^\/\(\)\_\s]+?)\s*\)\s*$', groups=2, )temoa/temoa_model/unit_checking/relation_checker.py (4)
199-206: Use.get()for ACTIVITY tech lookups to prevent KeyError.- case RelationType.ACTIVITY: - io_units = tech_lut[tech_or_comm] - if not io_units: + case RelationType.ACTIVITY: + io_units = tech_lut.get(tech_or_comm) + if not io_units: grouped_errors[ f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)' ].append(idx) continue expected_units = io_units.output_units
117-131: Avoid KeyError on missing commodities; improve messages.Use
.get()for lookups and handle unknowns in output.- invalid_input_flag = input_units != comm_units[ic] - invalid_output_flag = output_units != comm_units[oc] + expected_in = comm_units.get(ic) + expected_out = comm_units.get(oc) + invalid_input_flag = expected_in is None or input_units != expected_in + invalid_output_flag = expected_out is None or output_units != expected_out @@ - if invalid_input_flag: - msg += f'\n Invalid input units: {comm_units[ic]}' - if invalid_output_flag: - msg += f'\n Invalid output units: {comm_units[oc]}' + if invalid_input_flag: + msg += f'\n Invalid input units: {expected_in if expected_in is not None else "<unknown commodity>"}' + if invalid_output_flag: + msg += f'\n Invalid output units: {expected_out if expected_out is not None else "<unknown commodity>"}'
90-92: Correct type annotation forcomm_units.It maps commodity name to
Unit, notstr.-def check_efficiency_table( - conn: sqlite3.Connection, comm_units: dict[str, str] +def check_efficiency_table( + conn: sqlite3.Connection, comm_units: dict[str, Unit] ) -> tuple[dict[str, IOUnits], list[str]]:
290-375: Cost tables: use row identifier for lookups; fix c2a key and messages.When
commodity_referenceis set, the first selected column is the commodity value, not the column name. Also use the same row identifier for C2A when capacity‑based.- for idx, (tech, raw_units_expression) in enumerate(rows, start=1): + for idx, (ref, raw_units_expression) in enumerate(rows, start=1): @@ - if ct.commodity_reference: - commodity_units = commodity_lut.get(ct.commodity_reference) + if ct.commodity_reference: + # `ref` is the commodity value from the selected column + commodity_units = commodity_lut.get(ref) if not commodity_units: - label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ct.commodity_reference}) ' + label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ref}) ' table_grouped_errors[label].append(idx) continue else: - tech_io = tech_lut.get(tech) + tech_io = tech_lut.get(ref) if tech_io: commodity_units = tech_io.output_units else: - label = f'{ct.table_name}: Unprocessed row (unknown tech: {tech}) ' + label = f'{ct.table_name}: Unprocessed row (unknown tech: {ref}) ' table_grouped_errors[label].append(idx) continue @@ - if ct.capacity_based: - c2a_units = c2a_lut.get(tech, ureg.dimensionless) # default is dimensionless + if ct.capacity_based: + c2a_units = c2a_lut.get(ref, ureg.dimensionless) # default is dimensionless # apply to convert measure_units *= c2a_units * ureg.year @@ - if not matched: - tech_reference = ct.commodity_reference if ct.commodity_reference else tech + if not matched: + tech_reference = ref label = ( f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/commodity {tech_reference}: {raw_units_expression}' f'\n Expecting commodity units: {commodity_units}. Discovered (after conversions applied): {measure_units}' - f'\n Conversions: c2a units: {c2a_units if c2a_units else "N/A"}{", `per period` removed" if ct.period_based else ""}\n ' + f'\n Conversions: c2a units: {c2a_units if c2a_units else "N/A"}{", `per period` removed" if ct.period_based else ""}\n ' )temoa/temoa_model/unit_checking/screener.py (1)
81-99: Fix: DB version values are strings; cast before comparison.Current comparison always risks false negatives. Parse MetaData values to int with a safe fallback.
- data = conn.execute('SELECT element, value FROM MetaData').fetchall() - meta_data = dict(data) - major = meta_data.get('DB_MAJOR', 0) - minor = meta_data.get('DB_MINOR', 0) - if major == 3 and minor >= 1: + data = conn.execute('SELECT element, value FROM MetaData').fetchall() + meta_data = dict(data) + try: + major = int(meta_data.get('DB_MAJOR', '0')) + minor = int(meta_data.get('DB_MINOR', '0')) + except (TypeError, ValueError): + major, minor = 0, 0 + if major == 3 and minor >= 1: msg = 'Units Check 1 (DB Version): Passed' report_entries.extend((msg, '\n')) logger.info(msg) if verbose: print(msg) else: msg = 'Units Check 1 (DB Version): Failed. DB must be v3.1 or greater for units checking' report_entries.extend((msg, '\n')) logger.warning(msg) - # we are non-viable, write the (very short) report and return - _write_report(report_path, report_entries) + # we are non-viable for this DB; continue to process others (if any) if verbose: print(msg) - return False + all_clear = False + continueIf you prefer hard‑fail on first invalid DB, keep the early return but still cast.
🧹 Nitpick comments (4)
temoa/temoa_model/unit_checking/screener.py (2)
34-35: Defer PROJECT_ROOT import to main guard.Prevents import-time dependency for library consumers.
-from definitions import PROJECT_ROOT @@ -if __name__ == '__main__': - db_path = Path(PROJECT_ROOT) / '<path to db from project ROOT>' +if __name__ == '__main__': + from definitions import PROJECT_ROOT + db_path = Path(PROJECT_ROOT) / '<path to db from project ROOT>' screen(db_path, report_path=Path(PROJECT_ROOT) / 'temp')Also applies to: 275-278
65-66: Remove unusedtable_unitsaggregation (dead state).
table_unitsis populated but never read. Drop it to reduce memory and noise.- table_units = {} - """Table name : {tech | commodity: units}""" + # Removed unused table_units aggregation @@ - relations, table_errors = check_table(conn, table) - table_units[table] = relations + _, table_errors = check_table(conn, table)Also applies to: 110-120
temoa/temoa_model/unit_checking/relation_checker.py (2)
181-185: Preferlogger.exceptionto capture trace on SQL errors.- except sqlite3.OperationalError: - logger.error('failed to process query: %s when processing table %s', query, table_name) + except sqlite3.OperationalError: + logger.exception('failed to process query: %s when processing table %s', query, table_name)
1-10: Optional: validatetable_nameagainst a whitelist to mitigate S608.Since identifiers are interpolated, assert membership in known sets to avoid abuse and quiet linters.
-from temoa.temoa_model.unit_checking.common import ( +from temoa.temoa_model.unit_checking.common import ( RATIO_ELEMENT, SINGLE_ELEMENT, consolidate_lines, CostTableData, RelationType, + activity_based_tables, + capacity_based_tables, + commodity_based_tables, ) @@ def check_inter_table_relations( @@ ) -> list[str]: """check the tech and units in the given table vs. baseline (expected) values for the tech""" + allowed = set(activity_based_tables + capacity_based_tables + commodity_based_tables) + if table_name not in allowed: + raise ValueError(f'Unexpected table: {table_name}')Also applies to: 150-178
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
temoa/temoa_model/unit_checking/common.py(1 hunks)temoa/temoa_model/unit_checking/relation_checker.py(1 hunks)temoa/temoa_model/unit_checking/screener.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
temoa/temoa_model/unit_checking/screener.py (3)
temoa/temoa_model/unit_checking/common.py (1)
RelationType(143-146)temoa/temoa_model/unit_checking/relation_checker.py (5)
check_efficiency_table(90-154)make_commodity_lut(54-66)check_inter_table_relations(157-265)check_cost_tables(268-379)make_c2a_lut(69-81)temoa/temoa_model/unit_checking/table_checker.py (1)
check_table(54-142)
temoa/temoa_model/unit_checking/relation_checker.py (2)
temoa/temoa_model/unit_checking/common.py (2)
consolidate_lines(172-179)RelationType(143-146)temoa/temoa_model/unit_checking/entry_checker.py (2)
validate_units_format(49-61)validate_units_expression(64-74)
🪛 Ruff (0.14.1)
temoa/temoa_model/unit_checking/screener.py
69-69: Avoid specifying long messages outside the exception class
(TRY003)
temoa/temoa_model/unit_checking/relation_checker.py
170-172: Possible SQL injection vector through string-based query construction
(S608)
175-175: Possible SQL injection vector through string-based query construction
(S608)
177-177: Possible SQL injection vector through string-based query construction
(S608)
179-179: Avoid specifying long messages outside the exception class
(TRY003)
183-183: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
210-210: Avoid specifying long messages outside the exception class
(TRY003)
286-289: Avoid specifying long messages outside the exception class
(TRY003)
290-290: Possible SQL injection vector through string-based query construction
(S608)
294-296: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (1)
temoa/temoa_model/unit_checking/common.py (1)
125-131: FixCostTableDatadefinition; current NamedTuple call is invalid.Use class-syntax or functional tuple-of-tuples syntax.
-CostTableData = NamedTuple( - 'CostTableData', - table_name=str, - commodity_reference=str | None, - capacity_based=bool, - period_based=bool, -) +class CostTableData(NamedTuple): + table_name: str + commodity_reference: str | None + capacity_based: bool + period_based: boolLikely an incorrect or invalid review comment.
| """ | ||
| Tools for Energy Model Optimization and Analysis (Temoa): | ||
| An open source framework for energy systems optimization modeling | ||
|
|
||
| Copyright (C) 2015, NC State University | ||
|
|
||
| This program is free software; you can redistribute it and/or modify | ||
| it under the terms of the GNU General Public License as published by | ||
| the Free Software Foundation; either version 2 of the License, or | ||
| (at your option) any later version. | ||
|
|
||
| This program is distributed in the hope that it will be useful, | ||
| but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
| GNU General Public License for more details. | ||
|
|
||
| A complete copy of the GNU General Public License v2 (GPLv2) is available | ||
| in LICENSE.txt. Users uncompressing this from an archive may not have | ||
| received this license file. If not, see <http://www.gnu.org/licenses/>. | ||
|
|
||
|
|
||
| Written by: J. F. Hyink | ||
| jeff@westernspark.us | ||
| https://westernspark.us | ||
| Created on: 9/22/25 | ||
|
|
||
| A systematic check of expected relationships between tables to ensure units are consistent | ||
|
|
||
| """ | ||
| import dataclasses | ||
| import logging | ||
| import sqlite3 | ||
| from collections import defaultdict | ||
| from collections.abc import Iterable | ||
|
|
||
| from pint.registry import Unit | ||
|
|
||
| from temoa.temoa_model.unit_checking import ureg | ||
| from temoa.temoa_model.unit_checking.common import ( | ||
| RATIO_ELEMENT, | ||
| SINGLE_ELEMENT, | ||
| consolidate_lines, | ||
| CostTableData, | ||
| RelationType, | ||
| ) | ||
| from temoa.temoa_model.unit_checking.entry_checker import ( | ||
| validate_units_format, | ||
| validate_units_expression, | ||
| ) | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def make_commodity_lut(conn: sqlite3.Connection) -> dict[str, Unit]: | ||
| """Get a dictionary of the units for each commodity entry""" | ||
| res = {} | ||
| cursor = conn.cursor() | ||
| query = 'SELECT name, units FROM commodity' | ||
| cursor.execute(query) | ||
| rows = cursor.fetchall() | ||
| for comm, units in rows: | ||
| valid, group = validate_units_format(units, SINGLE_ELEMENT) | ||
| if valid: | ||
| valid, units = validate_units_expression(group[0]) | ||
| res[comm] = units | ||
| return res | ||
|
|
||
|
|
||
| def make_c2a_lut(conn: sqlite3.Connection) -> dict[str, Unit]: | ||
| """Get a dictionary of the units for each capacity to activity entry""" | ||
| res = {} | ||
| cursor = conn.cursor() | ||
| query = 'SELECT tech, units FROM CapacityToActivity' | ||
| cursor.execute(query) | ||
| rows = cursor.fetchall() | ||
| for comm, units in rows: | ||
| valid, group = validate_units_format(units, SINGLE_ELEMENT) | ||
| if valid: | ||
| valid, units = validate_units_expression(group[0]) | ||
| res[comm] = units | ||
| return res | ||
|
|
||
|
|
||
| @dataclasses.dataclass(frozen=True) | ||
| class IOUnits: | ||
| input_units: Unit | ||
| output_units: Unit | ||
|
|
||
|
|
||
| def check_efficiency_table( | ||
| conn: sqlite3.Connection, comm_units: dict[str, str] | ||
| ) -> tuple[dict[str, IOUnits], list[str]]: | ||
| """ | ||
| Check the technology units for Efficiency table entries | ||
|
|
||
| Returns a dictionary of technology : IOUnits and a list of error messages | ||
|
|
||
| """ | ||
|
|
||
| query = 'SELECT tech, input_comm, output_comm, units FROM efficiency' | ||
| rows = conn.execute(query).fetchall() | ||
| res = {} | ||
| error_msgs = [] | ||
| invalid_rows = [] | ||
| for idx, (tech, ic, oc, units) in enumerate(rows, start=1): | ||
| input_units, output_units = None, None | ||
| valid, located_units = validate_units_format(units, RATIO_ELEMENT) | ||
| if valid: | ||
| valid, output_units = validate_units_expression(located_units[0]) | ||
| if valid: | ||
| valid, input_units = validate_units_expression(located_units[1]) | ||
| if not valid: | ||
| invalid_rows.append(idx) | ||
| # we give up early. The specifics of why this failed should be evident in earlier tests | ||
| continue | ||
|
|
||
| # check that our tech matches the units of the connected commodities | ||
| invalid_input_flag = input_units != comm_units[ic] | ||
| invalid_output_flag = output_units != comm_units[oc] | ||
| if invalid_input_flag or invalid_output_flag: | ||
| logger.warning( | ||
| 'Efficiency units conflict with associated commodity for Technology %s near row %d', | ||
| tech, | ||
| idx, | ||
| ) | ||
| msg = f"\n Expected: {f'{ic} [{input_units}]' :^25} ----> {tech :^20} ----> {f'{oc} [{output_units}]': ^25}" | ||
| if invalid_input_flag: | ||
| msg += f'\n Invalid input units: {comm_units[ic]}' | ||
| if invalid_output_flag: | ||
| msg += f'\n Invalid output units: {comm_units[oc]}' | ||
| error_msgs.append(msg) | ||
|
|
||
| # check that the output of this technology is consistent in units with other instances of same tech | ||
| if tech in res: | ||
| if res[tech].output_units != output_units: | ||
| logger.warning( | ||
| 'Efficiency units conflict with same-name tech for Technology %s near row %d', | ||
| tech, | ||
| idx, | ||
| ) | ||
| msg = f"\n Found: {f'{ic} [{input_units}]' :^25} ----> {tech :^20} ----> {f'{oc} [{output_units}]': ^25}" | ||
| msg += f'\n Conflicting output units: {res[tech].output_units} vs {output_units}' | ||
| error_msgs.append(msg) | ||
|
|
||
| else: | ||
| res[tech] = IOUnits(input_units, output_units) | ||
|
|
||
| # we gather all non-processed rows in one statement here due to size of table vs. individual reporting | ||
| if invalid_rows: | ||
| listed_lines = consolidate_lines(invalid_rows) | ||
| line_error_msg = f'Non-processed rows (see earlier tests): {listed_lines}' | ||
| error_msgs.append(line_error_msg) | ||
|
|
||
| return res, error_msgs | ||
|
|
||
|
|
||
| def check_inter_table_relations( | ||
| conn: sqlite3.Connection, | ||
| table_name, | ||
| tech_lut: dict[str, IOUnits], | ||
| comm_lut: dict[str, Unit], | ||
| relation_type: RelationType, | ||
| ) -> list[str]: | ||
| """check the tech and units in the given table vs. baseline (expected) values for the tech""" | ||
| grouped_errors = defaultdict(list) | ||
| match relation_type: | ||
| case RelationType.CAPACITY: | ||
| # we make a query to join on the C2A units to pick those up | ||
| query = ( | ||
| f'SELECT {table_name}.tech, {table_name}.units, ca.units ' | ||
| f'FROM {table_name} JOIN CapacityToActivity ca ' | ||
| f'ON {table_name}.tech = ca.tech AND {table_name}.region = ca.region' | ||
| ) | ||
| case RelationType.ACTIVITY: | ||
| query = f'SELECT tech, units, NULL FROM {table_name}' | ||
| case RelationType.COMMODITY: | ||
| query = f'SELECT commodity, units, NULL FROM {table_name}' | ||
| case _: | ||
| raise ValueError(f'Unexpected relation type: {relation_type}') | ||
| try: | ||
| rows = conn.execute(query).fetchall() | ||
| except sqlite3.OperationalError: | ||
| logger.error('failed to process query: %s when processing table %s', query, table_name) | ||
| msg = f'Failed to process table {table_name}. See log for failed query.' | ||
| return [msg] | ||
|
|
||
| # process the rows | ||
| for idx, (tech_or_comm, table_units, c2a_units) in enumerate(rows, start=1): | ||
| expected_units = None | ||
| match relation_type: | ||
| case RelationType.CAPACITY: | ||
| io_units = tech_lut.get(tech_or_comm) | ||
| if not io_units: | ||
| grouped_errors[ | ||
| f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)' | ||
| ].append(idx) | ||
| continue | ||
| expected_units = io_units.output_units | ||
| case RelationType.ACTIVITY: | ||
| io_units = tech_lut[tech_or_comm] | ||
| if not io_units: | ||
| grouped_errors[ | ||
| f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)' | ||
| ].append(idx) | ||
| continue | ||
| expected_units = io_units.output_units | ||
| case RelationType.COMMODITY: | ||
| expected_units = comm_lut.get(tech_or_comm) | ||
| case _: | ||
| raise ValueError(f'Unexpected relation type: {relation_type}') | ||
| if not expected_units: | ||
| grouped_errors[ | ||
| f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)' | ||
| ].append(idx) | ||
| continue | ||
|
|
||
| # validate the units in the table... | ||
| entry_format_valid, units_data = validate_units_format(table_units, SINGLE_ELEMENT) | ||
| if entry_format_valid: | ||
| _, valid_table_units = validate_units_expression(units_data[0]) | ||
| else: | ||
| valid_table_units = None | ||
|
|
||
| # validate the c2a units, if needed | ||
| if c2a_units: | ||
| c2a_valid, units_data = validate_units_format(c2a_units, SINGLE_ELEMENT) | ||
| if c2a_valid: | ||
| # further ensure the conversion is valid and retain the validity | ||
| c2a_valid, valid_c2a_units = validate_units_expression(units_data[0]) | ||
| else: | ||
| valid_c2a_units = None | ||
| else: # we are in a valid state: no C2A units provided/needed | ||
| c2a_valid = True | ||
| valid_c2a_units = None | ||
|
|
||
| if not valid_table_units: | ||
| grouped_errors['Unprocessed row (invalid units--see earlier tests)'].append(idx) | ||
| if not c2a_valid: | ||
| grouped_errors['Unprocessed row (invalid c2a units--see earlier tests)'].append(idx) | ||
| if not valid_table_units or not c2a_valid: | ||
| continue | ||
|
|
||
| # if we have valid c2a units, combine them to get the units of activity | ||
| if valid_c2a_units: | ||
| res_units = valid_table_units * (valid_c2a_units * ureg.year) | ||
|
|
||
| else: | ||
| res_units = valid_table_units | ||
|
|
||
| # check that the res_units match the expectation from the tech | ||
| if expected_units != res_units: | ||
| msg = ( | ||
| f'Units mismatch from expected reference. Table Entry: {valid_table_units}, ' | ||
| f'{f" C2A Entry: {valid_c2a_units}, " if valid_c2a_units else ""}' | ||
| f'expected: {expected_units}' | ||
| f' for output of tech {tech_or_comm}' | ||
| ) | ||
| grouped_errors[msg].append(idx) | ||
|
|
||
| # gather into list format | ||
| res = [] | ||
| for msg, line_nums in grouped_errors.items(): | ||
| res.append(f'{msg} at rows: {consolidate_lines(line_nums)}') | ||
|
|
||
| return res | ||
|
|
||
|
|
||
| def check_cost_tables( | ||
| conn: sqlite3.Connection, | ||
| cost_tables: Iterable[CostTableData], | ||
| tech_lut: dict[str, IOUnits], | ||
| c2a_lut: dict[str, Unit], | ||
| commodity_lut: dict[str, Unit], | ||
| ) -> list[str]: | ||
| """ | ||
| Check all cost tables for (a) alignment of units to tech output (the denominator) and (b) 100% commonality | ||
| in the cost units (numerator) | ||
| Note: we'll *assume* the first passing entry in the first table establishes the common cost units and | ||
| check for consistency | ||
| """ | ||
| common_cost_unit = None # Expectation: MUSD. Something with a prefix and currency dimension | ||
| error_msgs = [] | ||
| for ct in cost_tables: | ||
| table_grouped_errors = defaultdict(list) | ||
| if ct.commodity_reference and ct.capacity_based: | ||
| raise ValueError( | ||
| f'Table that is "capacity based" {ct.table_name} flagged as ' | ||
| 'having commodity field--expecting tech field. Check data.' | ||
| ) | ||
| query = f'SELECT {ct.commodity_reference if ct.commodity_reference else 'tech'}, units FROM {ct.table_name}' | ||
| try: | ||
| rows = conn.execute(query).fetchall() | ||
| except sqlite3.OperationalError: | ||
| logger.error( | ||
| 'failed to process query: %s when processing table %s', query, ct.table_name | ||
| ) | ||
| msg = f'Failed to process table {ct.table_name}. See log for failed query.' | ||
| error_msgs.append(msg) | ||
| continue | ||
| for idx, (tech, raw_units_expression) in enumerate(rows, start=1): | ||
| # convert to pint expression | ||
| cost_units, measure_units = None, None | ||
| # screen for empty/missing raw inputs | ||
| if not raw_units_expression: | ||
| label = f'{ct.table_name}: Unprocessed row (missing units): {raw_units_expression}' | ||
| table_grouped_errors[label].append(idx) | ||
| continue | ||
| valid, (elements) = validate_units_format(raw_units_expression, RATIO_ELEMENT) | ||
| if valid: | ||
| cost_valid, cost_units = validate_units_expression(elements[0]) | ||
| units_valid, measure_units = validate_units_expression(elements[1]) | ||
| else: | ||
| cost_valid, units_valid = False, False | ||
| if not (cost_valid and units_valid): | ||
| label = f'{ct.table_name}: Unprocessed row (invalid units--see earlier tests): {raw_units_expression}' | ||
| table_grouped_errors[label].append(idx) | ||
| continue | ||
|
|
||
| # Test 1: Look for cost commonality | ||
| if common_cost_unit is None: | ||
| # try to establish it | ||
| # check that what we have captured is in the currency dimension == "clean" | ||
| # dev note: multiplying by 1 allows us to use the check_units_expression() function | ||
| if (1 * cost_units).check('[currency]'): | ||
| common_cost_unit = cost_units | ||
| else: | ||
| # something is wrong, hopefully it was just this entry? | ||
| # mark it, dump it, and try again... | ||
| error_msgs.append( | ||
| f'{ct.table_name}: Unprocessed row (unreducible cost units): {cost_units} at row: {idx}' | ||
| ) | ||
| continue | ||
| else: # use the common units to test | ||
| if cost_units != common_cost_unit: | ||
| label = f'{ct.table_name}: Non-standard cost found (expected common cost units of {common_cost_unit}) got: {cost_units}' | ||
| table_grouped_errors[label].append(idx) | ||
|
|
||
| # Test 2: Check the units of measure to ensure alignment with the tech's output units | ||
| # find the referenced commodity units from the tech or commodity depending on table structure... | ||
| if ct.commodity_reference: | ||
| commodity_units = commodity_lut.get(ct.commodity_reference) | ||
| if not commodity_units: | ||
| label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ct.commodity_reference}) ' | ||
| table_grouped_errors[label].append(idx) | ||
| continue | ||
| else: | ||
| tech_io = tech_lut.get(tech) | ||
| if tech_io: | ||
| commodity_units = tech_io.output_units | ||
| else: | ||
| label = f'{ct.table_name}: Unprocessed row (unknown tech: {tech}) ' | ||
| table_grouped_errors[label].append(idx) | ||
| continue | ||
|
|
||
| # pull the C2A factor if this table is capacity-based and determine the "match units" which should | ||
| # match the commodity units in the table, after removing the "per period" time factor, if it exists | ||
| c2a_units = None | ||
| if ct.capacity_based: | ||
| c2a_units = c2a_lut.get(tech, ureg.dimensionless) # default is dimensionless | ||
| # apply to convert | ||
| measure_units *= c2a_units * ureg.year | ||
|
|
||
| if ct.period_based: | ||
| measure_units /= ureg.year # remove the "per year" from this element | ||
|
|
||
| matched = measure_units == commodity_units | ||
|
|
||
| if not matched: | ||
| tech_reference = ct.commodity_reference if ct.commodity_reference else tech | ||
| label = ( | ||
| f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/commodity {tech_reference}: {raw_units_expression}' | ||
| f'\n Expecting commodity units: {commodity_units}. Discovered (after conversions applied): {measure_units}' | ||
| f'\n Conversions: c2a units: {c2a_units if c2a_units else "N/A"}{", `per period` removed" if ct.period_based else ""}\n ' | ||
| ) | ||
| table_grouped_errors[label].append(idx) | ||
|
|
||
| for label, listed_lines in table_grouped_errors.items(): | ||
| error_msgs.append(f'{label} at rows: {consolidate_lines(listed_lines)}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Quick scan helper to verify fixes (lookups and annotations).
🏁 Script executed:
#!/bin/bash
# Fail on direct dict indexing that can KeyError and check annotation
rg -n -C1 --type=py $'comm_units\\[' temoa/temoa_model/unit_checking/relation_checker.py
rg -n -C1 --type=py $'tech_lut\\[' temoa/temoa_model/unit_checking/relation_checker.py
rg -n -C1 --type=py $'def\\s+check_efficiency_table\\(.*comm_units:\\s*dict\\[str,\\s*Unit\\]' temoa/temoa_model/unit_checking/relation_checker.pyLength of output: 671
Fix type annotation and unsafe direct dict indexing.
The code has three related issues:
-
Type annotation mismatch (line 80): Parameter
comm_units: dict[str, str]should bedict[str, Unit]sincemake_commodity_lut()returnsdict[str, Unit]and the function compares values againstUnitobjects. -
Unsafe direct indexing on
comm_units(lines 118–119, 128–130): Direct access viacomm_units[ic]andcomm_units[oc]can raiseKeyError. Use.get()with appropriate error handling instead, consistent with other defensive checks in the function. -
Inconsistent dict access pattern (line 200): In
check_inter_table_relations(), the ACTIVITY case uses direct indexingtech_lut[tech_or_comm]while the CAPACITY case (line 188) safely uses.get(). Standardize to.get().
🧰 Tools
🪛 Ruff (0.14.1)
170-172: Possible SQL injection vector through string-based query construction
(S608)
175-175: Possible SQL injection vector through string-based query construction
(S608)
177-177: Possible SQL injection vector through string-based query construction
(S608)
179-179: Avoid specifying long messages outside the exception class
(TRY003)
183-183: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
210-210: Avoid specifying long messages outside the exception class
(TRY003)
286-289: Avoid specifying long messages outside the exception class
(TRY003)
290-290: Possible SQL injection vector through string-based query construction
(S608)
294-296: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In temoa/temoa_model/unit_checking/relation_checker.py lines 1-378: change the
type annotation of check_efficiency_table's comm_units parameter from dict[str,
str] to dict[str, Unit]; replace all direct dict indexing that can KeyError with
safe .get() lookups and handle missing keys (specifically replace comm_units[ic]
and comm_units[oc] checks around lines ~118-130 with
comm_units.get(ic)/comm_units.get(oc) and add error handling/logging and
early-continue when a commodity is missing); and make the ACTIVITY branch in
check_inter_table_relations (around lines ~188-205) use
tech_lut.get(tech_or_comm) like the CAPACITY branch and handle the
missing-io_units case consistently by recording the unprocessed-row error and
continuing.
| f'SELECT {table_name}.tech, {table_name}.units, ca.units ' | ||
| f'FROM {table_name} JOIN CapacityToActivity ca ' | ||
| f'ON {table_name}.tech = ca.tech AND {table_name}.region = ca.region' | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t drop CAPACITY rows without C2A; LEFT JOIN and flag missing C2A.
Inner join hides unmatched rows. Use LEFT JOIN and treat missing C2A as invalid.
- query = (
- f'SELECT {table_name}.tech, {table_name}.units, ca.units '
- f'FROM {table_name} JOIN CapacityToActivity ca '
- f'ON {table_name}.tech = ca.tech AND {table_name}.region = ca.region'
- )
+ query = (
+ f'SELECT {table_name}.tech, {table_name}.units, ca.units '
+ f'FROM {table_name} LEFT JOIN CapacityToActivity ca '
+ f'ON {table_name}.tech = ca.tech AND {table_name}.region = ca.region'
+ )
@@
- if c2a_units:
+ if c2a_units:
c2a_valid, units_data = validate_units_format(c2a_units, SINGLE_ELEMENT)
if c2a_valid:
# further ensure the conversion is valid and retain the validity
c2a_valid, valid_c2a_units = validate_units_expression(units_data[0])
else:
valid_c2a_units = None
- else: # we are in a valid state: no C2A units provided/needed
- c2a_valid = True
- valid_c2a_units = None
+ else:
+ # Require C2A for CAPACITY; optional otherwise
+ c2a_valid = relation_type != RelationType.CAPACITY
+ valid_c2a_units = NoneAlso applies to: 224-235
🧰 Tools
🪛 Ruff (0.14.1)
170-172: Possible SQL injection vector through string-based query construction
(S608)
🤖 Prompt for AI Agents
In temoа/temoa_model/unit_checking/relation_checker.py around lines 170-173 (and
similarly around lines 224-235), the current INNER JOIN hides Capacity rows that
lack a CapacityToActivity (C2A) match; change the SQL to use a LEFT JOIN and
detect rows where ca.units (or the C2A key) is NULL, marking those Capacity rows
as invalid/missing C2A instead of silently dropping them. Update the query to
LEFT JOIN and add logic after query execution to flag/collect rows with NULL
ca.* values (or include a CASE/COALESCE expression) so missing C2A entries are
reported rather than excluded.
| def _write_report(report_path: Path, report_entries: list[str]): | ||
| """Write the report to file""" | ||
| if not report_path: | ||
| return | ||
| if not report_path.exists(): | ||
| report_path.mkdir(parents=True) | ||
| if report_path.is_dir(): | ||
| # augment with a default filename including timestamp | ||
| from datetime import datetime | ||
|
|
||
| timestamp = datetime.now().strftime('%Y-%m-%d_%H%M%S') | ||
| report_path /= f'units_check_{timestamp}.txt' | ||
| with open(report_path, 'w', encoding='utf-8') as report_file: | ||
| report_file.writelines(report_entries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Harden report writer: distinguish file vs directory paths.
Avoid creating a directory when a file path is intended; ensure parents exist for file outputs.
def _write_report(report_path: Path, report_entries: list[str]):
"""Write the report to file"""
if not report_path:
return
- if not report_path.exists():
- report_path.mkdir(parents=True)
- if report_path.is_dir():
+ # If path points to a directory (existing or suffixless), write a timestamped file inside it.
+ if report_path.is_dir() or report_path.suffix == "":
+ if not report_path.exists():
+ report_path.mkdir(parents=True, exist_ok=True)
# augment with a default filename including timestamp
from datetime import datetime
timestamp = datetime.now().strftime('%Y-%m-%d_%H%M%S')
- report_path /= f'units_check_{timestamp}.txt'
- with open(report_path, 'w', encoding='utf-8') as report_file:
+ report_path = report_path / f'units_check_{timestamp}.txt'
+ else:
+ # Treat as file path; ensure parent exists
+ report_path.parent.mkdir(parents=True, exist_ok=True)
+ with open(report_path, 'w', encoding='utf-8') as report_file:
report_file.writelines(report_entries)🤖 Prompt for AI Agents
In temoa/temoa_model/unit_checking/screener.py around lines 259-272, the current
writer creates directories for the given report_path itself and can mistakenly
turn an intended file path into a directory; instead, only create parent
directories for a file and only append the default filename when the provided
path actually refers to a directory. Change the logic to: treat a falsy/None
path as no-op; if report_path.exists() and report_path.is_dir() (or report_path
explicitly provided as a directory), append the timestamped filename; otherwise
treat report_path as a file and ensure its parent directory exists by calling
report_path.parent.mkdir(parents=True, exist_ok=True) (do not call mkdir on
report_path itself); then open the final file path and write entries. Ensure you
use exist_ok=True when creating dirs and preserve UTF-8 writing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
temoa/temoa_model/unit_checking/relation_checker.py (3)
200-207: Critical: Inconsistent dict access causes KeyError risk.The ACTIVITY case uses direct indexing
tech_lut[tech_or_comm]while the CAPACITY case (line 193) safely uses.get(). This inconsistency will raiseKeyErrorfor missing technologies.case RelationType.ACTIVITY: - io_units = tech_lut[tech_or_comm] + io_units = tech_lut.get(tech_or_comm) if not io_units: grouped_errors[ f'Unprocessed row (missing reference for tech "{tech_or_comm}" --see earlier tests)' ].append(idx) continue expected_units = io_units.output_units
90-131: Critical: Type annotation and KeyError risks remain unresolved.Despite past reviews, two critical issues persist:
Line 91: Type annotation is incorrect -
comm_unitsshould bedict[str, Unit]notdict[str, str], sincemake_commodity_lut()returnsdict[str, Unit]and values are compared againstUnitobjects.Lines 118-119, 128-130: Direct dict indexing
comm_units[ic]andcomm_units[oc]will raiseKeyErrorif commodities are missing. Use defensive.get()lookups instead:def check_efficiency_table( - conn: sqlite3.Connection, comm_units: dict[str, str] + conn: sqlite3.Connection, comm_units: dict[str, Unit] ) -> tuple[dict[str, IOUnits], list[str]]:# check that our tech matches the units of the connected commodities - invalid_input_flag = input_units != comm_units[ic] - invalid_output_flag = output_units != comm_units[oc] + expected_in = comm_units.get(ic) + expected_out = comm_units.get(oc) + if expected_in is None or expected_out is None: + logger.warning( + 'Missing commodity unit for Technology %s near row %d (input: %s, output: %s)', + tech, idx, ic if expected_in is None else '', oc if expected_out is None else '' + ) + invalid_rows.append(idx) + continue + invalid_input_flag = input_units != expected_in + invalid_output_flag = output_units != expected_out if invalid_input_flag or invalid_output_flag: logger.warning( 'Efficiency units conflict with associated commodity for Technology %s near row %d', tech, idx, ) msg = f"\n Expected: {f'{ic} [{input_units}]' :^25} ----> {tech :^20} ----> {f'{oc} [{output_units}]': ^25}" if invalid_input_flag: - msg += f'\n Invalid input units: {comm_units[ic]}' + msg += f'\n Invalid input units: {expected_in}' if invalid_output_flag: - msg += f'\n Invalid output units: {comm_units[oc]}' + msg += f'\n Invalid output units: {expected_out}' error_msgs.append(msg)
320-410: Critical: Incorrect commodity/tech identifier usage in cost table validation.When
ct.commodity_referenceis set, the query selects the commodity column (e.g., 'emis_comm'), but the code incorrectly looks upcommodity_lut.get(ct.commodity_reference)(the column name) instead of the actual row value. This causes false "unknown commodity" errors.Issues:
- Line 320: Variable named
techis misleading when it holds a commodity value- Line 361: Looks up
ct.commodity_reference(column name) instead of row value- Lines 380, 393: Use
techvariable which may actually be a commodity- for idx, (tech, raw_units_expression) in enumerate(rows, start=1): + for idx, (identifier, raw_units_expression) in enumerate(rows, start=1): # convert to pint expression cost_units, measure_units = None, None # screen for empty/missing raw inputs if not raw_units_expression: - label = f'{ct.table_name}: Unprocessed row (missing units): {raw_units_expression}' + label = f'{ct.table_name}: Unprocessed row (missing units)' table_grouped_errors[label].append(idx) continue valid, (elements) = validate_units_format(raw_units_expression, RATIO_ELEMENT) if valid: cost_valid, cost_units = validate_units_expression(elements[0]) units_valid, measure_units = validate_units_expression(elements[1]) else: cost_valid, units_valid = False, False if not (cost_valid and units_valid): label = f'{ct.table_name}: Unprocessed row (invalid units--see earlier tests): {raw_units_expression}' table_grouped_errors[label].append(idx) continue # Test 1: Look for cost commonality if common_cost_unit is None: # try to establish it # check that what we have captured is in the currency dimension == "clean" # dev note: multiplying by 1 allows us to use the check_units_expression() function if (1 * cost_units).check('[currency]'): common_cost_unit = cost_units else: # something is wrong, hopefully it was just this entry? # mark it, dump it, and try again... error_msgs.append( f'{ct.table_name}: Unprocessed row (unreducible cost units): {cost_units} at row: {idx}' ) continue else: # use the common units to test if cost_units != common_cost_unit: label = f'{ct.table_name}: Non-standard cost found (expected common cost units of {common_cost_unit}) got: {cost_units}' table_grouped_errors[label].append(idx) # Test 2: Check the units of measure to ensure alignment with the tech's output units # find the referenced commodity units from the tech or commodity depending on table structure... if ct.commodity_reference: - commodity_units = commodity_lut.get(ct.commodity_reference) + # identifier holds the actual commodity value from the selected column + commodity_units = commodity_lut.get(identifier) if not commodity_units: - label = f'{ct.table_name}: Unprocessed row (unknown commodity: {ct.commodity_reference}) ' + label = f'{ct.table_name}: Unprocessed row (unknown commodity: {identifier}) ' table_grouped_errors[label].append(idx) continue else: - tech_io = tech_lut.get(tech) + # identifier holds the tech value + tech_io = tech_lut.get(identifier) if tech_io: commodity_units = tech_io.output_units else: - label = f'{ct.table_name}: Unprocessed row (unknown tech: {tech}) ' + label = f'{ct.table_name}: Unprocessed row (unknown tech: {identifier}) ' table_grouped_errors[label].append(idx) continue # pull the C2A factor if this table is capacity-based and determine the "match units" which should # match the commodity units in the table, after removing the "per period" time factor, if it exists c2a_units = None - oring_measure_units = measure_units + original_measure_units = measure_units if ct.capacity_based: - c2a_units = c2a_lut.get(tech, '<none>') # default is dimensionless + c2a_units = c2a_lut.get(identifier, ureg.dimensionless) # apply to convert - if c2a_units != '<none>': - measure_units *= c2a_units + measure_units *= c2a_units # apply the nominal period measure_units *= ureg.year if ct.period_based: measure_units /= ureg.year # remove the "per year" from this element matched = measure_units == commodity_units if not matched: - tech_reference = ct.commodity_reference if ct.commodity_reference else tech - label = f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/comm: {tech_reference}:' + label = f'{ct.table_name}: Non-matching measure unit found in cost denominator for tech/comm: {identifier}:' conversions = [] if ct.capacity_based: conversions.append(f'C2A Factor: {c2a_units}') conversions.append(f'Nominal Period: {ureg.year}') if ct.period_based: conversions.append(f'Per-period Factor: {ureg.year}') detail = _ding_label( table_entry=raw_units_expression, - focus=f'Converted Measure in Denominator: {oring_measure_units}', + focus=f'Converted Measure in Denominator: {original_measure_units}', conversions=conversions, result=measure_units, expectation=commodity_units, ) label += f'{detail}\n' table_grouped_errors[label].append(idx)Also fixed: typo
oring_measure_units→original_measure_unitsand simplified thec2a_unitsdefault handling.
🧹 Nitpick comments (1)
temoa/temoa_model/unit_checking/relation_checker.py (1)
169-173: Consider LEFT JOIN to detect missing CapacityToActivity entries.The current INNER JOIN silently drops CAPACITY rows that lack a C2A match. Using LEFT JOIN would allow detection and reporting of these missing entries as validation errors rather than silently ignoring them.
query = ( f'SELECT {table_name}.tech, {table_name}.units, ca.units ' - f'FROM {table_name} JOIN CapacityToActivity ca ' + f'FROM {table_name} LEFT JOIN CapacityToActivity ca ' f'ON {table_name}.tech = ca.tech AND {table_name}.region = ca.region' )Then update the validation logic (around lines 233-235) to flag missing C2A as an error for CAPACITY relations:
else: # we are in a valid state: no C2A units provided/needed - c2a_valid = True + c2a_valid = relation_type != RelationType.CAPACITY valid_c2a_units = None + if not c2a_valid: + grouped_errors['Unprocessed row (missing CapacityToActivity entry)'].append(idx)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/source/Documentation.rst(1 hunks)temoa/temoa_model/unit_checking/relation_checker.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-30T14:37:44.217Z
Learnt from: idelder
PR: TemoaProject/temoa#180
File: temoa/components/limits.py:632-647
Timestamp: 2025-10-30T14:37:44.217Z
Learning: In the `limit_tech_input_split_average_constraint` function in `temoa/components/limits.py`, there was a bug where the `total_inp` summation incorrectly used the fixed variable `i` instead of the loop variable `S_i` when indexing `processOutputsByInput` and calling `get_variable_efficiency`. This pattern should be watched for in similar aggregation constraints: when summing over a loop variable like `S_i`, all references within that loop should use `S_i`, not a fixed outer variable.
Applied to files:
temoa/temoa_model/unit_checking/relation_checker.py
🧬 Code graph analysis (1)
temoa/temoa_model/unit_checking/relation_checker.py (2)
temoa/temoa_model/unit_checking/common.py (2)
consolidate_lines(172-179)RelationType(143-146)temoa/temoa_model/unit_checking/entry_checker.py (2)
validate_units_format(49-61)validate_units_expression(64-74)
🪛 Ruff (0.14.2)
temoa/temoa_model/unit_checking/relation_checker.py
170-172: Possible SQL injection vector through string-based query construction
(S608)
176-176: Possible SQL injection vector through string-based query construction
(S608)
178-178: Possible SQL injection vector through string-based query construction
(S608)
180-180: Avoid specifying long messages outside the exception class
(TRY003)
184-184: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
211-211: Avoid specifying long messages outside the exception class
(TRY003)
306-309: Avoid specifying long messages outside the exception class
(TRY003)
310-310: Possible SQL injection vector through string-based query construction
(S608)
314-316: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (1)
docs/source/Documentation.rst (1)
411-542: LGTM! Comprehensive documentation of the new unit checking feature.The documentation clearly explains:
- The upgrade path to database v3.1 with unit checking
- How Pint is used for validation (not in model calculations)
- Unit expression formats and testing procedures
- The five-step validation sequence
- A detailed table showing which tables have units checking
The technical content aligns well with the implementation in the codebase.
|
Closing since #232 is merged now |
This PR sets some groundwork for version 3.1 of the DB and includes a unit checking module
pintas a requirement for future use with unitsAlso included is the basic Unit Checking package in
temoa/temoa_model/unit_checkingUnit checks can be run on a DB by:
Summary by CodeRabbit