
ParquetWriter fails when merging files #871

@astrojarred

Description


Describe the bug
When merging files with the ParquetWriter, there seems to be an inconsistency in how the dataframe index (event_no) is handled.

I think this is because the ParquetWriter writes the event_no column and sets it as the index of the dataframe with pandas. Later, when merge_files() reads the writer's outputs, polars expects event_no to be a normal column. Simply not setting it as the index in the writer may be enough to solve the issue, but I'm not sure whether that would have downstream effects.

To Reproduce
This minimal code should reproduce the error, run it from within the main graphnet folder:

from pathlib import Path
from graphnet.data.dataconverter import DataConverter
from graphnet.data.extractors.prometheus import (
    PrometheusFeatureExtractor,
    PrometheusTruthExtractor,
)
from graphnet.data.readers import PrometheusReader
from graphnet.data.writers import ParquetWriter


root = Path("./data/tests/prometheus")

outdir = root / "out"
if not outdir.exists():
    outdir.mkdir()

converter = DataConverter(
    file_reader=PrometheusReader(),
    save_method=ParquetWriter(truth_table="mc_truth"),
    extractors=[PrometheusTruthExtractor(), PrometheusFeatureExtractor()],
    outdir=str(outdir),
    num_workers=1,
)
converter(input_dir=str(root))

# fails here
converter.merge_files()

Expected behavior
The final graphnet-ready output is produced without error.

Full traceback

graphnet [MainProcess] INFO     2026-03-13 14:49:02 - PrometheusReader.__init__ - Writing log to logs
graphnet [MainProcess] INFO     2026-03-13 14:49:03 - DataConverter.<module> - Merging files to .../out/merged
Traceback (most recent call last):
  File "./graphnet/./reproduce_parquet_index_column_bug.py", line 28, in <module>
    converter.merge_files()
  File "./graphnet/src/graphnet/data/dataconverter.py", line 389, in merge_files
    self._save_method.merge_files(
  File "./graphnet/src/graphnet/data/writers/parquet_writer.py", line 100, in merge_files
    truth_meta = self._identify_events(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "./graphnet/src/graphnet/data/writers/parquet_writer.py", line 152, in _identify_events
    df.select([index_column]),
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./graphnet/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py", line 10307, in select
    .collect(optimizations=QueryOptFlags._eager())
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./graphnet/.venv/lib/python3.11/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./graphnet/.venv/lib/python3.11/site-packages/polars/lazyframe/opt_flags.py", line 326, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./graphnet/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2440, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ColumnNotFoundError: unable to find column "event_no"; valid columns: ["interaction", "initial_state_energy", "initial_state_type", "initial_state_zenith", "initial_state_azimuth", "initial_state_x", "initial_state_y", "initial_state_z"]

Metadata

Labels: bug (Something isn't working)