
[Essnmx] Use time bin width instead of number of bin edges.#264

Open
YooSunYoung wants to merge 8 commits into main from essnmx-bin-width

Conversation

@YooSunYoung
Member

@YooSunYoung YooSunYoung commented Mar 24, 2026

Requested by @aaronfinke .

Now you can set --time-bin-width instead of nbins.
I didn't remove the nbins option, so you can still use it, but then you need to set --time-bin-width 0.

Command Example

essnmx-reduce --input-file nmx.hdf --time-bin-width 3 --time-bin-unit ms
# Or
essnmx-reduce --input-file nmx.hdf --time-bin-width 0 --nbins 50

In the case below, nbins will be ignored and the default time-bin-width of 3 ms will be used.

essnmx-reduce --input-file nmx.hdf --nbins 50

Command Helper

essnmx-reduce -h

Workflow Configuration:
  --time-bin-coordinate {event_time_offset,time_of_flight}
                        Coordinate to bin the time data. Selecting `event_time_offset` skips the reduction steps, i.e. calculating `time of flight (tof)`, and simply saves histograms of the raw data. (default: time_of_flight)
  --time-bin-width TIME_BIN_WIDTH
                        Width(Length) of each Time Bin in [time_bin_unit]. If `time_bin_width` and `nbins` are both given, `time_bin_width` will be preferred. Set it to `0` if you want to use `nbins` instead. (default: 3)
  --nbins NBINS         Number of Time bins. If `bin_width` is given, `nbins` will be ignored. (default: 50)

@aaronfinke
Contributor

--time-bin-width and --nbins should be mutually exclusive options (e.g. use ArgumentParser.add_mutually_exclusive_group() )

The essnmx-reduce output should include the number of generated bins and also (this would be nice) number of events per bin.
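A minimal argparse sketch of that mutually exclusive pairing (flag names are from this PR; the types and defaults here are assumptions, not the PR code):

```python
import argparse

# Hypothetical reconstruction of the suggested CLI wiring.
parser = argparse.ArgumentParser(prog="essnmx-reduce")
group = parser.add_mutually_exclusive_group()
group.add_argument("--time-bin-width", type=float, default=None)
group.add_argument("--nbins", type=int, default=None)

args = parser.parse_args(["--time-bin-width", "3"])
print(args.time_bin_width)  # 3.0
# Passing both flags makes argparse exit with
# "argument --nbins: not allowed with argument --time-bin-width".
```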

@YooSunYoung
Member Author

The failing tests seem to be related to a resource issue...?? They don't fail on my workstation... I'll have a look at it tomorrow...

@YooSunYoung
Member Author

YooSunYoung commented Mar 24, 2026

@aaronfinke

--time-bin-width and --nbins should be mutually exclusive options (e.g. use ArgumentParser.add_mutually_exclusive_group() )

Okay...!

The essnmx-reduce output should include the number of generated bins and also (this would be nice) number of events per bin.

You mean the logging/printing them out?
Or you want an additional 1-D histogram in the output file?

@aaronfinke
Contributor

@YooSunYoung

You mean the logging/printing them out?
Or you want an additional 1-D histogram in the output file?

Certainly as logger output, yes, but a cute 1D histogram would also be useful. dials.find_spots does something similar in its output:

Histogram of per-image spot count for imageset 0:
1879 spots found on 49 images (max 78 / bin)
                     *                           
                     *                           
                    **                          *
               **   ***              *          *
      *   ** * *** ******   * * **   *          *
      ****** ************* ** * **   ***        *
     ******* ****************** **   ***   **** *
******************************* **** *** ********
*************************************************
*************************************************
1                    image                     49

@aaronfinke
Contributor

DIALS code for making that histogram:

https://github.com/dials/dials/blob/main/src/dials/util/ascii_art.py
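For flavour, here is a tiny self-contained sketch of such a star-column histogram (a toy illustration, not the DIALS implementation):

```python
def ascii_histogram(counts, height=6):
    """Render per-bin counts as columns of '*' (toy sketch, not DIALS code)."""
    peak = max(counts) or 1
    rows = []
    for level in range(height, 0, -1):
        # A column reaches this row if its count clears the row's threshold.
        threshold = peak * (level - 0.5) / height
        rows.append("".join("*" if c >= threshold else " " for c in counts))
    return "\n".join(rows)

print(ascii_histogram([1, 4, 7, 10, 6, 2]))
```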

@YooSunYoung
Member Author

YooSunYoung commented Mar 25, 2026

@aaronfinke
Now it's not allowed to give both nbins and time-bin-width. I'll add the 1D histogram too (#267).

pixi run essnmx-reduce --nbins 3 --time-bin-width 3
usage: essnmx-reduce [-h] --input-file INPUT_FILE [INPUT_FILE ...] [--swmr] [--detector-ids DETECTOR_IDS [DETECTOR_IDS ...]] [--iter-chunk] [--chunk-size-pulse CHUNK_SIZE_PULSE] [--chunk-size-events CHUNK_SIZE_EVENTS]
                     [--time-bin-coordinate {event_time_offset,time_of_flight}] [--min-time-bin MIN_TIME_BIN] [--max-time-bin MAX_TIME_BIN] [--time-bin-unit {ms,us,ns}] [--lookup-table-file-path LOOKUP_TABLE_FILE_PATH]
                     [--tof-simulation-num-neutrons TOF_SIMULATION_NUM_NEUTRONS] [--tof-simulation-min-wavelength TOF_SIMULATION_MIN_WAVELENGTH] [--tof-simulation-max-wavelength TOF_SIMULATION_MAX_WAVELENGTH]
                     [--tof-simulation-min-ltotal TOF_SIMULATION_MIN_LTOTAL] [--tof-simulation-max-ltotal TOF_SIMULATION_MAX_LTOTAL] [--tof-simulation-seed TOF_SIMULATION_SEED] [--time-bin-width TIME_BIN_WIDTH | --nbins NBINS] [--verbose] [--skip-file-output]
                     [--output-file OUTPUT_FILE] [--overwrite] [--compression {NONE,GZIP,BITSHUFFLE_LZ4}]
essnmx-reduce: error: argument --time-bin-width: not allowed with argument --nbins

@nvaytet nvaytet added essnmx Issues for essnmx. labels Mar 25, 2026
@nvaytet nvaytet changed the title Use time bin width instead of number of bin edges. [Essnmx] Use time bin width instead of number of bin edges. Mar 25, 2026
@YooSunYoung
Member Author

I just realized... we probably want to set the bin width in angstrom

@YooSunYoung YooSunYoung marked this pull request as draft March 27, 2026 08:31
@aaronfinke
Contributor

I just realized... we probably want to set the bin width in angstrom

Not necessarily. Setting by TOF is more intuitive. I would keep it as seconds or milliseconds.

@YooSunYoung YooSunYoung marked this pull request as ready for review March 27, 2026 10:15
@YooSunYoung YooSunYoung moved this to In progress in Development Board Mar 27, 2026
@YooSunYoung YooSunYoung requested a review from nvaytet April 8, 2026 07:15
dim=t_coord_name,
start=min_t.to(unit=wf_config.time_bin_unit),
stop=max_t.to(unit=wf_config.time_bin_unit),
step=time_bin_width,
Member

I think we need to be careful with using arange here. Depending on the step size, you can be dropping the max edge.
For example, np.arange(1, 17.5, 3) yields array([ 1., 4., 7., 10., 13., 16.]). So the edges would stop at 16, missing everything after that.
Similarly, it could be surprising that np.arange(2, 18., 2) stops at 16, while np.arange(2, 18.01, 2) stops at 18...
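The edge-dropping behaviour is easy to reproduce with plain NumPy (and scipp's arange follows the same half-open convention):

```python
import numpy as np

# `stop` is exclusive, so a range that is not an integer number of steps
# silently loses the tail: events in (16, 17.5] fall outside these edges.
print(np.arange(1, 17.5, 3))    # [ 1.  4.  7. 10. 13. 16.]

# And the result is sensitive to the floating-point placement of `stop`:
print(np.arange(2, 18.0, 2))    # last edge is 16
print(np.arange(2, 18.01, 2))   # last edge is 18
```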

Member Author

Hmm yeah... we might not want to lose the last bin... should I just calculate the number of bins, correct the last edge, and use linspace instead?

Member

Well we / @aaronfinke need to decide which is the quantity that should be preserved.
If we specify a range with min and max, and a bin width, but the range does not span an integer number of bin widths, should we:

  1. preserve the bin width and adjust the start or end of the range with some padding to allow an integer number of bin widths
  2. preserve the range and compute a number of bins based on the requested width, use linspace, and end up with a bin width which will not exactly match the requested width
  3. Something else?

Contributor

My immediate reaction is that keeping the bin width at the user-defined value is more important than ensuring all bin sizes are equal, so I'd go for option 1. Setting nbins as $\lfloor \Delta_{TOF}/t_{bin} \rfloor$ and setting the bin[0] and bin[-1] sizes to $t_{bin} + (\Delta_{TOF} \bmod t_{bin})/2$ makes the most sense.
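Spelled out in code, that proposal might look like this (a hypothetical helper just to illustrate the arithmetic; not PR code, and it assumes the span covers at least two bins):

```python
import math

def option1_edges(t_min, t_max, width):
    # nbins = floor(span / width); the first and last bins each absorb
    # half of the leftover, so interior bins keep exactly `width`.
    span = t_max - t_min
    nbins = math.floor(span / width)
    end_width = width + (span % width) / 2
    edges = [t_min, t_min + end_width]
    for _ in range(nbins - 2):
        edges.append(edges[-1] + width)
    edges.append(t_max)
    return edges

print(option1_edges(1.0, 17.5, 3.0))  # [1.0, 4.75, 7.75, 10.75, 13.75, 17.5]
```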

Member Author

We just talked offline and the conclusion was to keep the bin width as the user input and add an extra padding bin if needed.

I'll fix this in this way and ask for a review again.
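A NumPy sketch of that agreed behaviour (names are illustrative; the actual PR uses scipp, so this only mirrors the logic):

```python
import numpy as np

def time_bin_edges(start, stop, width):
    # Keep the user-requested width for every bin; if arange's last edge
    # falls short of `stop`, append one padding bin so no events are dropped.
    edges = np.arange(start, stop, width)
    if edges[-1] < stop:
        edges = np.append(edges, edges[-1] + width)
    return edges

print(time_bin_edges(1.0, 17.5, 3.0))  # [ 1.  4.  7. 10. 13. 16. 19.]
print(time_bin_edges(0.0, 10.0, 2.0))  # [ 0.  2.  4.  6.  8. 10.]
```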


# i.e. if the lookup table wavelength range changes,
# the number of bins changes and the histogram data sizes change.
# This test only checks that the lookup table is used as expected,
# therefore using the number of bins should be fine.
Member

Not sure I understood why changes in the resolution/range in the lookup table would cause changes in number of bins in the reduced result? What did I miss?

Member Author

Because the wavelength range of reduced data changes if the lookup table wavelength range changes.
If the wavelength range shrinks, the number of bins should be smaller.

default_model = default_child.model_dump(mode='python')
for key, testing_value in testing_model.items():
if key == 'lookup_table_file_path':
if key in ['lookup_table_file_path', 'nbins']:
Member

Shouldn't this also include time_bin_width? It can also be None, right?

histogram = tof_da.hist({t_coord_name: t_bin_edges.to(unit=t_coord_unit)})
tof_histograms[detector_name] = histogram

_tof_histogram = next(iter(tof_histograms.values()))
Member

A bit annoying that we have to do this, instead of just using the t_bin_edges below.
I understand it's because of

histogram = tof_da.hist({t_coord_name: t_bin_edges.to(unit=t_coord_unit)})

above.

Does that mean that the t_coord_unit can be different for different detector panels? How likely is that?

Is the unit of the tof coordinate in the reduced data something the user can control via one of the args?
If so, maybe the provider that computes the tof should do the conversion to that unit, and then we would know the unit from the argument and could avoid some gymnastics here?

Being able to say whether you want the output in microseconds or milliseconds sounds like something the users might want?

Member Author

Does that mean that the t_coord_unit can be different for different detector panels? How likely is that?

No... I was just too lazy to do it only once...

But in theory they could have different units, I guess. Although we probably want the same unit in the output.

Being able to say whether you want the output in microseconds or milliseconds sounds like something the users might want?

Probably...? Then it's easy for us to decide the unit here.

YooSunYoung and others added 5 commits April 9, 2026 09:51
Co-authored-by: Neil Vaytet <39047984+nvaytet@users.noreply.github.com>
Co-authored-by: Sunyoung Yoo <17974113+YooSunYoung@users.noreply.github.com>
Comment on lines +184 to +187
if bin_edges[t_coord_name, -1] < max_t:
# Need to append one more edge to cover the whole range.
true_last_bin_edge = bin_edges[t_coord_name, -1] + time_bin_width
bin_edges = sc.concat([bin_edges, true_last_bin_edge], dim=t_coord_name)
Member Author

This was easier to read/write than computing nbins and the start/stop for linspace...


Labels

essnmx Issues for essnmx.

Projects

Status: In progress


3 participants