[Essnmx] Use time bin width instead of number of bin edges.#264
YooSunYoung wants to merge 8 commits into `main`
Conversation
The essnmx-reduce output should include the number of generated bins and also (this would be nice) the number of events per bin.
The failing tests seem to be related to a resource issue...?? They don't fail on my workstation... I'll have a look at it tomorrow...
Okay...!
You mean logging/printing them out?
Certainly as logger output, yes, but a cute 1D histogram would also be useful.
DIALS code for making that histogram: https://github.com/dials/dials/blob/main/src/dials/util/ascii_art.py |
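Not the DIALS implementation, just a minimal sketch of the idea (hypothetical helper name, numpy assumed) for rendering per-bin counts as a vertical ASCII histogram:

```python
import numpy as np

def ascii_histogram(counts, height=8, char="#"):
    """Render per-bin counts as a small vertical ASCII histogram."""
    counts = np.asarray(counts, dtype=float)
    # Scale each bar so the tallest one reaches the requested height.
    scaled = np.round(height * counts / counts.max()).astype(int)
    rows = []
    for level in range(height, 0, -1):
        # One text row per level, top to bottom.
        rows.append("".join(char if bar >= level else " " for bar in scaled))
    return "\n".join(rows)

print(ascii_histogram([1, 5, 9, 4, 2]))
```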
@aaronfinke pixi run essnmx-reduce --nbins 3 --time-bin-width 3
usage: essnmx-reduce [-h] --input-file INPUT_FILE [INPUT_FILE ...] [--swmr] [--detector-ids DETECTOR_IDS [DETECTOR_IDS ...]] [--iter-chunk] [--chunk-size-pulse CHUNK_SIZE_PULSE] [--chunk-size-events CHUNK_SIZE_EVENTS]
[--time-bin-coordinate {event_time_offset,time_of_flight}] [--min-time-bin MIN_TIME_BIN] [--max-time-bin MAX_TIME_BIN] [--time-bin-unit {ms,us,ns}] [--lookup-table-file-path LOOKUP_TABLE_FILE_PATH]
[--tof-simulation-num-neutrons TOF_SIMULATION_NUM_NEUTRONS] [--tof-simulation-min-wavelength TOF_SIMULATION_MIN_WAVELENGTH] [--tof-simulation-max-wavelength TOF_SIMULATION_MAX_WAVELENGTH]
[--tof-simulation-min-ltotal TOF_SIMULATION_MIN_LTOTAL] [--tof-simulation-max-ltotal TOF_SIMULATION_MAX_LTOTAL] [--tof-simulation-seed TOF_SIMULATION_SEED] [--time-bin-width TIME_BIN_WIDTH | --nbins NBINS] [--verbose] [--skip-file-output]
[--output-file OUTPUT_FILE] [--overwrite] [--compression {NONE,GZIP,BITSHUFFLE_LZ4}]
essnmx-reduce: error: argument --time-bin-width: not allowed with argument --nbins
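That error message comes from argparse's mutually exclusive groups. A minimal sketch of how such a pair of flags might be wired up (the flag names mirror the CLI above, but this is not the actual essnmx code):

```python
import argparse

parser = argparse.ArgumentParser(prog="essnmx-reduce")
# Only one of the two binning specifications may be given on the command line.
group = parser.add_mutually_exclusive_group()
group.add_argument("--time-bin-width", type=float, default=3.0)
group.add_argument("--nbins", type=int, default=50)

args = parser.parse_args(["--time-bin-width", "3"])
print(args.time_bin_width)  # 3.0
# parser.parse_args(["--nbins", "3", "--time-bin-width", "3"]) exits with
# "argument --time-bin-width: not allowed with argument --nbins".
```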
I just realized... we probably want to set the bin width in angstroms
Not necessarily. Setting by TOF is more intuitive. I would keep it as seconds or milliseconds.
    dim=t_coord_name,
    start=min_t.to(unit=wf_config.time_bin_unit),
    stop=max_t.to(unit=wf_config.time_bin_unit),
    step=time_bin_width,
I think we need to be careful with using arange here: depending on the step size, you can drop the max edge.
For example, np.arange(1, 17.5, 3) yields array([ 1., 4., 7., 10., 13., 16.]), so the edges stop at 16, missing everything after that.
Similarly, it could be surprising that np.arange(2, 18.0, 2) stops at 16, while np.arange(2, 18.01, 2) stops at 18...
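The behavior is easy to reproduce, since arange always excludes the stop value:

```python
import numpy as np

# arange excludes the stop value, so the last edge can fall short of the max:
edges = np.arange(1, 17.5, 3)
print(edges)  # [ 1.  4.  7. 10. 13. 16.] -- events after 16 fall outside the edges

# Whether the stop lands on an edge depends on floating-point luck:
print(np.arange(2, 18.0, 2))   # stops at 16 (18.0 itself is excluded)
print(np.arange(2, 18.01, 2))  # stops at 18
```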
Hmm, yeah... we might not want to lose the last bin... should I just calculate the number of bins, correct the last edge, and use linspace instead?
Well we / @aaronfinke need to decide which is the quantity that should be preserved.
If we specify a range with min and max, and a bin width, but the range does not span an integer number of bin widths, should we:
- preserve the bin width and pad the start or end of the range so it spans an integer number of bin widths
- preserve the range, compute a number of bins from the requested width, use linspace, and end up with a bin width that does not exactly match the requested one
- something else?
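The first two options can be sketched with plain numpy (a standalone illustration, assuming option 1 pads the end of the range):

```python
import numpy as np

min_t, max_t, width = 1.0, 17.5, 3.0

# Option 1: preserve the requested width, extend the range so an
# integer number of bins covers [min_t, max_t].
nbins = int(np.ceil((max_t - min_t) / width))
edges_pad = min_t + width * np.arange(nbins + 1)
print(edges_pad)  # last edge 19.0 >= max_t, every bin exactly 3.0 wide

# Option 2: preserve the range, accept a slightly different width.
edges_fit = np.linspace(min_t, max_t, nbins + 1)
print(np.diff(edges_fit))  # 2.75 per bin, not the requested 3.0
```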
My immediate reaction is that keeping the bin width at the user-defined value is more important than ensuring all bin sizes are equal, so I'd go for option 1. So setting the nbins as
    # i.e. if the lookup table wavelength range changes,
    # the number of bins changes and the histogram data sizes change.
    # This test is only for checking if the lookup table is used as expected or not,
    # therefore using the number of bins should be fine.
Not sure I understood why changes in the resolution/range of the lookup table would cause changes in the number of bins in the reduced result? What did I miss?
Because the wavelength range of reduced data changes if the lookup table wavelength range changes.
If the wavelength range shrinks, the number of bins should be smaller.
      default_model = default_child.model_dump(mode='python')
      for key, testing_value in testing_model.items():
-         if key == 'lookup_table_file_path':
+         if key in ['lookup_table_file_path', 'nbins']:
Shouldn't this also include time_bin_width? It can also be None, right?
    histogram = tof_da.hist({t_coord_name: t_bin_edges.to(unit=t_coord_unit)})
    tof_histograms[detector_name] = histogram

    _tof_histogram = next(iter(tof_histograms.values()))
A bit annoying that we have to do this, instead of just using the t_bin_edges below.
I understand it's because of
histogram = tof_da.hist({t_coord_name: t_bin_edges.to(unit=t_coord_unit)}) above.
Does that mean the t_coord_unit can be different for different detector panels? How likely is that?
Is the unit of the tof coordinate in the reduced data something the user can control via one of the args?
If so, maybe the provider that computes the tof should do the conversion to that unit; then we would know the unit from the argument and could avoid some gymnastics here.
Being able to say whether you want the output in microseconds or milliseconds sounds like something the users might want?
> Does that mean that the t_coord_unit can be different for different detector panels? How likely is that?

No... it was just me being lazy and not wanting to do it only once...
But in theory they could have different units, I guess, although we probably want the same units in the output.

> Being able to say whether you want the output in microseconds or milliseconds sounds like something the users might want?

Probably...? Then it's easy for us to decide the unit here.
Co-authored-by: Neil Vaytet <39047984+nvaytet@users.noreply.github.com> Co-authored-by: Sunyoung Yoo <17974113+YooSunYoung@users.noreply.github.com>
    if bin_edges[t_coord_name, -1] < max_t:
        # Need to append one more edge to cover the whole range.
        true_last_bin_edge = bin_edges[t_coord_name, -1] + time_bin_width
        bin_edges = sc.concat([bin_edges, true_last_bin_edge], dim=t_coord_name)
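The same append-one-edge logic in plain numpy (the code above uses scipp; this is just a standalone sketch of the approach):

```python
import numpy as np

def time_bin_edges(min_t, max_t, width):
    """Bin edges of fixed width; the last bin is padded past max_t if needed."""
    edges = np.arange(min_t, max_t, width)
    # arange excludes the stop value, so the last edge may not cover max_t;
    # append one more edge so the final bin spans the remainder of the range.
    if edges[-1] < max_t:
        edges = np.append(edges, edges[-1] + width)
    return edges

print(time_bin_edges(1.0, 17.5, 3.0))  # last edge 19.0 covers the full range
```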
This was easier to read/write than calculating nbins and the start/stop for linspace...

Requested by @aaronfinke.

Now you can set `--time-bin-width` instead of `nbins`. I didn't remove the `nbins` option, so you can still use it, but then you need to set `--time-bin-width 0`.

Command Example

In the case below, `nbins` will be ignored and the default `time-bin-width` of 3 [ms] will be used.

Command Helper
essnmx-reduce -h
Workflow Configuration:
  --time-bin-coordinate {event_time_offset,time_of_flight}
                        Coordinate to bin the time data. Selecting `event_time_offset` means reduction steps are skipped, i.e. calculating `time of flight (tof)`, and histograms of the raw data are simply saved. (default: time_of_flight)
  --time-bin-width TIME_BIN_WIDTH
                        Width (length) of each time bin in [time_bin_unit]. If `time_bin_width` and `nbins` are both given, `time_bin_width` will be preferred. Set it to `0` if you want to use `nbins` instead. (default: 3)
  --nbins NBINS
                        Number of time bins. If `bin_width` is given, `nbins` will be ignored. (default: 50)