Skip to content

Conversation

@prasad-sawantdesai
Copy link
Collaborator

Added support of slicing for IDSStructArray

import imas

entry = imas.DBEntry("imas:mdsplus?user=public;pulse=116000;run=4;database=ITER_MD;version=3", "r")  
walids = entry.get("wall", autoconvert=False) # lazy=True
units = walids.description_2d[0].vessel.unit
print(units[8:])
print(units[0:100])
print(units[4:])
print(units[5])

if stop is None:
stop = sys.maxsize

for i in range(start or 0, stop, step or 1):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Prasad, Python slices are weird and have lots of edge cases with negative indices... the current logic doesn't quite capture that.

For example:

>>> import imas
>>> entry = imas.DBEntry("imas:hdf5?path=./test", "w")
09:44:08 INFO     Parsing data dictionary version 4.0.0 @dd_zip.py:166
>>> cp = entry.factory.core_profiles()
>>> cp.profiles_1d.resize(10)
>>> cp.ids_properties.homogeneous_time = 1
>>> cp.time = [*range(10)]
09:45:05 INFO     Assigning incorrect type 'int64' to <IDSNumericArray (IDS:core_profiles, time, empty FLT_1D)>, attempting automatic conversion. @ids_primitive.py:483
>>> entry.put(cp)
>>> cp.profiles_1d[:-5]
[<IDSStructure (IDS:core_profiles, profiles_1d[0])>, <IDSStructure (IDS:core_profiles, profiles_1d[1])>, <IDSStructure (IDS:core_profiles, profiles_1d[2])>, <IDSStructure (IDS:core_profiles, profiles_1d[3])>, <IDSStructure (IDS:core_profiles, profiles_1d[4])>]
>>> entry.get("core_profiles", lazy=True).profiles_1d[:-5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/maarten/projects/iter-python/imas-python/imas/ids_struct_array.py", line 137, in __getitem__
    return self.value[item]
           ~~~~~~~~~~^^^^^^
TypeError: 'NoneType' object is not subscriptable

Edge cases with negative indices:

  • Negative start, e.g. [-5:5], all items from the fifth from the end to the fifth from the beginning
  • Negative stop, e.g. [2:-2], all items from the third to the second-to-last
  • Negative increment, e.g. [::-1] all items in reverse order

What is the the use case for this feature? With a list comprehension users could capture (most) scenarios as well:

p1ds = [cp.profiles_1d[i] for i in range(2, 5)]
# in reverse
for p1d in reversed(cp.profiles_1d):
    ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing @maarten-ic

Actually, this issue arises when using slicing for wall IDs, as shown below. There is a workaround—you can convert it into a list—but then you lose the IDSStructArray metadata. It would be convenient for users to work it as a list, what do you think @olivhoenen but of course we need to consider edge cases.
units = list(walids.description_2d[0].vessel.unit)

import imaspy as imas

entry = imas.DBEntry("imas:mdsplus?user=public;pulse=116000;run=4;database=ITER_MD;version=3", "r")  
walids = entry.get("wall", autoconvert=False)
units = walids.description_2d[0].vessel.unit
# units = list(walids.description_2d[0].vessel.unit) workaround
print(units[8:])

# 09:01:55 INFO     Parsing data dictionary version 4.0.0 @dd_zip.py:166
# 09:01:59 INFO     Parsing data dictionary version 3.40.0 @dd_zip.py:166
# Traceback (most recent call last):
#  File "/home/ITER/sawantp1/git/idstoolsimaspy/idstools/testbug.py", line 6, in <module>
#    print(units[8:])
#         ~~~~~^^^^
#  File "/work/imas/opt/EasyBuild/software/IMASPy/1.2.0-intel-2023b/lib/python3.11/site-
# packages/imaspy/ids_struct_array.py", line 126, in __getitem__
#    list_idx = int(item)
               ^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'slice'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Prasad,

you can convert it into a list—but then you lose the IDSStructArray metadata

I don't understand the drawback: with the slicing you have implemented, unit[8:] is also a list, right?

when using slicing for wall IDs, as shown below

Yes, sure, but what's the actual use case for doing this? 🙂 I don't understand why a user would want to print all units except the first 8 (as you do in your code listing).

Note that I found out that slice objects have a method that give you the start/stop/step values for a sequence of given length: https://docs.python.org/3/reference/datamodel.html#slice.indices

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came up in the context of @prasad-sawantdesai work on IDStools, where different scripts are receiving IMAS URIs that may contain fragments (#ids:occ/ids_path) where the ids_path has slicing (as described in https://imas-data-dictionary.readthedocs.io/en/latest/_downloads/9e7ffd162c0237b61062528affd5fe2a/IDS-path-syntax.md).

What would be the easiest/recommended way to go from an ids_path string that contains slicing indices to an actual list of objects?

ps: in the case of the list of unit from the wall IDS, the machine description contains many different types of limiter and vessel units, not all of them make sense to overlay when plotting for example an equilibrium psi map

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Olivier,

where the ids_path has slicing

This PR doesn't really solve that case either. I could still not do wall.description_2d[0].vessel.unit[8:].element[:].name: the slice at unit[8:] will return a list that has no attribute element.

What would be the easiest/recommended way to go from an ids_path string that contains slicing indices to an actual list of objects?

This is actually a surprisingly tricky problem! And it mostly has to do with the possibility that IDSs don't have homogeneous array sizes deep down in the tree.

ibex has an implementation of this and, since it converts everything to JSON (which supports ragged arrays), it can avoid some of the difficulty. Looks like it is mostly implemented in _expand_single_path_element in this file. Though quickly scanning it, it looks like it doesn't handle the negative indices either (which is probably fine for the ibex use case).

I see two options for implementing your use case, with option 1 the easiest to implement and option 2 more generally useful 🙂

  1. Add a method to IDSPath, for example IDSPath.goto_slices that returns a list of elements that match the slice syntax. It would be an extended form of IDSPath.goto (which returns a single element and raises a NotImplementedError when a slice is used in the path),
  2. Implement slices on arrays of structures, which return a new object (e.g. IDSSlice). The IDSSlice can keep track of all objects that matched the slice expression, and allow further slicing of child elements, getting the matching elements, etc. That could look as follows:
>>> ids_slice = wall.description_2d[0].vessel.unit[8:]
>>> ids_slice
<IDSSlice (IDS:wall, description_2d[0].vessel.unit[8:], 4 matches)>
>>> ids_slice.element
<IDSSlice (IDS:wall, description_2d[0].vessel.unit[8:].element, 4 matches)>
>>> ids_slice.element[:]
<IDSSlice (IDS:wall, description_2d[0].vessel.unit[8:].element[:], 6 matches)>
>>> ids_slice.element[:].name
<IDSSlice (IDS:wall, description_2d[0].vessel.unit[8:].element[:].name, 6 matches)>
>>> list(ids_slice.element[:].name)
[<IDSString0D (IDS:wall, description_2d[0].vessel.unit[8].element[0].name)>,
 <IDSString0D (IDS:wall, description_2d[0].vessel.unit[8].element[1].name)>,
 <IDSString0D (IDS:wall, description_2d[0].vessel.unit[9].element[0].name)>,
 ...
 ]

In either case you'll need to decide how to handle edge cases. For example, what does it mean when you do unit[:2].element[2] when unit[0] has 4 elements and unit[1] has only 2 elements. Is this an IndexError? Or does this return a match of [unit[0].element[2]] and just ignore that there is no unit[1].element[2]?

There's probably more dragons hiding when implementing something like this -- there's a reason that it's not currently implemented in IDSPath 😉

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why inhomogeneity is a problem there: if a user ask for a slice that does not exist we have an out-of-bound error, whether it is homogeneous or not (admittedly the risk increase in inhomogeneous case, but that should not be a concern for the implementation). IMO index-errors shall not be swallowed if they happen in nested elements, the error shall then just be propagated.

What do you think @prasad-sawantdesai, would something like 1 or 2 be useful in this case? 2 looks more appealing to me (but maybe that's because of the code snippet 🙄 ).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to implement second option :)

Copy link
Collaborator

@maarten-ic maarten-ic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Prasad, I had a quick look at the current state of the PR.

Please have a look at below comments, I think it can simplify this logic a lot 🙂

@prasad-sawantdesai prasad-sawantdesai marked this pull request as ready for review November 21, 2025 23:02
@SimonPinches
Copy link
Collaborator

Do we want to squash all the commits with the same commit messages? (This PR now has > 3x the number of commits than files changed. Since this would involve a forced push this might not be very elegant for those who have cloned all branches so should we for the future move to only raising PRs from forks rather than pushing developments directly to this repo?)

@maarten-ic
Copy link
Collaborator

Do we want to squash all the commits with the same commit messages? (This PR now has > 3x the number of commits than files changed. Since this would involve a forced push this might not be very elegant for those who have cloned all branches so should we for the future move to only raising PRs from forks rather than pushing developments directly to this repo?)

Fine with me. This is done automatically when @olivhoenen presses the right buttons when merging, no force-push to this branch needed🙂
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/about-pull-request-merges#squash-and-merge-your-commits

@maarten-ic
Copy link
Collaborator

Thanks @prasad-sawantdesai for addressing my previous feedback! I've had a quick go at using this and found the following things. Note that I have not (yet) reviewed the code another time, but I'll try to do that later this week. In the meanwhile, could you have a look at the following points?

  1. Representation issue and inconsistencies with IDSSlice: name of first sliced AoS is missing; it would be nice to show the IDS name as well to make it consistent with other IDS object representations:
>>> eq.time_slice
<IDSStructArray (IDS:equilibrium, time_slice with 3 items)>
>>> eq.time_slice[:]
<IDSSlice ([:], 3 matches)>
  1. Empty slices do not check if a getattr is valid:
>>> eq.time_slice[3:]
<IDSSlice ([3:], 0 matches)>
>>> eq.time_slice[3:].x  # This should still raise an exception, `x` is not a valid child node of time_slice.
<IDSSlice ([3:].x, 0 matches)>
  1. The current getitem behaviour is confusing to me.
    I would remove the behaviour for cp.profiles_1d[:].grid[0] and just raise an exception (it is also not possible to index cp.profiles_1d[0].grid[0]!). If a user wants to use the IDSSlice as if it is a list, they can just do list(cp.profiles_1d[:].grid)[0] or cp.profiles_1d[:].grid.values()[0], however I don't think this is a valid use case and people would just do cp.profiles_1d[0].grid instead.
>>> cp.profiles_1d[:].grid
<IDSSlice ([:].grid, 3 matches)>
>>> cp.profiles_1d[:].grid[0]  # Indexing an IDSSlice returns an IDSStructure
<IDSStructure (IDS:core_profiles, profiles_1d[0]/grid)>
>>> cp.profiles_1d[:].ion
<IDSSlice ([:].ion, 3 matches)>
>>> cp.profiles_1d[:].ion[0]  # Indexing an IDSSlice returns another IDSSlice (!)
<IDSSlice ([:].ion[0], 3 matches)>

@prasad-sawantdesai prasad-sawantdesai removed the request for review from Simon-McIntosh November 29, 2025 22:17
@prasad-sawantdesai
Copy link
Collaborator Author

prasad-sawantdesai commented Nov 29, 2025

Thanks @prasad-sawantdesai for addressing my previous feedback! I've had a quick go at using this and found the following things. Note that I have not (yet) reviewed the code another time, but I'll try to do that later this week. In the meanwhile, could you have a look at the following points?

  1. Representation issue and inconsistencies with IDSSlice: name of first sliced AoS is missing; it would be nice to show the IDS name as well to make it consistent with other IDS object representations:
>>> eq.time_slice
<IDSStructArray (IDS:equilibrium, time_slice with 3 items)>
>>> eq.time_slice[:]
<IDSSlice ([:], 3 matches)>
  1. Empty slices do not check if a getattr is valid:
>>> eq.time_slice[3:]
<IDSSlice ([3:], 0 matches)>
>>> eq.time_slice[3:].x  # This should still raise an exception, `x` is not a valid child node of time_slice.
<IDSSlice ([3:].x, 0 matches)>
  1. The current getitem behaviour is confusing to me.
    I would remove the behaviour for cp.profiles_1d[:].grid[0] and just raise an exception (it is also not possible to index cp.profiles_1d[0].grid[0]!). If a user wants to use the IDSSlice as if it is a list, they can just do list(cp.profiles_1d[:].grid)[0] or cp.profiles_1d[:].grid.values()[0], however I don't think this is a valid use case and people would just do cp.profiles_1d[0].grid instead.
>>> cp.profiles_1d[:].grid
<IDSSlice ([:].grid, 3 matches)>
>>> cp.profiles_1d[:].grid[0]  # Indexing an IDSSlice returns an IDSStructure
<IDSStructure (IDS:core_profiles, profiles_1d[0]/grid)>
>>> cp.profiles_1d[:].ion
<IDSSlice ([:].ion, 3 matches)>
>>> cp.profiles_1d[:].ion[0]  # Indexing an IDSSlice returns another IDSSlice (!)
<IDSSlice ([:].ion[0], 3 matches)>

Thank you for the review Maarten

  1. Fixed the representation
>> eq.time_slice
<IDSStructArray (IDS:equilibrium, time_slice with 106 items)>
>> eq.time_slice[:]
<IDSSlice (IDS:equilibrium, time_slice[:] with 106 items)>
  1. Raises an exception ( This should still raise an exception, x is not a valid child node of time_slice)
>> eq.time_slice[3:].x
Traceback (most recent call last):
  File "/home/ITER/sawantp1/github/IMAS-Python/representation.py", line 11, in <module>
    print(eq.time_slice[3:].x)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/sawantp1/github/IMAS-Python/imas/ids_slice.py", line 219, in __getattr__
    child_elements = [getattr(element, name) for element in self]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/sawantp1/github/IMAS-Python/imas/ids_slice.py", line 219, in <listcomp>
    child_elements = [getattr(element, name) for element in self]
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ITER/sawantp1/github/IMAS-Python/imas/ids_structure.py", line 59, in __getattr__
    raise AttributeError(
AttributeError: IDS structure 'time_slice[3]' has no attribute 'x'
  1. It retrieves the grid IDSStructure from all profiles_1d, and accessing the first element returns an IDSStructure
>>> cp.profiles_1d[1:3].grid
<IDSSlice (IDS:core_profiles, profiles_1d[1:3].grid with 2 items)>
>>> cp.profiles_1d[1:3].grid.shape
(2,)
>>> cp.profiles_1d[1:3].grid.values()
[<IDSStructure (IDS:core_profiles, profiles_1d[1]/grid)>, <IDSStructure (IDS:core_profiles, profiles_1d[2]/grid)>]
>>> cp.profiles_1d[1:3].grid.to_array()
array([<IDSStructure (IDS:core_profiles, profiles_1d[1]/grid)>,
       <IDSStructure (IDS:core_profiles, profiles_1d[2]/grid)>],
      dtype=object)
>>> cp.profiles_1d[1:3].grid[0]
<IDSStructure (IDS:core_profiles, profiles_1d[1]/grid)>

In the example below, cp.profiles_1d[1:3].ion[0] returns the first ion IDSStructure from all selected profiles_1d

>>> cp.profiles_1d[1:3].ion
<IDSSlice (IDS:core_profiles, profiles_1d[1:3].ion with 2 items)>
>>> cp.profiles_1d[1:3].ion.shape
(2, 3)
>>> cp.profiles_1d[1:3].ion.values()
[<IDSStructArray (IDS:core_profiles, profiles_1d[1]/ion with 3 items)>, <IDSStructArray (IDS:core_profiles, profiles_1d[2]/ion with 3 items)>]
>>> cp.profiles_1d[1:3].ion.to_array()
array([[<IDSStructure (IDS:core_profiles, profiles_1d[1]/ion[0])>,
        <IDSStructure (IDS:core_profiles, profiles_1d[1]/ion[1])>,
        <IDSStructure (IDS:core_profiles, profiles_1d[1]/ion[2])>],
       [<IDSStructure (IDS:core_profiles, profiles_1d[2]/ion[0])>,
        <IDSStructure (IDS:core_profiles, profiles_1d[2]/ion[1])>,
        <IDSStructure (IDS:core_profiles, profiles_1d[2]/ion[2])>]],
      dtype=object)
>>> cp.profiles_1d[1:3].ion[0]
<IDSSlice (IDS:core_profiles, profiles_1d[1:3].ion[0] with 2 items)>
>>> cp.profiles_1d[1:3].ion[0].to_array()
array([[<IDSStructure (IDS:core_profiles, profiles_1d[1]/ion[0])>],
       [<IDSStructure (IDS:core_profiles, profiles_1d[2]/ion[0])>]],
      dtype=object)
>>> 

Copy link
Collaborator

@maarten-ic maarten-ic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Prasad,

  1. It retrieves the grid IDSStructure from all profiles_1d, and accessing the first element returns an IDSStructure
    [...]
    In the example below, cp.profiles_1d[1:3].ion[0] returns the first ion IDSStructure from all selected profiles_1d

Yes, I understand what it's doing, but I find it illogical and unexpected 😉
I would prefer to follow the principle of least astonishment, which - to me - would be: any operation on the IDSSlice should translate to the operation on the underlying IDS objects.

In pseudo code:

def do_something_with(ids_slice):
    return IDSSlice([do_something_with(ids_object) for ids_object in ids_slice])

We're doing this in most cases now (e.g. __getattr__, __getitem__ when the underlying objects are IDSStructArray) but not when the underlying objects are IDSStructure.

Let's set up a meeting to discuss in more detail if you want 🙂

(I still haven't reviewed the code in more detail, sorry! I'll do that this week.)

Returns:
Tuple of dimensions.
"""
return self._virtual_shape
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shapes are meaningless when the data is ragged, IMO this should be removed or raise an error if the underlying data is ragged.

"""Iterate over all matched elements."""
return iter(self._matched_elements)

def __getitem__(self, item: Union[int, slice]) -> Union[Any, "IDSSlice"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __getitem__(self, item: Union[int, slice]) -> Union[Any, "IDSSlice"]:
def __getitem__(self, item: Union[int, slice]) -> "IDSSlice":

A union with Any is not informative 😉. For clarity of the API I think this should just always return an IDSSlice (see also #20 (review))

Note that this method is also overly complex (80 lines for getitem is not good...)...

Comment on lines +205 to +208
if not self._matched_elements:
raise IndexError(
f"Cannot access node '{name}' on empty slice with 0 elements"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is now not allowed:

eq = imas.IDSFactory().equilibrium()
eq.time_slice[:].profiles_1d

profiles_1d is a valid child of time_slice, there just are no matching elements?

Why don't you check if the name is a valid child node of the metadata, as suggested in #20 (comment)?

from imas.ids_struct_array import IDSStructArray

child_metadata = None
if self.metadata is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a valid case: an IDSSlice should always have valid metadata? If not, then that should be fixed instead 🙂

Comment on lines +241 to +243
new_virtual_shape = self._virtual_shape + (
child_sizes[0] if child_sizes else 0,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot just assume that the size of all items is the same as the first. This may be fine in some applications, but not for a library like IMAS-Python.

I also believe this PR is getting way too complicated... Can we first do the basic IDSSlice stuff, and then make a second PR for tensorization scenarios?

Comment on lines +296 to +308
from imas.util import get_toplevel, get_full_path

my_repr = f"<{type(self).__name__}"
ids_name = "unknown"
full_path = self._path

if self._parent_array is not None:
ids_name = get_toplevel(self._parent_array).metadata.name
parent_array_path = get_full_path(self._parent_array)
full_path = parent_array_path + self._path
item_word = "item" if len(self) == 1 else "items"
my_repr += f" (IDS:{ids_name}, {full_path} with {len(self)} {item_word})>"
return my_repr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more complex than it needs to be:

  • self.metadata should know which IDS its from (and we could add it to IDSMetadata if that's not already the case),
  • Why don't you make self._path the full path, instead of having to construct full_path = parent_array_path + self._path?

Comment on lines +331 to +334
- 1D slices: List of raw Python/numpy values or unwrapped elements
- Multi-D with reshape=False: List of elements (each being an array)
- Multi-D with reshape=True: numpy.ndarray with shape self.shape,
or nested lists/object array representing structure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this method tries to do too much at once.

Proposal:

  • Let's just return a list of elements with values().
  • If you want tensorization then use to_array().
  • If you want the multi-dimensionality, don't use IDSSlice at all, and just directly index the IDS itself?
    Example:
    # Directly index IDS is much cleaner (and more performant)
    var = eq.time_slice[i1].profiles_2d[i2]
    # then creating a slice, and attempting to use the ND-feature?
    slice = eq.time_slice[:].profiles_2d[:]
    nd = slice.values(True)
    ...
    var = nd[i1][i2]

except (ValueError, TypeError):
return flat_values

def to_array(self) -> np.ndarray:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also simplify this method: tensorization is only relevant for when the slice refers to leaf nodes (e.g. FLT_1D).

Proposal:

  • to_array raises an Exception when the IDSSlice refers to IDSStructure or IDSStructArrays
  • to_array raises an Exception when the shape is irregular
  • to_array never returns object arrays (when the user calls this method, they want a tensorized array, not object arrays)

Comment on lines +133 to +155
if self._lazy:

self._load(None) # Load size

# Convert slice to indices
start, stop, step = item.indices(len(self))

# Load only the elements in the slice range
loaded_elements = []
for i in range(start, stop, step):
self._load(i) # Load each element on demand
loaded_elements.append(self.value[i])

from imas.ids_slice import IDSSlice

slice_str = IDSSlice._format_slice(item)

return IDSSlice(
self.metadata,
loaded_elements,
slice_str,
parent_array=self,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simplify this to:

if self._lazy:
    matched_elements = [self[i] for i in item.indices(len(self))]
else:
    matched_elements = self.value[item]

and let the existing logic handle the lazy loading for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants