Proposal: An Ordered list w/ unique keys can be used in place of a unordered dictionary#203

Closed
DavidSagan wants to merge 4 commits into main from syntax2

Conversation

@DavidSagan
Member

@DavidSagan DavidSagan commented Mar 9, 2026

Clarified that an ordered list with unique keys can be used in place of an unordered dictionary.

Rationale: A PALS parser may not always be able to preserve insertion order when it reads in a dict. And it is sometimes desirable to preserve the order. In particular, the C++ parser being developed does not have this capability since it uses an external parsing library that does not have this feature. (*)

Correction added by @ax3l: that premise (*) is wrong. The currently picked library (and many other C++ libs for YAML, JSON, etc.) do in fact support preservation of insertion order. This is a bug in pals-cpp (and here is the fix); it is not a library issue and not a fundamental limitation.

@DavidSagan DavidSagan requested review from EZoni, ax3l, cemitch99 and jlvay March 9, 2026 21:12
cemitch99 previously approved these changes Mar 30, 2026
Comment thread source/conventions.md Outdated
Comment thread source/conventions.md
Comment thread source/conventions.md Outdated
EZoni previously approved these changes Mar 30, 2026
@ax3l ax3l self-assigned this Mar 31, 2026
Comment thread source/conventions.md
with the restriction that no duplicate keys can be present in the list.
For example, the above dictionary written as a list would look like:
```{code} yaml
this_dictionary_expressed_as_list:
```
Member

Are you referring to named_dictionary above?
I.e., saying

  - named_dictionary:
      key3: value3
      key4: value4

is the same as

  - named_dictionary:
      - key3: value3
      - key4: value4

Sorry, but I fear this is not a good idea: it adds ambiguity in parsing and complicates everything for no obvious gain.

Member Author

Personally I don't care. You and @EZoni can come to some agreement on what is best.

Member

Key differences:

| | Example 1 | Example 2 |
|---|---|---|
| Structure of named_dictionary | A single mapping/dict | A list of mappings/dicts |
| Access value3 (Python) | data[0]["named_dictionary"]["key3"] | data[0]["named_dictionary"][0]["key3"] |
| Can have duplicate keys? | No (keys must be unique in a dict) | Yes (each dict is separate) |
| Order guaranteed? | Depends on implementation | Yes (it's a list) |

The choice between them depends on your use case. If the keys are unique and represent a fixed structure, a single dictionary (Example 1) is simpler. If you need an ordered collection or might have duplicate keys, a list of dictionaries (Example 2) is more appropriate.
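
To make the access difference in the table concrete, here is a minimal sketch that uses plain Python literals in place of parsed YAML. The variable names `example_1` and `example_2` are illustrative only and are not part of PALS.

```python
# Example 1: named_dictionary holds a single mapping.
example_1 = [{"named_dictionary": {"key3": "value3", "key4": "value4"}}]

# Example 2: named_dictionary holds a list of single-key mappings.
example_2 = [{"named_dictionary": [{"key3": "value3"}, {"key4": "value4"}]}]

# The same logical value needs a different access path in each form.
value_from_dict = example_1[0]["named_dictionary"]["key3"]
value_from_list = example_2[0]["named_dictionary"][0]["key3"]

# The containers have different types, so code written for one form
# does not work unchanged on the other.
print(type(example_1[0]["named_dictionary"]).__name__)  # dict
print(type(example_2[0]["named_dictionary"]).__name__)  # list
```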

Member

Personally I don't care. You and @EZoni can come to some agreement on what is best.

Sounds good.
I would simply not relax interpretation and keep it clear. I.e., not adding the case in this PR (not merging it).

Member

@ax3l ax3l Mar 31, 2026

Major problem

This is not a great idea as it complicates parsing and introduces ambiguities.
We will regret it later in the parsers and have to dynamically branch in the schema application all the time.

Minor Problem

It is also not how standard YAML is interpreted (the section this adds to is "YAML 101").
But I oppose this for the more fundamental reason above, not for where it is written.

Member

@ax3l ax3l left a comment

Review summary for status on the PR: I oppose this, this is not a good idea.

Details described here: #203 (comment)

@DavidSagan
Member Author

I oppose this, this is not a good idea.

@ax3l You need to articulate your objection. In fact you advocated the parser being able to preserve dictionary order if possible and this just does the same thing. And this works even in the case where the parser is not able to preserve order.

@ax3l
Member

ax3l commented Mar 31, 2026

This is just the summary/status, let me link the comment for clarity.

@DavidSagan
Member Author

This is just the summary/status, let me link the comment for clarity.

I don't see this as an objection. What exactly are you worried about?

@ax3l
Member

ax3l commented Mar 31, 2026

The two are not equivalent in YAML; if we choose to relax them to be equivalent for PALS, this will cause problems when writing and parsing schemas.

Lastly, in my view there is no strong motivation for why this complication is needed at all. Please lead with an example where this actually occurs:

A PALS parser may not always be able to preserve insertion order when it reads in a dict. [...] In particular, the C++ parser being developed does not have this capability since it uses an external parsing library that does not have this feature.

C++ has multiple ways to express insertion-order-preserving dicts, and the YAML library you picked actually supports it. There is a bug in your implementation.

@ax3l
Member

ax3l commented Mar 31, 2026

Here you go:

Detailed bug report: pals-project/pals-cpp#18
Bug fix: pals-project/pals-cpp#19

@DavidSagan
Member Author

The two are not equivalent in YAML; if we choose to relax them to be equivalent for PALS, this will cause problems when writing and parsing schemas.

Lastly, in my view there is no strong motivation for why this complication is needed at all. Please lead with an example where this actually occurs:

A PALS parser may not always be able to preserve insertion order when it reads in a dict.

I don't see how this causes problems for writing and parsing. If you think there is a problem, please come up with an example. In terms of why this is a desirable feature, I quote what you wrote:

   Code Developer note:
   PALS dictionaries should, when possible, implement a dictionary that preserves insertion order.

   While not strictly necessary, this helps with human readability:
   For example, having the [`kind`](#c:element.parameters) key of an element as the first attribute enhances legibility.

@ax3l
Member

ax3l commented Mar 31, 2026

Fundamental

No, please fix pals-project/pals-cpp#18
It is possible in C++ to implement a dictionary that preserves insertion order.

As the reference implementation in C++, pals-cpp must not cut corners as fundamental as this.

Banter

In fact you advocated the parser being able to preserve dictionary order if possible and this just does the same thing. And this works even in the case where the parser is not able to preserve order.

My whole motivation behind the "when possible" was simply to not order the read-back dict if we encounter a strong use case that we forgot, not to complicate parsing for everyone because you picked a bad dependency, as you suggest here.

@DavidSagan
Member Author

No, please fix pals-project/pals-cpp#18 It is possible in C++ to implement a dictionary that preserves insertion order.

OK will fix but this is independent of this PR.

@ax3l
Member

ax3l commented Mar 31, 2026

Then this PR has no motivating real-world use case left, and we can close it.

I propose to drop "when possible" from PALS if that makes you feel better. I was just open to not sorting things if that makes it easier for people, but if this is twisted into motivating a complication of the schema, as this PR tries to do, then I would rather drop it completely.

@DavidSagan
Member Author

Then this PR has no motivating real-world use case left, and we can close it.

I propose to drop "when possible" from PALS if that makes you feel better. I was just open to not sorting things if that makes it easier for people, but if this is twisted into motivating a complication of the schema, as this PR tries to do, then I would rather drop it completely.

I would agree with you if this represented a significant complication. But it does not. To have a parser handle this is literally a few lines of code. And once implemented there is no further maintenance needed.

@ax3l
Member

ax3l commented Apr 1, 2026

That is not correct: this text would add a conditional branch to check whether every dict is a list, too. That is extremely random and confusing and will cause a lot of headaches if enacted. Every dict has to be checked for being a list. Files will have all kinds of formatting. Typing will be a mess. This will cause confusion. You cannot write a clear static schema.
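
The typing concern can be illustrated with a minimal sketch. All names here (`Element`, `StrictFacility`, `RelaxedFacility`, and the two functions) are hypothetical and chosen for this example; they are not taken from PALS or any PALS parser.

```python
from typing import Any, Dict, List, Union

Element = Dict[str, Any]

# Strict schema: a facility is always a mapping of name -> element.
StrictFacility = Dict[str, Element]

# Relaxed schema: the same mapping may also arrive as a list of
# single-key mappings, so the static type becomes a union.
RelaxedFacility = Union[Dict[str, Element], List[Dict[str, Element]]]


def element_names_strict(facility: StrictFacility) -> List[str]:
    return list(facility)


def element_names_relaxed(facility: RelaxedFacility) -> List[str]:
    # This branch is needed wherever a relaxed mapping is read directly.
    if isinstance(facility, list):
        return [name for entry in facility for name in entry]
    return list(facility)


names_from_dict = element_names_relaxed({"drift1": {}, "quad1": {}})
names_from_list = element_names_relaxed([{"drift1": {}}, {"quad1": {}}])
print(names_from_dict == names_from_list)
```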

Or just run it through an LLM of your choice and ask it if it is a great idea for a portable standard that shall be easy to implement and verify.

Again, there is no real world need for this. Why do we even discuss this?

@DavidSagan
Member Author

That is not correct: this text would add a conditional branch to check whether every dict is a list, too. That is extremely random and confusing and will cause a lot of headaches if enacted. Every dict has to be checked for being a list. Files will have all kinds of formatting. Typing will be a mess. This will cause confusion. You cannot write a clear static schema.

Or just run it through an LLM of your choice and ask it if it is a great idea for a portable standard that shall be easy to implement and verify.

Again, there is no real world need for this. Why do we even discuss this?

When programmed correctly, the conditional branch is handled by the same low level routine that is written once and is transparent to the higher level code. Again, this can be done with a few lines of code.
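
As an illustration of the kind of low-level routine described here, the following is a minimal sketch and not code from any PALS parser. The function name `as_mapping` and its error handling are assumptions made for this example.

```python
def as_mapping(node):
    """Return a dict for either a dict or a list of single-key dicts.

    Hypothetical helper: raises ValueError on malformed or duplicate entries.
    """
    if isinstance(node, dict):
        return dict(node)
    if not isinstance(node, list):
        raise ValueError(f"expected a dict or a list, got {type(node).__name__}")

    result = {}
    for item in node:
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError("each list entry must be a single-key mapping")
        key, value = next(iter(item.items()))
        if key in result:
            raise ValueError(f"duplicate key {key!r} in list form")
        result[key] = value
    return result


as_dict = as_mapping({"key3": "value3", "key4": "value4"})
as_list = as_mapping([{"key3": "value3"}, {"key4": "value4"}])
print(as_dict == as_list)
```

Whether every consumer of a PALS file would route all dictionary reads through such a helper is the point under dispute in this thread.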

And I can see a real world case for this for some extensions where order matters in a construct that PALS specifies as unordered.

@ax3l
Member

ax3l commented Apr 1, 2026

this can be done with a few lines of code.

No. And you ignore what I wrote on schemas and the other points I made why this is a bad idea.

I can see a real world case for this for some extensions where order matters in a construct that PALS specifies as unordered.

No sorry, this is a super hypothetical need. I do strongly recommend against this. My reasons are above.

@DavidSagan
Member Author

I can see a real world case for this for some extensions where order matters in a construct that PALS specifies as unordered.

No sorry, this is a super hypothetical need. I do strongly recommend against this. My reasons are above.

You have not justified your assertions. I have told you how a parser can handle this very simply. If you want more detail I can supply it. And there are other ways of simply handling this by converting from list to ordered dict on input.

If you think that this complicates things outside of any parsing, please be more specific. An example would help.

@ax3l
Member

ax3l commented Apr 1, 2026

Let's start with the motivation for this PR: why would the PALS standard specify that "something must be unordered"? Please justify and motivate the actual need first.

in a construct that PALS specifies as unordered.

Where is this needed?
What would break or be terrible if the unnamed, future PALS-"unordered thing" was unnecessarily ordered in practice by a reader/writer?

There is no example here that one can follow.

@ax3l
Member

ax3l commented Apr 1, 2026

Here is a Claude summary of your change:

The proposal trades one problem (potential ordering ambiguity in mappings) for a worse problem (representational ambiguity across the entire schema). A standard schema should have exactly one canonical way to represent a given piece of data. If ordering matters, it should be encoded explicitly rather than relying on structural conventions that every consumer must independently understand and implement.

Absolutely. I will now give you four problems it causes, with examples, for your proposal, which so far comes without any motivating example.

@ax3l
Member

ax3l commented Apr 1, 2026

1. It introduces representational ambiguity.

The same logical data now has two valid serializations. Every tool, validator, and parser that consumes these lattice files must handle both representations identically. This doubles the surface area for bugs and increases the cognitive load on anyone reading or writing these files. A contributor looking at two lattice files might see what appears to be structurally different data that is actually semantically identical.

Example

Two files describe the same beamline (ignore details), but look structurally different:

File A (dictionary form):

  facility:
    drift1:
        kind: Drift
        length: 0.25
  
    quad1:
        kind: Quadrupole
        MagneticMultipoleP:
          Bn1: 1.0
        length: 1.0
  

File B (list form):

  facility:
    - drift1:
        kind: Drift
        length: 0.25
  
    - quad1:
        kind: Quadrupole
        MagneticMultipoleP:
          Bn1: 1.0
        length: 1.0

Are these the same? A human reading them might not be sure. A diff tool will flag them as completely different. A code review becomes harder because a contributor could switch forms arbitrarily, and git diff will show a large structural change that is semantically a no-op.
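
The diff behavior can be checked with a minimal sketch using Python's standard `difflib`. The two strings are abbreviated versions of File A and File B (the `MagneticMultipoleP` block is omitted), and the file names passed to `unified_diff` are placeholders.

```python
import difflib

file_a = """\
facility:
  drift1:
    kind: Drift
    length: 0.25
  quad1:
    kind: Quadrupole
    length: 1.0
"""

file_b = """\
facility:
  - drift1:
      kind: Drift
      length: 0.25
  - quad1:
      kind: Quadrupole
      length: 1.0
"""

diff_lines = list(
    difflib.unified_diff(
        file_a.splitlines(),
        file_b.splitlines(),
        fromfile="file_a.yaml",
        tofile="file_b.yaml",
        lineterm="",
    )
)

# Keep only added/removed content lines, skipping the two file headers.
changed = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(f"{len(changed)} changed lines reported")
```

A line-based diff sees the indentation and list markers change on nearly every line, even though the element names and parameters are untouched.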

@ax3l
Member

ax3l commented Apr 1, 2026

2. It conflates two distinct data models.

A list of key-value pairs and a dictionary are fundamentally different structures. By declaring them interchangeable (under constraints), you're essentially asking every consumer to implement a normalization step — "if you see a list of single-key mappings with no duplicate keys, treat it as a dict." This is implicit schema logic that lives outside the schema itself.

Every consumer of the schema must now implement normalization logic. Consider a tool that looks up an element by name:

# With a dict, this is trivial:
quad_params = lattice["facility"]["quad1"]

# With the list-of-dicts form, you need:
quad_params = None
for entry in lattice["facility"]:
    if "quad1" in entry:
        quad_params = entry["quad1"]
        break

Now imagine a validation library, a conversion tool to MAD-X format, a visualization tool, and a simulation runner. Each of these independently must include branching logic:

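class SchemaError(Exception):
    """Raised when the facility entry is neither a dict nor a list."""


def get_elements(lattice):
    elems = lattice["facility"]
    if isinstance(elems, list):
        # Convert list-of-single-key-dicts to a dict (insertion-ordered).
        # Note: update() silently lets a later duplicate key win.
        result = {}
        for item in elems:
            result.update(item)
        return result
    elif isinstance(elems, dict):
        return elems
    else:
        raise SchemaError("Invalid facility format")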

Every tool in the ecosystem reimplements this, slightly differently, and each is a potential source of bugs.

@ax3l
Member

ax3l commented Apr 1, 2026

3. The duplicate-key restriction is fragile.

The constraint "no duplicate keys in the list" must be enforced at validation time, not by the data format itself. YAML won't stop you from writing duplicate keys in a list. So now you need a custom validator for something that dictionaries give you for free (or at least by convention: YAML's handling of duplicate keys in mappings is itself underspecified, but most parsers will warn or take the last value).

Nothing in YAML prevents this:

  facility:
    - drift1:
        kind: Drift
        length: 0.25
  
    - quad1:
        kind: Quadrupole
        MagneticMultipoleP:
          Bn1: 1.0
        length: 1.0

    - drift1:
        kind: Drift
        length: 0.5

This is perfectly valid YAML. The list happily contains two entries keyed drift1. A dictionary would have silently overwritten the first or raised a warning, but the list form gives no such signal. (This is only an example: the same can happen at every level of the PALS hierarchy, even if we were to allow duplicate names in this specific list.)

The schema now requires a custom validator to catch this: (your "this is just a few lines, don't worry, Axel, I got this")

class ValidationError(Exception):
    """Raised when a list-form mapping repeats a key."""


def validate_no_duplicate_keys(elements_list):
    keys_seen = set()
    for item in elements_list:
        for key in item:
            if key in keys_seen:
                raise ValidationError(f"Duplicate key '{key}' in elements list")
            keys_seen.add(key)

If any one tool in the ecosystem forgets this check, it will silently process the file with unpredictable behavior: maybe using the first drift1, maybe the last, maybe both.
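
A minimal sketch of how two tools could disagree on the duplicated `drift1` entry from the YAML example above. The data is written as plain Python (with `quad1` abbreviated), and the two lookup functions are hypothetical stand-ins for different tools, not existing PALS code.

```python
# The duplicate-key list from the example, as plain Python data.
facility = [
    {"drift1": {"kind": "Drift", "length": 0.25}},
    {"quad1": {"kind": "Quadrupole", "length": 1.0}},
    {"drift1": {"kind": "Drift", "length": 0.5}},
]


def lookup_first(entries, name):
    """Return the first entry keyed by name (one plausible tool behavior)."""
    for entry in entries:
        if name in entry:
            return entry[name]
    raise KeyError(name)


def lookup_last(entries, name):
    """Merge all entries into one dict, so the last duplicate wins."""
    merged = {}
    for entry in entries:
        merged.update(entry)
    return merged[name]


first = lookup_first(facility, "drift1")["length"]
last = lookup_last(facility, "drift1")["length"]
print(first, last)  # 0.25 0.5
```

Neither function reports an error, so without the explicit duplicate check the two tools would simulate drifts of different lengths from the same file.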

@ax3l
Member

ax3l commented Apr 1, 2026

4. It solves a problem that's largely been solved.

Python 3.7+ guarantees dict insertion order. Most modern YAML libraries preserve mapping order (you can pick a proper one for a PALS reference implementation). C++ supports insertion-order stable maps. If the concern is interoperability with languages or tools that don't preserve order, the cleaner solution is to specify that compliant parsers must use order-preserving mappings, rather than introducing an alternative representation.
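
The Python part of this claim can be checked with the standard library alone. The sketch below uses `json` rather than a YAML library, since YAML support is not part of the standard library and ordering behavior there depends on the library chosen; the keys `name` and `aperture` are made up for the example.

```python
import json

# Built-in dicts keep insertion order (guaranteed since Python 3.7).
element = {}
element["kind"] = "Quadrupole"
element["length"] = 1.0
element["name"] = "quad1"
print(list(element))  # ['kind', 'length', 'name']

# The standard-library JSON decoder builds dicts in document order.
parsed = json.loads('{"kind": "Drift", "length": 0.25, "aperture": 0.1}')
print(list(parsed))  # ['kind', 'length', 'aperture']

# Order also survives a write/read round trip.
round_trip = json.loads(json.dumps(parsed))
print(list(round_trip) == list(parsed))  # True
```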

Example: pals-project/pals-cpp#18

@DavidSagan
Member Author

First of all, I do not accept Claude as an authority on any of this. Especially since answers may be manipulated depending upon the questions. There is a much better way to gauge this. Just ask people if they think they would be confused.

@DavidSagan
Member Author

The schema now requires a custom validator to catch this: (your "this is just a few lines, don't worry, Axel, I got this")

def validate_no_duplicate_keys(elements_list):
    keys_seen = set()
    for item in elements_list:
        for key in item:
            if key in keys_seen:
                raise ValidationError(f"Duplicate key '{key}' in elements list")
            keys_seen.add(key)

If any one tool in the ecosystem forgets this check, it will silently process the file with unpredictable behavior: maybe using the first drift1, maybe the last, maybe both.

There are many checks a validator must do. This includes spelling checks, etc. What you show would be a small part of validation.

@ax3l
Member

ax3l commented Apr 1, 2026

Yes a validator can check many things, but this misses the point.

The difference is that every other validation check catches errors that are inherent to the problem domain: wrong element types, out-of-range parameters, misspelled names. Those errors exist regardless of how you design the schema.

The key/list/duplicate check is different: it only exists because the schema introduced a representation that makes it possible. You're not catching a physics mistake, you're catching an artifact of a design choice. A dictionary makes this class of error structurally impossible.

The best validation check is the one you don't need to write.

@DavidSagan
Member Author

The best validation check is the one you don't need to write.

I agree. But I believe this is being blown all out of proportion.

@ax3l
Member

ax3l commented Apr 1, 2026

First of all, I do not accept Claude as an authority on any of this. Especially since answers may be manipulated depending upon the questions. There is a much better way to gauge this. Just ask people if they think they would be confused.

Don't worry, I suggested early on that you critically self-review it with an authority of your choice.

I am still waiting for a concrete use case example that motivates all this. It is hard to follow at all why this is worth spending time on now.

Your only concrete use case is built on the wrong premise that your current C++ YAML lib does not support it, but it does in fact preserve mapping order, and you have a bug in pals-cpp while using it. Here is the fix, btw.

@ax3l ax3l closed this Apr 1, 2026
@ax3l ax3l reopened this Apr 1, 2026
@ax3l
Member

ax3l commented Apr 1, 2026

Sorry, clicked the wrong button.

The best validation check is the one you don't need to write.

I agree. But I believe this is being blown all out of proportion.

Above I laid out concrete cases showing that this change prevents us from relying on existing structural serialization/deserialization support and forces us to fall back on "writing all the validation (structural and physics/PALS meaning) from scratch". This is not how we do this: for modern standards we want to rely on structural schemas, diff tools, declarative validation schemes, etc. One builds on the other; we do not mix parsing YAML, doing schema validation, and doing physics mapping at the same level.

I stand behind all of problems 1-4 and want to see them either solved or a strong motivation given for why this is worth adding.

I am repeating myself: for such structural changes, please lead with concrete examples/needs that make this necessary, not a vague "in the future / maybe [no concrete case]".

@ax3l ax3l changed the title Clarified that ordered list with unique keys can be used in place of a unordered dictionary. Proposal: An Ordered list w/ unique keys can be used in place of a unordered dictionary Apr 1, 2026
@ax3l ax3l added the schema: structural changes Structural changes to the PALS schema label Apr 1, 2026
@EZoni EZoni dismissed their stale review April 1, 2026 22:16

Overlooked impact of proposed change.

@EZoni EZoni self-requested a review April 1, 2026 22:16
@ax3l ax3l added the invalid This doesn't seem right label Apr 2, 2026
Member

@jlvay jlvay left a comment

After finally understanding what this PR was doing and reviewing, I think that the proposed option would bring confusion and additional work for the parsers that is not needed.

@jlvay jlvay closed this Apr 6, 2026
@jlvay
Member

jlvay commented Apr 6, 2026

Decided to close after extensive discussions.

Labels

invalid (This doesn't seem right)
schema: structural changes (Structural changes to the PALS schema)

5 participants