Skip to content

Conversation

@soutaro
Copy link
Member

@soutaro soutaro commented Jan 23, 2026

The parser currently uses global constant pool and updates the pool during parsing through RBS_KEYWORD node. This is not thread-safe and we cannot run the parser in parallel. This PR is to remove the global constant pool and RBS_KEYWORD node by:

  1. Simplify the location structure from the nested one to flat struct attributes
  2. Introducing C enums for symbol literal unions (for example, method/attribute visibility is :public | :private)

This is not a breaking change in Ruby library -- the RBS::Location structure nor the union type attributes are not changed -- but introduces breaking changes in C level API.

Flat location attributes

The nested location attributes are expanded to flat struct attributes.

Instead of having one location attribute and sub-locations are stored under it, node structs directly have sub-locations are its attributes.

    // before: sub-locations are saved under location attribute.
    rbs_location_t *location;

    // after: The main location is saved as `location`, but sub-locations are saved to their own attributes.
    rbs_location _range location;
    rbs_location_range keyword_range;     /* Required */
    rbs_location_range name_range;        /* Required */
    rbs_location_range end_range;         /* Required */
    rbs_location_range type_params_range; /* Optional */
    rbs_location_range colon_range;       /* Optional */
    rbs_location_range self_types_range;  /* Optional */

The nested locations are then reconstructed in ast_translation.c.

        VALUE location = rbs_location_range_to_ruby_location(ctx, node->base.location);
        rbs_loc *loc = rbs_check_location(location);
        rbs_loc_legacy_alloc_children(loc, 5);
        rbs_loc_legacy_add_required_child(loc, rb_intern("keyword"), (rbs_loc_range) { .start = node->keyword_range.start, .end = node->keyword_range.end });
        rbs_loc_legacy_add_required_child(loc, rb_intern("name"), (rbs_loc_range) { .start = node->name_range.start, .end = node->name_range.end });
        rbs_loc_legacy_add_required_child(loc, rb_intern("end"), (rbs_loc_range) { .start = node->end_range.start, .end = node->end_range.end });
        rbs_loc_legacy_add_optional_child(loc, rb_intern("type_params"), (rbs_loc_range) { .start = node->type_params_range.start, .end = node->type_params_range.end });
        rbs_loc_legacy_add_optional_child(loc, rb_intern("lt"), (rbs_loc_range) { .start = node->lt_range.start, .end = node->lt_range.end });
        rb_hash_aset(h, ID2SYM(rb_intern("location")), location);

C enums for symbol literal unions

The union attributes have their own enum types.

enum rbs_attribute_visibility {
    RBS_ATTRIBUTE_VISIBILITY_UNSPECIFIED, /* unspecified (nil) */
    RBS_ATTRIBUTE_VISIBILITY_PUBLIC,      /* public (:public) */
    RBS_ATTRIBUTE_VISIBILITY_PRIVATE,     /* private (:private) */
};

The values are translated to Ruby symbols (and nil).

@soutaro soutaro added this to the RBS 4.0 milestone Jan 23, 2026
@soutaro soutaro force-pushed the flat-locations branch 2 times, most recently from 1f3e7c9 to 92d9009 Compare January 23, 2026 03:39
Copy link
Contributor

@amomchilov amomchilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent! I don't have permission to officially "approve" on this repo, but I do approve :)

Left a few small comments/questions, but really happy with this!

Would you minding doing fixups on the various "fix"-style commits? Also, I think it would be useful if the Rust crate changes to add the new C types, was combined in the same commit that adds new types to C in the first place

typedef struct {
int start;
int end;
} rbs_loc_range;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered explicitly naming this "legacy"? It's a bit wordier, but much more clear versus just having a rbs_loc_range vs rbs_location_range distinction.

Suggested change
} rbs_loc_range;
} rbs_legacy_loc_range;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — I agree that making the distinction explicit would help.

I’m hesitant to use legacy, though, because I’m not planning to deprecate this in the Ruby codepath right now, and “legacy” tends to communicate “old + on the way out,” which isn’t what I mean here.

I’m open to a Ruby-specific rename, but I haven’t landed on a good name yet. (rbs_ruby_location_range???) In the meantime, would a brief comment in the file clarifying the intent/distinction be sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good reasoning, and I agree.

a brief comment in the file clarifying the intent/distinction be sufficient?

Yep!

ALLOCATOR(),
full_loc,
colon_loc,
RBS_RANGE_LEX2AST(colon_range),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why RBS_RANGE_LEX2AST applies here (but didn't before?), but not to full_loc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good reason here. full_loc is declared as rbs_location_range, but colon_range is rbs_range_t.

Anyways, we need to fix this again. rbs_location_range is defined by character offsets, but Rust crate requires byte-offsets.

return EMPTY_ARRAY;
}

VALUE ruby_array = rb_ary_new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VALUE ruby_array = rb_ary_new();
VALUE ruby_array = rb_ary_new_capa((long) list->length);

(technically size_t can be larger than long, so we can add an assertion to compare against LONG_MAX, but really, the length should never be anywhere close to that many)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ooops, this change needs to be made on the templates/ext/rbs_extension/ast_translation.c.erb template, not the rendered ext/rbs_extension/ast_translation.c)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this micro-optimization before, but it didn't improve the parsing performance. So, keeping the current code would be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My eyes have just gone so attuned to finding unsized vector allocations :D They all stick out in bright red to me now lol

}
"rbs_ivar_name" => {
writeln!(file, " #[must_use]")?;
writeln!(file, " pub fn {}(&self) -> IvarName {{", field.name)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why do this from Rust code, rather than the usual ERB templates we use for C?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Personally I’m +1 on using the ERB templates for the Rust code as well, but I’m not fully sure—using build.rs would be more standard in the Rust community.

@soutaro
Copy link
Member Author

soutaro commented Jan 27, 2026

Would you minding doing fixups on the various "fix"-style commits? Also, I think it would be useful if the Rust crate changes to add the new C types, was combined in the same commit that adds new types to C in the first place

I’m happy to squash/fix up the commits that address review comments once everything looks good to reviewers. 👍

I’m not fully sure I can cleanly combine the Rust crate updates with the C type changes—those changes are a bit intertwined—but I’ll take a look and see what I can do.

@ParadoxV5 ParadoxV5 mentioned this pull request Jan 27, 2026
@soutaro soutaro force-pushed the flat-locations branch 6 times, most recently from 044b1b2 to 107663b Compare January 29, 2026 04:08
@soutaro soutaro marked this pull request as ready for review January 29, 2026 04:21
@soutaro soutaro enabled auto-merge January 30, 2026 13:17
@soutaro soutaro merged commit 4225f0c into master Jan 30, 2026
26 checks passed
@soutaro soutaro deleted the flat-locations branch January 30, 2026 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants