Skip to content

fix: Hash::MultiValue + blib/arch — splice spill, deref slice, dclone hooks#478

Open
fglock wants to merge 7 commits intomasterfrom
fix/hash-multivalue-dclone-splice
Open

fix: Hash::MultiValue + blib/arch — splice spill, deref slice, dclone hooks#478
fglock wants to merge 7 commits intomasterfrom
fix/hash-multivalue-dclone-splice

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Apr 9, 2026

Summary

Fixes all ./jcpan -t Hash::MultiValue test failures (was 7/10 programs passing, now 10/10), the ./jcpan -t HTTP::Thin blib/arch issue, regex error-handling bugs under JPERL_UNIMPLEMENTED=warn, regex preprocessing issues, and a missing compile-time check for global-only variables in my/state declarations.

Bug 1: JVM backend — splice with constant sub causes ASM frame crash

  • handleSpliceBuiltin left the first arg on the JVM operand stack while evaluating remaining args
  • When those contained a function call (e.g. constant sub C from use constant), the blockDispatcher GOTOs created inconsistent stack depths at merge points, crashing ASM Frame.merge
  • Fix: Register spilling in EmitOperator.java, matching the existing handlePushOperator pattern

Bug 2: Interpreter backend — @$ref[@idx] = ... unsupported

  • The array slice assignment handler in CompileAssignment.java only supported @array[@idx] with plain IdentifierNode
  • Fix: Added a branch to handle dereferenced array refs using DEREF_ARRAY/DEREF_ARRAY_NONSTRICT opcodes

Bug 3: Storable::dclone — shared refs from STORABLE_freeze hooks

  • dclone passed extra refs from STORABLE_freeze directly to STORABLE_thaw without cloning them
  • Inside-out objects like Hash::MultiValue ended up sharing internal arrays between original and clone
  • Fix: Deep-clone extra refs (indices 1+) before passing to STORABLE_thaw in Storable.java

Bug 4: blib/arch missing in generated Makefile

  • ExtUtils::MakeMaker generated a Makefile whose pure_all target never created blib/arch/
  • blib.pm requires it and dies if missing
  • Fix: Added @mkdir -p blib/arch to the pure_all target in ExtUtils/MakeMaker.pm

Bug 5: JPERL_UNIMPLEMENTED=warn downgrading real regex errors

  • ALL regex compilation exceptions were downgraded to warnings, not just unimplemented features
  • This caused real errors (e.g. \N{}) to become warnings, allowing execution to continue into infinite loops
  • Fix: Only downgrade PerlJavaUnimplementedException, not all exceptions, in RuntimeRegex.java
  • Also restructured catch to properly distinguish PerlJavaUnimplementedException (extends PerlCompilerException) from real syntax errors, and wrap Java PatternSyntaxException as downgradable

Bug 6: my $0 / my @_ / my %_ not rejected at compile time

  • Perl requires that forced-global variables cannot be lexicalized with my or state
  • jperl silently allowed this, which could cause infinite loops
  • Fix: Added isGlobalOnlyVariable() check in OperatorParser.addVariableToScope() that emits Can't use global $X in "my" (matching Perl's error message)
  • Covers: underscore vars, digit-only names, single ASCII punctuation names, caret/control vars
  • Unicode single-char variables (>= 128) are correctly allowed

Bug 7: Regex preprocessor — unimplemented features using wrong exception type

  • regexError() (throws PerlCompilerException) was used for unimplemented features like control verbs, (?@...), and lookbehind >255
  • These should use regexUnimplemented() (throws PerlJavaUnimplementedException) so they can be downgraded with JPERL_UNIMPLEMENTED=warn
  • Fix: Changed these calls to use regexUnimplemented() in RegexPreprocessor.java

Bug 8: handleCodeBlock consuming closing paren meant for handleParentheses

  • handleCodeBlock consumed both } and ) of (?{...}), but handleParentheses expects to consume the closing ) itself
  • This caused "Unmatched (" errors for (?{...}) code blocks in single-quoted regexes
  • Fix: handleCodeBlock now returns offset pointing TO ), letting handleParentheses consume it

Bug 9: handleQuantifier consuming regex metacharacters in brace expressions

  • handleQuantifier() used indexOf('}') to find the closing brace, crossing character class and group boundaries
  • Pattern like { (?> [^{}]+ | (??{...}) )* } had the (?>... consumed as literal text
  • Fix: When braces don't form a valid quantifier, only escape the opening { and return immediately

Bug 10: \x{...} hex escape with non-hex characters crashes

  • Integer.parseInt(hexStr, 16) threw fatal NumberFormatException for \x{9bq}
  • Fix: Extract valid hex prefix (Perl behavior: stop at first non-hex char). Also handle bare \xNN with non-hex chars by parsing up to 2 hex digits.
  • Improved pat_advanced.t from 63 to 731 passing tests

Bug 11: NullPointerException when regex fails with JPERL_UNIMPLEMENTED=warn

  • regex.patternString was never set in the catch block, causing NPE downstream
  • Fix: Set regex.patternString in catch block, add null guard in preProcessRegex()

Test plan

  • make clean ; make passes (all unit tests)
  • ./jcpan -t Hash::MultiValue — 10/10 test programs, 55/55 subtests PASS
  • ./jcpan -t HTTP::Thin — blib/arch created correctly
  • op/local.t — no regression vs master (142/170 ok on both)
  • my $0, my @_, my %_, my $!, my $/, my $^W, state $_ all correctly rejected
  • our $_, local $_, my $a, my %ENV all correctly allowed
  • uni/variables.t — 66880/66880 (matches master)
  • re/pat_rt_report.t — 2397/2515 (matches master, now runs 2470 before crash)
  • re/pat.t533/1298 (improved from 428)
  • re/pat_advanced.t731/838 (improved from 63)
  • re/reg_eval_scope.t — 7/49 (improved from 0, master: 6)

Generated with Devin

fglock and others added 7 commits April 9, 2026 23:12
…ne hooks

Three bugs fixed:

1. JVM backend: splice with constant sub causes ASM frame crash
   handleSpliceBuiltin left the first arg on the JVM operand stack
   while evaluating remaining args. When those contained a function
   call (constant sub), the blockDispatcher's GOTOs created
   inconsistent stack depths at merge points. Fixed with register
   spilling, matching handlePushOperator's existing pattern.

2. Interpreter backend: @$ref[@idx] = ... unsupported
   The array slice assignment handler only supported @array[@idx]
   with plain IdentifierNode. Added a branch to handle dereferenced
   array refs using DEREF_ARRAY/DEREF_ARRAY_NONSTRICT opcodes.

3. Storable::dclone: shared refs from STORABLE_freeze hooks
   dclone passed extra refs from STORABLE_freeze directly to
   STORABLE_thaw without cloning them. Inside-out objects like
   Hash::MultiValue ended up sharing internal arrays between
   original and clone. Fixed by deep-cloning extra refs (indices 1+).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The generated Makefile pure_all target only created blib/lib/
but not blib/arch/. The blib.pm pragma requires both directories
to exist (-d blib && -d blib/lib && -d blib/arch), so use blib
and -Mblib would die with Cannot find blib even in ...

This caused CPAN module test suites (e.g. HTTP::Thin) to fail when
they use open3 with -Mblib to verify modules load cleanly.

Add mkdir -p blib/arch to the pure_all target so the directory
always exists alongside blib/lib.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…tures

The regex error catch block was downgrading ALL exceptions to warnings
when JPERL_UNIMPLEMENTED=warn was set, including real compilation errors
like "Invalid Unicode character name". Now only PerlJavaUnimplementedException
is downgraded to a warning; other regex errors remain fatal.

This fixes a hang in lib/croak.t where `qr/(?{})\N{}/;while(my($0)=0){}`
would continue past the \N{} error into an infinite while loop when
JPERL_UNIMPLEMENTED=warn was set by the test runner.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Perl requires that forced-global variables ($_, @_, %_, $0, $1, $!,
$/, $^W, etc.) cannot be lexicalized with 'my' or 'state'. Previously
jperl silently allowed this, which could cause infinite loops when
my($0) returned truthy in a while condition.

Now emits: Can't use global $X in "my" (matching Perl's error message.

The check covers:
- Underscore variables: src/main/java/org/perlonjava/frontend/parser/OperatorParser.java, @_, %_
- Digit-only names: bash, , , ...
- Single punctuation names: , $/, , $;, $,, $., $|, etc.
- Caret/control variables: $^W, $^H, $^O, etc.

'our' and 'local' are unaffected (they correctly alias/localize globals).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
EOF
)
- RegexPreprocessor: use regexUnimplemented() (not regexError()) for
  control verbs, (?@...), and lookbehind >255 so JPERL_UNIMPLEMENTED=warn
  can downgrade them (fixes pat_rt_report.t: 196 -> 2397)

- RuntimeRegex: restructure catch block to distinguish
  PerlJavaUnimplementedException (extends PerlCompilerException) from
  real PerlCompilerException syntax errors. Wrap Java
  PatternSyntaxException as unimplemented so it can be downgraded.
  (fixes pat_advanced.t: 54 -> 63)

- RegexPreprocessor.handleCodeBlock: don't consume closing paren - let
  handleParentheses consume it, matching the protocol used by all other
  group handlers. Fixes code blocks causing "Unmatched (" errors.
  (fixes pat.t: 239 -> 428)

- OperatorParser.isGlobalOnlyVariable: restrict single-char punctuation
  check to ASCII (< 128) so Unicode characters that Java doesn't
  recognize as letters aren't rejected.
  (fixes uni/variables.t: 66803 -> 66880)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Fix handleQuantifier consuming regex metacharacters inside invalid
  brace expressions (e.g., {(?>...)*} was consumed as a single literal
  brace expression). Now only escapes the opening { and lets the main
  loop process subsequent characters.

- Fix \x{...} hex escapes with non-hex characters to extract valid
  hex prefix like Perl does (e.g., \x{9bq} -> 0x9B). Fixes fatal
  crash in pat_advanced.t at line 321.

- Handle bare \xNN with non-hex chars (e.g., \xk -> \x00 + literal k)
  instead of passing through to Java Pattern which rejects it.

- Fix NullPointerException when regex with (?{...}) code blocks fails
  with JPERL_UNIMPLEMENTED=warn: set regex.patternString in catch
  block and add null guard in preProcessRegex.

Test improvements:
  pat_advanced.t: 63 -> 731 passing (+668)
  pat.t: 428 -> 533 passing (+105)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Analyzed pat.t (99 failures + 666 blocked), pat_advanced.t (107 failures),
and pat_rt_report.t (77 failures) into 16 categories (A-P) with difficulty
ratings and priority recommendations.

Key findings:
- \p{isAlpha} alias crash blocks 666 pat.t tests (quick fix)
- Bug 41010 conditional+$ anchor accounts for 48 pat_rt_report.t failures
- $^N not updated = 20 pat_advanced.t failures
- \N{name} charnames = 25 pat_advanced.t failures
- (?{...}) code blocks = 46 failures (very hard, engine limitation)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant