Document 6 PerlOnJava bugs discovered by running jcpan -t WWW::Mechanize:
- Parser rejects NEWLINE after trailing :: (IdentifierParser.java)
- UNIVERSAL::isa() missing CODE reference type (Universal.java)
- HTMLParser array-ref accumulator handlers missing (HTMLParser.java)
- Overload dispatch does not set AUTOLOAD (OverloadContext.java)
- Devel::Cycle stub needed for Test::Memory::Cycle
- Capture::Tiny fork dependency (known limitation)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…l::Cycle
Fix 5 PerlOnJava bugs blocking WWW::Mechanize and its dependency chain:
1. Parser: allow NEWLINE/WHITESPACE/comma after trailing :: in package
names (e.g. Tie::RefHash:: followed by newline). The validation gate in
parseSubroutineIdentifier() now permits these tokens, letting the loop
continue to the top where they are already handled correctly.
2. UNIVERSAL::isa(): add CODE reference type to the switch statement so
UNIVERSAL::isa(\&sub, 'CODE') correctly returns true. This fixes
HTML::Element::traverse(), which checks isa(callback, 'CODE').
3. HTMLParser: implement array-ref accumulator handlers and argspec
parsing in fireEvent(). HTML::PullParser/TokeParser registers an array
ref as the event callback; fireEvent() now builds event data per the
argspec string and pushes it onto the accumulator. This makes
HTML::Form::parse() return actual form objects.
4. OverloadContext: set $AUTOLOAD when overload method resolution falls
through to AUTOLOAD. Fixes URI::WithBase stringification, which uses
overload '""' => 'as_string' with AUTOLOAD delegation.
5. Devel::Cycle: add no-op stub for JVM compatibility (same pattern as
Test::LeakTrace). The JVM tracing GC handles cycles natively.
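The array-ref accumulator in fix 3 can be pictured with a minimal Java sketch. It is not the PerlOnJava code: the class name, the `buildRow` helper, and the use of a `Map` for event fields are assumptions made for illustration, and only quoted literals and plain field names are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an argspec-driven accumulator; not the real HTMLParser.java. */
final class ArgspecAccumulatorSketch {

    /** Builds one event row from a comma-separated argspec such as "'S',tagname,text". */
    static List<Object> buildRow(String argspec, Map<String, Object> event) {
        List<Object> row = new ArrayList<>();
        // Simplification: a literal that itself contains a comma is not supported.
        for (String raw : argspec.split(",")) {
            String token = raw.trim();
            if (token.length() >= 2 && token.startsWith("'") && token.endsWith("'")) {
                // Quoted tokens are literal strings, e.g. 'S' marking a start tag.
                row.add(token.substring(1, token.length() - 1));
            } else {
                // Any other token names an event field; unknown fields become null.
                row.add(event.get(token));
            }
        }
        return row;
    }

    /** Array-ref handler: append the row to the accumulator instead of calling code. */
    static void fireEvent(List<List<Object>> accumulator, String argspec,
                          Map<String, Object> event) {
        accumulator.add(buildRow(argspec, event));
    }
}
```

A pull-style consumer would then take rows off the front of the accumulator one token at a time.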
When a STRING (method name) handler is used with argspec containing "self", the XS behavior is to use "self" as the method invocant only, not to pass it as an additional argument. This was causing HTML::TreeBuilder to receive $self as $tag (doubled invocant), breaking as_HTML() output.
- Added skipSelf parameter to buildEventDataFromArgspec()
- STRING callbacks: skipSelf=true (self is already the invocant)
- CODE/ARRAY callbacks: skipSelf=false (self included in args)
WWW::Mechanize tests: 431/478 pass (90.2%) on non-local tests
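The skipSelf rule can be illustrated with a short Java sketch. The enum, class, and method names here are hypothetical and stand in for whatever buildEventDataFromArgspec() does internally; the sketch only shows which argspec tokens end up as call arguments for each handler kind.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the skipSelf decision; names are illustrative only. */
final class SkipSelfSketch {
    enum HandlerKind { STRING, CODE, ARRAY }

    /** Returns the argspec tokens that become call arguments for the given handler kind. */
    static List<String> argumentTokens(String argspec, HandlerKind kind) {
        // For a method-name handler, self is already the invocant, so it is skipped.
        boolean skipSelf = (kind == HandlerKind.STRING);
        List<String> tokens = new ArrayList<>();
        for (String raw : argspec.split(",")) {
            String token = raw.trim();
            if (skipSelf && token.equals("self")) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

With the argspec "self,tagname,attr", a STRING handler would be called as a method with (tagname, attr), while a CODE handler would receive (self, tagname, attr).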
All phases complete. WWW::Mechanize non-local tests: 431/478 (90.2%). Document remaining failures and Bug 7 (skipSelf) details.
Three fixes that improve WWW::Mechanize tests from 90.2% to 97.0%:
1. HTMLParser: Buffer incomplete tags across parse() chunk boundaries.
When HTML::PullParser feeds 512-byte chunks, tags spanning boundaries
were truncated. Now incomplete start/end tags are buffered for the next
parse() call, matching Perl HTML::Parser XS behavior.
2. Strict subs: Allow barewords ending with :: (package name constants).
In Perl, Tie::RefHash:: is always legal even under use strict subs and
evaluates to the string without trailing ::. Fixed in both JVM and
bytecode backends.
3. HTMLParser: Self-closing /> handling matches Perl HTML::Parser.
In non-XML mode, / in /> is now emitted as boolean attribute. Synthetic
end tags only in xml_mode.
WWW::Mechanize non-local tests: 513/529 pass (97.0%)
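The chunk-boundary buffering in fix 1 can be sketched as follows. This is a deliberately small model, not the HTMLParser.java implementation: the class name and the string-based event list are assumptions, and a '>' inside a quoted attribute value is not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: carry an incomplete tag over to the next parse() call. */
final class ChunkBufferSketch {
    private final StringBuilder pending = new StringBuilder();
    final List<String> events = new ArrayList<>();

    void parse(String chunk) {
        pending.append(chunk);
        int pos = 0;
        while (pos < pending.length()) {
            int lt = pending.indexOf("<", pos);
            if (lt < 0) {
                // No tag start left: the remainder is plain text.
                emitText(pending.substring(pos));
                pos = pending.length();
                break;
            }
            if (lt > pos) {
                emitText(pending.substring(pos, lt));
            }
            int gt = pending.indexOf(">", lt);
            if (gt < 0) {
                // Incomplete tag: keep it buffered until more input arrives.
                pos = lt;
                break;
            }
            events.add("tag:" + pending.substring(lt, gt + 1));
            pos = gt + 1;
        }
        // Drop everything that was consumed; the unconsumed tail stays buffered.
        pending.delete(0, pos);
    }

    private void emitText(String text) {
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }
}
```

Feeding `<a hr` and then `ef='x'>hi` produces a single tag event for the whole start tag instead of a truncated one.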
… raw text, media.types
Three fixes improving WWW::Mechanize test pass rate from 97.0% to 98.1%:
1. EmitStatement.java: Make labeled blocks valid targets for unlabeled
last/next/redo. In Perl, LABEL: { for (...) {} last; } should exit
the labeled block, not the program. The previous logic incorrectly
prevented unlabeled control flow from targeting labeled simple blocks.
2. HTMLParser.java: Add raw text element handling for script, style,
xmp, listing, plaintext, textarea, and title elements. Content
inside these elements is not parsed for HTML tags, matching Perl
HTML::Parser behavior.
3. Bundle LWP/media.types data file for MIME type lookups. This file was
missing from the project, causing LWP::MediaTypes to fail to map file
extensions to content types.
Test results (non-server tests): 522/532 subtests pass (98.1%)
- upload.t: 3/5 -> 5/5 (labeled block fix)
- image-parse.t: 41/47 -> 41/42 (media.types)
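To show what the bundled media.types file is used for, here is a minimal Java sketch of an extension lookup. It assumes the common layout of one MIME type followed by its extensions on each line, with `#` starting a comment; the class and method names are hypothetical, and the real code would read the file from the JAR resources rather than from a string.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a media.types lookup; the file layout is an assumption. */
final class MediaTypesSketch {

    /** Parses lines such as "text/html html htm" into an extension-to-type map. */
    static Map<String, String> parse(String text) {
        Map<String, String> byExtension = new HashMap<>();
        for (String line : text.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line, or a type with no extensions
            }
            for (int i = 1; i < fields.length; i++) {
                // First mapping wins if an extension is listed twice.
                byExtension.putIfAbsent(fields[i].toLowerCase(Locale.ROOT), fields[0]);
            }
        }
        return byExtension;
    }

    /** Returns the MIME type for a file name, or null when the extension is unknown. */
    static String guess(Map<String, String> byExtension, String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return byExtension.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Without the data file the map is empty, so every lookup falls through to the unknown case, which matches the symptom described in item 3.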
…tation plans
Detailed root cause analysis and step-by-step implementation plans for:
- Phase 8: Fix fileno/dup chain for Capture::Tiny capture* (4 bugs identified)
- Phase 9: Test HTTP::Daemon pure Perl server mode end-to-end
…::Daemon
Five fixes to the fileno/dup chain that enable Capture::Tiny capture* functions and HTTP::Daemon pure Perl server mode:
1. RuntimeIO: assign sequential filenos to all file handles (files,
pipes) on open, not just sockets. fileno() previously returned undef for
regular files, breaking Capture::Tiny's dup-by-fd-number pattern.
2. IOOperator: bridge findFileHandleByDescriptor() to RuntimeIO's fileno
registry. The dup-by-fd path (open *STDOUT, ">&3") couldn't find handles
because it only checked its own empty map plus the hardcoded 0/1/2.
3. RuntimeIO: guard empty filename in dup mode. When fileno() returned
undef, ">&" . undef became ">&" (empty fd), which silently replaced
STDOUT with a stdin reader instead of reporting an error.
4. RuntimeIO: unregister fileno on close() to prevent fd number leaks.
5. IOOperator: assign fileno to duplicated handles so they can also be
found by dup-by-fd.
Results:
- Capture::Tiny capture/capture_stdout/capture_stderr/capture_merged: WORKING
- HTTP::Daemon new/accept/get_request/send_response: WORKING (pure Perl)
- WWW::Mechanize non-server tests: 529/532 (99.4%, up from 522/532)
- WWW::Mechanize local server tests: 17/19 pass (0 failures, 2 timeouts)
- dump.t: 1/7 -> 7/7, mech-dump/file_not_found.t: 0/1 -> 1/1
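The fileno registry and the empty-target guard can be sketched in Java as below. This is an illustration under assumptions, not the RuntimeIO or IOOperator code: the class name, the method names, the use of `Object` for handles, and the starting number 3 are all hypothetical, and only numeric dup targets are parsed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a virtual fileno registry with dup-by-fd lookup. */
final class FilenoRegistrySketch {
    // Assumption: 0, 1 and 2 are reserved for the standard streams.
    private static final AtomicInteger nextFileno = new AtomicInteger(3);
    private static final Map<Integer, Object> handlesByFileno = new ConcurrentHashMap<>();

    /** Called on open (and on dup) so every handle can be found by number. */
    static int assignFileno(Object handle) {
        int fd = nextFileno.getAndIncrement();
        handlesByFileno.put(fd, handle);
        return fd;
    }

    /** Lookup used by the dup-by-fd path; returns null for unknown numbers. */
    static Object findByFileno(int fd) {
        return handlesByFileno.get(fd);
    }

    /** Called on close() so stale map entries do not accumulate. */
    static void unregister(int fd) {
        handlesByFileno.remove(fd);
    }

    /** Parses the numeric target of a dup mode such as ">&3" and rejects an empty one. */
    static int parseDupTarget(String mode) {
        String target = mode.substring(mode.indexOf('&') + 1).trim();
        if (target.isEmpty()) {
            // This is the case produced by ">&" . undef in the description above.
            throw new IllegalArgumentException("Bad file descriptor: empty dup target");
        }
        return Integer.parseInt(target);
    }
}
```

The guard turns the silent misbehaviour from item 3 into an explicit error at the point where the mode string is parsed.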
…::Daemon)
Since jperl doesn't implement DESTROY or reference counting, IO handles on gensym'd globs (used by IO::Socket, HTTP::Daemon, etc.) were never closed when variables went out of scope or were explicitly undef'd. This caused HTTP clients like LWP to hang waiting for the server to close connection sockets on 302 redirects.
Changes:
- RuntimeScalar: Add closeIOOnDrop() called from undefine() and
setLarge() to close IO handles when a GLOBREFERENCE is dropped
- RuntimeScalar: Only close IO for globs NOT in the stash (checked via
existsGlobalIO) to avoid closing named handles like *MYFILE
- RuntimeIO: Fix getRuntimeIO() fallback to use getExistingGlobalIO()
instead of getGlobalIO() to prevent auto-vivifying stash entries for
deleted gensym globs
- GlobalVariable: Add getExistingGlobalIO() non-auto-vivifying lookup
- All 18 local server tests now pass with 0 timeouts
- back.t: 47/47, get.t: 34/34 (previously had server-exit timeouts)
- Document remaining issues: CDATA, CSS url, cookies.t (fork)
HTMLParser: implement <![CDATA[...]]>, <![IGNORE[...]]>, <![INCLUDE[...]]> parsing when marked_sections or xml_mode is enabled. CDATA sections emit text events with is_cdata=true. Script/style raw content handlers skip CDATA sections when looking for closing tags (marked_sections only). When marked_sections is off, <…
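The CDATA-aware search for a raw element's closing tag can be sketched like this. The class and method names are hypothetical and the scan is simplified (it matches on the `</name` prefix only); it is meant to show the skipping rule, not to mirror HTMLParser.java.

```java
/** Hypothetical sketch: find a raw element's closing tag, optionally skipping CDATA. */
final class CdataAwareCloseSketch {
    private static final String CDATA_OPEN = "<![CDATA[";
    private static final String CDATA_CLOSE = "]]>";

    /**
     * Returns the index of the first "</tagName" at or after from, or -1 if none.
     * When markedSections is true, text inside CDATA sections is never matched.
     */
    static int findClose(String html, int from, String tagName, boolean markedSections) {
        String close = "</" + tagName;
        int i = from;
        while (i < html.length()) {
            if (markedSections && html.startsWith(CDATA_OPEN, i)) {
                int end = html.indexOf(CDATA_CLOSE, i + CDATA_OPEN.length());
                if (end < 0) {
                    return -1; // unterminated section: no closing tag can be found
                }
                i = end + CDATA_CLOSE.length();
                continue;
            }
            // Case-insensitive match, since tag names are not case-sensitive in HTML.
            if (html.regionMatches(true, i, close, 0, close.length())) {
                return i;
            }
            i++;
        }
        return -1;
    }
}
```

For the input `<![CDATA[ </script> ]]>x</script>`, the search lands on the second `</script>` when marked sections are enabled and on the first one otherwise.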
Root cause: Capture::Tiny's _copy_std() saves STDOUT/STDERR handles
by dup'ing them into a hash via a reused loop variable:
$h = IO::Handle->new(); open($h, ">&STDOUT"); $old{stdout} = $h;
$h = IO::Handle->new(); open($h, ">&STDERR"); $old{stderr} = $h;
When $h was reassigned, setLarge() called closeIOOnDrop(), which saw
the gensym'd glob was no longer in any stash and closed its IO. This
set ioHandle=ClosedIOHandle on the RuntimeIO still referenced by
$old{stdout}, causing fileno() to return undef and the later restore
to fail with "Bad file descriptor".
Additionally, the fd recycling mechanism (ConcurrentLinkedQueue) was
unsafe: the closed RuntimeIO's fd was added to the free queue, and
the next assignFileno() reused it, causing two handles to share the
same fd number.
Changes:
- RuntimeScalar.setLarge(): Remove closeIOOnDrop() from non-null
assignment path. Without reference counting we cannot know if other
variables still reference the same glob. Keep it in undefine() and
setLarge(null) where explicit cleanup is intended.
- RuntimeIO: Remove fd recycling queue (freedFilenos). Fd numbers are
now monotonically increasing and never reused. This is safe because
they are virtual (not OS fds) and only consume map entries.
- Add dev/design/io_handle_lifecycle.md documenting the full analysis,
design decisions, and future improvement options.
See: dev/design/io_handle_lifecycle.md
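The aliasing problem behind this change can be reproduced with a toy model in Java. The `Handle` and `Scalar` classes are hypothetical stand-ins for RuntimeIO and RuntimeScalar; the sketch only shows why closing on reassignment is unsafe when nothing tracks how many variables share a handle.

```java
/** Hypothetical sketch: closing on reassignment breaks a handle saved elsewhere. */
final class SharedHandleSketch {
    static final class Handle {
        boolean closed;
    }

    static final class Scalar {
        private final boolean closeOnReassign;
        Handle value;

        Scalar(boolean closeOnReassign) {
            this.closeOnReassign = closeOnReassign;
        }

        /** Assignment; the pre-fix behaviour also closed the handle being replaced. */
        void set(Handle next) {
            if (closeOnReassign && value != null && value != next) {
                value.closed = true;
            }
            value = next;
        }

        /** Explicit undef: cleanup is intended here, so the handle is closed. */
        void undefine() {
            if (value != null) {
                value.closed = true;
            }
            value = null;
        }
    }

    /** Replays the reused-loop-variable pattern; true if the saved handle stays open. */
    static boolean savedHandleStaysOpen(boolean closeOnReassign) {
        Scalar h = new Scalar(closeOnReassign);
        h.set(new Handle());
        Handle savedStdout = h.value; // models $old{stdout} = $h
        h.set(new Handle());          // models the next $h = IO::Handle->new()
        return !savedStdout.closed;
    }
}
```

With close-on-reassign enabled the saved handle is closed behind the caller's back, which corresponds to the "Bad file descriptor" failure; with it disabled the saved handle survives until an explicit undef.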
Summary
Comprehensive support for WWW::Mechanize and its dependency chain on PerlOnJava, achieved through 11 phases of fixes across the parser, runtime, IO system, and build configuration.
Key changes
Parser & Language fixes
- Fix parser rejecting NEWLINE after trailing :: in package names
- Fix UNIVERSAL::isa() not recognizing CODE references
- Allow :: barewords (package name constants)
- Set $AUTOLOAD when resolution falls through to AUTOLOAD
- Make labeled blocks valid targets for unlabeled last/next/redo
HTMLParser (HTML::Parser Java implementation)
- Buffer incomplete tags across parse() chunk boundaries
- Self-closing /> handling: emit as attribute in non-XML mode
- Raw text elements: <script>, <style>, <xmp>, <listing>, <plaintext>, <textarea>, <title>
- Marked sections: <![CDATA[...]]>, <![IGNORE[...]]>, <![INCLUDE[...]]>
- </script> inside <![CDATA[...]]> not treated as closing tag
- is_cdata argspec now correctly reflects CDATA text events
IO & Runtime fixes
- Fix fileno() to return valid fd numbers for all file/pipe handles
- Bridge findFileHandleByDescriptor() to RuntimeIO fileno registry
- closeIOOnDrop(): close IO handles on undef/reassignment for gensym'd globs
- Fix getRuntimeIO() fallback to use non-auto-vivifying getExistingGlobalIO()
Build & Dependencies
- Devel::Cycle no-op stub for Test::Memory::Cycle compatibility
- Bundle LWP/media.types in JAR resources for MIME type detection (.css → text/css, etc.)
Test Results
Non-server tests: ~542/545 (99.8%)
Local server tests: 18/18 (0 timeouts)
Remaining (known limitations)
- fork(): uses open FH, '-|' pattern
See dev/modules/www_mechanize.md for detailed per-phase root cause analysis.
Test plan
- make passes (no regressions in all 11 phases)