Limit console output in CharsetsTest to prevent OOM in resultsSum.pl#7000
Limit console output in CharsetsTest to prevent OOM in resultsSum.pl#7000smlambert merged 2 commits intoadoptium:masterfrom
Conversation
Without this fix, CharsetsTest prints all encoding/decoding errors to stdout, which gets captured into TestTargetResult and TAP files. With 33M+ errors on AIX ppc64, this causes resultsSum.pl to OOM when building the TAP file. - Cap console error output to MAX_CONSOLE_ERRORS_PER_CHARSET (10) per charset - Write all errors to charset_errors.txt for full debug details - Add verbose flag (-Dcharsets.verbose=true) for debug runs - prints all errors to console but should not be used in CI related: https://github.ibm.com/runtimes/automation/issues/921 Signed-off-by: Anna Babu Palathingal <anna.bp@ibm.com>
|
Tested:
|
There was a problem hiding this comment.
Pull request overview
Reduces stdout volume from CharsetsTest to prevent downstream TAP/result aggregation (resultsSum.pl) from running out of memory when extremely large numbers of charset mismatches occur (e.g., on AIX ppc64).
Changes:
- Caps per-charset console error output (unless
-Dcharsets.verbose=trueis enabled). - Writes mismatch details to
charset_errors.txtundertest.output.dir(or.by default). - Adds summary counters for tested/skipped charsets and total errors.
Show a summary per file
| File | Description |
|---|---|
| functional/MBCS_Tests/charsets/src/CharsetsTest.java | Limits console logging of mismatches, adds file logging + verbose mode, and prints a run summary to reduce TAP/result size. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
functional/MBCS_Tests/charsets/src/CharsetsTest.java:314
- The console message says further errors are "written to charset_errors.txt" even when errorFile could not be created (errorFile == null). Guard this message on errorFile != null or adjust the wording so it’s not misleading.
System.out.println(":"+sb0.toString()+"<>"+sb1.toString());
} else if (charset_err_cnt == MAX_CONSOLE_ERRORS_PER_CHARSET + 1) {
System.out.println(" ... further errors for " + ce.charset().name() + " written to charset_errors.txt");
}
- Files reviewed: 1/1 changed files
- Comments generated: 5
| static int tested_cnt = 0; | ||
| static int skipped_cnt = 0; | ||
| static boolean verbose = Boolean.getBoolean("charsets.verbose"); | ||
| static int MAX_CONSOLE_ERRORS_PER_CHARSET = 10; |
There was a problem hiding this comment.
MAX_CONSOLE_ERRORS_PER_CHARSET is a constant but is declared as a mutable static int. To match the existing constant style in this class (e.g., sbcs_space/dbcs_space/mix_space), make it static final (and ideally private) so it can’t be modified at runtime.
| static int MAX_CONSOLE_ERRORS_PER_CHARSET = 10; | |
| private static final int MAX_CONSOLE_ERRORS_PER_CHARSET = 10; |
| err_cnt++; | ||
| charset_err_cnt++; | ||
|
|
||
| // Always write to error file | ||
| if (errorFile != null) { | ||
| errorFile.print(cd.charset().name()+":"); | ||
| for(byte b : ba) errorFile.printf("\\x%02X",(int)b&0xFF); | ||
| errorFile.println(":"+sb0.toString()+"<>"+sb1.toString()); | ||
| } | ||
|
|
||
| // Print to console based on verbose flag or error limit | ||
| if (verbose || charset_err_cnt <= MAX_CONSOLE_ERRORS_PER_CHARSET) { | ||
| if (charset_err_cnt == 1) { | ||
| System.out.println("\n=== Decoding Errors in " + cd.charset().name() + " ==="); | ||
| } | ||
| System.out.print(" "); | ||
| for(byte b : ba) System.out.printf("\\x%02X",(int)b&0xFF); | ||
| System.out.println(":"+sb0.toString()+"<>"+sb1.toString()); | ||
| } else if (charset_err_cnt == MAX_CONSOLE_ERRORS_PER_CHARSET + 1) { | ||
| System.out.println(" ... further errors for " + cd.charset().name() + " written to charset_errors.txt"); |
There was a problem hiding this comment.
charset_err_cnt is shared between decoding and encoding. If a charset has decoding mismatches first, the later encoding mismatches won’t print the "=== Encoding Errors ===" header (and may be suppressed earlier than intended) because charset_err_cnt is already > 1. Consider tracking separate counters (or header flags) for decode vs encode, or resetting before the encoder phase.
| for(byte b : ba) System.out.printf("\\x%02X",(int)b&0xFF); | ||
| System.out.println(":"+sb0.toString()+"<>"+sb1.toString()); | ||
| } else if (charset_err_cnt == MAX_CONSOLE_ERRORS_PER_CHARSET + 1) { | ||
| System.out.println(" ... further errors for " + cd.charset().name() + " written to charset_errors.txt"); |
There was a problem hiding this comment.
The console message says further errors are "written to charset_errors.txt" even when errorFile could not be created (errorFile == null). Guard this message on errorFile != null or adjust the wording so it’s not misleading.
This issue also appears on line 311 of the same file.
| System.out.println(" ... further errors for " + cd.charset().name() + " written to charset_errors.txt"); | |
| if (errorFile != null) { | |
| System.out.println(" ... further errors for " + cd.charset().name() + " written to charset_errors.txt"); | |
| } else { | |
| System.out.println(" ... further errors for " + cd.charset().name() + " not shown"); | |
| } |
| System.out.println("Errors found: " + err_cnt); | ||
| if (err_cnt > 0) { | ||
| System.out.println("Full error details written to: charset_errors.txt"); | ||
| } |
There was a problem hiding this comment.
If charset_errors.txt cannot be created, the summary still prints "Full error details written to: charset_errors.txt". This should be conditional on errorFile being successfully opened (or print the earlier warning again / include the actual path).
| // Always write to error file | ||
| if (errorFile != null) { | ||
| errorFile.print(cd.charset().name()+":"); | ||
| for(byte b : ba) errorFile.printf("\\x%02X",(int)b&0xFF); | ||
| errorFile.println(":"+sb0.toString()+"<>"+sb1.toString()); | ||
| } |
There was a problem hiding this comment.
This writes every mismatch to charset_errors.txt unconditionally. With the reported 33M+ errors, this can create a multi-GB file and significantly slow the test or exhaust disk in CI. Consider capping file output by default (similar to console), and only enabling full file output under an explicit debug flag.
…rset -Dcharsets.verbose=true: Uncapped console output (local debugging only) -Dcharsets.debug.file=true: Uncapped file output, limited console (CI-safe debugging) - Update on review comments from copilot
| System.out.println(" ... further errors for " + cd.charset().name() + " written to charset_errors.txt"); | ||
| } else if (charset_decode_err_cnt == MAX_CONSOLE_ERRORS_PER_CHARSET + 1) { | ||
| if (errorFile != null) { | ||
| System.out.println(" ... further decoding errors for " + cd.charset().name() + " written to charset_errors.txt"); |
There was a problem hiding this comment.
Do we want a proper logger as opposed to s.o.p?
There was a problem hiding this comment.
As this is a test file, System.out.println is the approach used throughout the existing test suite. Used it this way toward keeping it consistent with the rest of the test file.
There was a problem hiding this comment.
We can look at other improvements to these tests in separate PRs.
|
TESTED RESULTS
|
| System.out.println(" ... further errors for " + cd.charset().name() + " written to charset_errors.txt"); | ||
| } else if (charset_decode_err_cnt == MAX_CONSOLE_ERRORS_PER_CHARSET + 1) { | ||
| if (errorFile != null) { | ||
| System.out.println(" ... further decoding errors for " + cd.charset().name() + " written to charset_errors.txt"); |
There was a problem hiding this comment.
We can look at other improvements to these tests in separate PRs.

Without this fix,
MBCS_Tests_charsets_0->CharsetsTestprints all encoding/decoding errors to stdout, which gets captured into TestTargetResult and TAP files. With 33M+ errors on AIX ppc64, this causesresultsSum.plto OOM when building the TAP file.charset_errors.txtfor full debug detailsEXTRA_OPTIONS=-Dcharsets.verbose=true`) for debug runs - prints all errors as default to console but should not be used in CI.related: https://github.ibm.com/runtimes/automation/issues/921