
NUTCH-1732: allow deleting non-parsable documents#891

Open
igiguere wants to merge 8 commits into apache:master from igiguere:NUTCH-1732-delete-not-parsable-documents

Conversation

@igiguere

@igiguere igiguere commented Feb 5, 2026

NUTCH-1732

Add CrawlDatum status "parse_failed"/"db_parse_failed", set status in CrawlDbReducer.

Handle deletion in IndexerMapReduce.

Add unit tests.

Adjust documentation in nutch-default.xml: parser.delete.failed is configured separately from delete.gone, so the user can decide whether or not to delete parse failures.

ant clean runtime test passed.

I'm not sure how to run a functional test... how do I make parsing fail the 2nd time a page is fetched?

Add CrawlDatum status "parse_failed"/"db_parse_failed", set status in
CrawlDbReducer.

Handle deletion in IndexerMapReduce.

Add unit tests.
@igiguere igiguere marked this pull request as ready for review February 11, 2026 21:41
Member

@lewismc lewismc left a comment


Hi @igiguere I had a think about this over the weekend and will share my thoughts on two issues

Functional test

The new private CrawlTestParserFailure inner class always fails parsing. To test the realistic scenario of "succeed first, fail second", you could introduce a flip-flopping variant, similar to how CrawlTestSignatureReset alternates between STATUS_FETCH_SUCCESS and STATUS_FETCH_GONE. Something like:

private class CrawlTestParserFailureAlternating extends ContinuousCrawlTestUtil {
  int counter = 0;

  CrawlTestParserFailureAlternating(Context context) {
    super(context);
  }

  @Override
  protected CrawlDatum fetch(CrawlDatum datum, long currentTime) {
    counter++;
    datum.setStatus(STATUS_FETCH_SUCCESS); // always fetched OK
    datum.setFetchTime(currentTime);
    return datum;
  }

  @Override
  protected List<CrawlDatum> parse(CrawlDatum fetchDatum) {
    List<CrawlDatum> parseDatums = new ArrayList<>(0);
    if (counter % 2 == 0) {
      // Even fetches: parsing fails
      parseDatums.add(new CrawlDatum(STATUS_PARSE_FAILED, 0));
    } else {
      // Odd fetches: parsing succeeds (emit signature)
      CrawlDatum sig = new CrawlDatum(STATUS_SIGNATURE, 0);
      sig.setSignature(getSignature());
      parseDatums.add(sig);
    }
    return parseDatums;
  }

  @Override
  protected boolean check(CrawlDatum result) {
    if (counter % 2 == 0) {
      return result.getStatus() == STATUS_DB_PARSE_FAILED;
    } else {
      return result.getStatus() == STATUS_DB_FETCHED
          || result.getStatus() == STATUS_DB_NOTMODIFIED;
    }
  }
}

This seems fairly practical and aligns with existing test patterns.

There is another solution! We already have CrawlDBTestUtil.getServer() for spinning up an embedded Jetty server. You could replace the static ResourceHandler with a custom handler that tracks the request count per URL and serves different content on subsequent requests (same principle as above, and maybe more representative of real-life fetching). This is also a good test... your code could be something like:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.handler.AbstractHandler;

public class FlipFlopHandler extends AbstractHandler {
  private final AtomicInteger requestCount = new AtomicInteger(0);

  @Override
  public void handle(String target, Request baseRequest,
      HttpServletRequest request, HttpServletResponse response)
      throws IOException, ServletException {
    int count = requestCount.incrementAndGet();
    response.setStatus(HttpServletResponse.SC_OK);
    if (count % 2 == 1) {
      // 1st fetch: valid HTML
      response.setContentType("text/html");
      response.getWriter().write("<html><head><title>Test</title></head>"
          + "<body>Hello World</body></html>");
    } else {
      // 2nd fetch: serve binary garbage with text/html MIME type
      // This will cause the HTML parser to fail
      response.setContentType("text/html");
      byte[] garbage = new byte[1024];
      new java.util.Random(42).nextBytes(garbage);
      response.getOutputStream().write(garbage);
    }
    baseRequest.setHandled(true);
  }
}

It would be good to implement some kind of test though. I agree.

Detailed Trace of testDeleteParseFailure

Design Intent

Your intent is that parser.delete.failed is configured separately from indexer.delete (which controls deletion of gone/redirect/duplicate documents), so the user can independently decide whether to delete parse failures. Does this mean the parse-failure deletion path in IndexerMapReduce should be gated on parser.delete.failed, not on indexer.delete?

If that is true, I think there are three issues in the current implementation and test.

Issue 1: The reducer guards parse-failure deletion on indexer.delete, not parser.delete.failed

The test correctly sets:

configuration.setBoolean(ParseSegment.DELETE_FAILED_PARSE, true);

However, the reducer's setup() method never reads parser.delete.failed. Instead, the parse-failure block at line 361 of IndexerMapReduce.java is guarded by the delete variable, which is wired to indexer.delete:

// in setup():
delete = conf.getBoolean(INDEXER_DELETE, false);  // "indexer.delete", default false
// in reduce():
// Whether to delete pages where parsing failed
if (delete && fetchDatum != null) {
  if (fetchDatum.getStatus() == CrawlDatum.STATUS_PARSE_FAILED
      || dbDatum != null && dbDatum.getStatus() == CrawlDatum.STATUS_DB_PARSE_FAILED) {
    deletedFailedParseCounter.increment(1);
    context.write(key, DELETE_ACTION);
    return;
  }
}

Since indexer.delete defaults to false and the test doesn't set it, the condition if (delete && ...) is always false. The parse-failure deletion code is never executed.

To match the design intent, setup() should read parser.delete.failed into its own boolean, and the guard should use that instead of delete:

// in setup():
deleteFailedParse = conf.getBoolean(ParseSegment.DELETE_FAILED_PARSE, false);

// in reduce():
if (deleteFailedParse && fetchDatum != null) {
  if (fetchDatum.getStatus() == CrawlDatum.STATUS_PARSE_FAILED
      || dbDatum != null && dbDatum.getStatus() == CrawlDatum.STATUS_DB_PARSE_FAILED) {
    deletedFailedParseCounter.increment(1);
    context.write(key, DELETE_ACTION);
    return;
  }
}

This way a user can set parser.delete.failed=true without needing indexer.delete=true, and conversely can have indexer.delete=true (for gone/redirects/duplicates) without automatically also deleting parse failures.

Issue 2: STATUS_PARSE_FAILED (0x45) would throw a RuntimeException in the classification loop

In the reducer's value classification loop:

} else if (value instanceof CrawlDatum) {
  final CrawlDatum datum = (CrawlDatum) value;
  if (CrawlDatum.hasDbStatus(datum)) {
    dbDatum = datum;
  } else if (CrawlDatum.hasFetchStatus(datum)) {
    if (datum.getStatus() != CrawlDatum.STATUS_FETCH_NOTMODIFIED) {
      fetchDatum = datum;
    }
  } else if (CrawlDatum.STATUS_LINKED == datum.getStatus()
      || CrawlDatum.STATUS_SIGNATURE == datum.getStatus()
      || CrawlDatum.STATUS_PARSE_META == datum.getStatus()) {
    continue;
  } else {
    throw new RuntimeException("Unexpected status: " + datum.getStatus());
  }

STATUS_PARSE_FAILED = 0x45. The range checks:

  • hasDbStatus(): status <= 0x1f → false (0x45 > 0x1f)
  • hasFetchStatus(): status > 0x1f && status <= 0x3f → false (0x45 > 0x3f)
  • Explicit matches: STATUS_LINKED (0x43), STATUS_SIGNATURE (0x41),
    STATUS_PARSE_META (0x44) → none match 0x45
  • Falls through to else → RuntimeException("Unexpected status: 69")

So if a CrawlDatum with STATUS_PARSE_FAILED were passed into the indexer reducer's values, it would crash. The test avoids this by passing crawlDatumFetchSuccess (STATUS_FETCH_SUCCESS = 0x21) and crawlDatumDbFetched (STATUS_DB_FETCHED = 0x02), but this means the fetchDatum.getStatus() == CrawlDatum.STATUS_PARSE_FAILED branch of the condition at line 362 is untestable without first adding STATUS_PARSE_FAILED to the classification logic (either as a recognized skip status or as a fetch-like status).
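For quick reference, the range logic can be reduced to a standalone sketch. The hex constants mirror the CrawlDatum values quoted in this thread; the class name and the String return values are mine, purely for illustration:

```java
// Standalone sketch of the reducer's classification ranges.
// Constants mirror the CrawlDatum values discussed above.
public class StatusClassification {
  static final byte STATUS_SIGNATURE = 0x41;
  static final byte STATUS_LINKED = 0x43;
  static final byte STATUS_PARSE_META = 0x44;
  static final byte STATUS_PARSE_FAILED = 0x45;

  // db statuses occupy 0x00..0x1f, fetch statuses 0x20..0x3f
  static boolean hasDbStatus(byte status)    { return status <= 0x1f; }
  static boolean hasFetchStatus(byte status) { return status > 0x1f && status <= 0x3f; }

  static String classify(byte status) {
    if (hasDbStatus(status)) return "db";
    if (hasFetchStatus(status)) return "fetch";
    if (status == STATUS_LINKED || status == STATUS_SIGNATURE
        || status == STATUS_PARSE_META) return "skip";
    return "unexpected"; // the real reducer throws RuntimeException here
  }

  public static void main(String[] args) {
    System.out.println(classify((byte) 0x02));         // db (STATUS_DB_FETCHED)
    System.out.println(classify((byte) 0x21));         // fetch (STATUS_FETCH_SUCCESS)
    System.out.println(classify(STATUS_PARSE_FAILED)); // unexpected
  }
}
```

Running this makes the gap obvious: 0x45 falls into neither range and matches no explicit skip status.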

Issue 3: The test passes for the wrong reason

When failParse is true (i=0, 2, 4), the test asserts assertNull(doc). Here is what actually happens:

  1. The reducer classifies values normally: crawlDatumDbFetched -> dbDatum,
    crawlDatumFetchSuccess -> fetchDatum, failedParseData -> parseData,
    parseText -> parseText, content -> content.

  2. The parse-failure deletion block is skipped (Issue 1).

  3. Execution reaches line 388:

    if (fetchDatum == null || parseText == null || parseData == null) {
      return; // only have inlinks
    }

    All three are non-null, so execution continues.

  4. Execution reaches line 405:

    if (!parseData.getStatus().isSuccess()
        || fetchDatum.getStatus() != CrawlDatum.STATUS_FETCH_SUCCESS) {
      return;
    }

    When failedParseData is used (which has ParseStatus.STATUS_FAILURE),
    parseData.getStatus().isSuccess() returns false, so !isSuccess() is true, and the
    reducer returns early without writing anything to the context.

  5. Since nothing is written to reduceResult, runIndexer returns null, and
    assertNull(doc) passes.

Summary

The test passes, but for the wrong reason.

What the test intends to verify vs. what actually happens:

  • Intended: parser.delete.failed=true triggers deletion of failed-parse docs.
    Actual: parser.delete.failed is never read by the reducer; the deletion block is gated on indexer.delete, which defaults to false.
  • Intended: the new STATUS_PARSE_FAILED / STATUS_DB_PARSE_FAILED check at lines 360–368 fires.
    Actual: that block is never entered because delete == false.
  • Intended: assertNull(doc) confirms a DELETE action was emitted.
    Actual: doc is null because the pre-existing guard at line 405 silently returns when ParseStatus is not isSuccess(); no DELETE is emitted, the document is simply skipped.

The document is not indexed on failParse iterations, but it is silently skipped, not explicitly deleted. These are very different outcomes: a silent skip means the document stays in the index from a previous successful crawl (which is the exact problem NUTCH-1732 is trying to solve).

What Needs to Be Fixed

  1. The reducer must read parser.delete.failed in setup() and use it as an independent
    guard for the parse-failure deletion block, instead of reusing the delete variable
    (which is tied to indexer.delete). This is the core fix needed to match the design intent:

    // in setup():
    deleteFailedParse = conf.getBoolean(ParseSegment.DELETE_FAILED_PARSE, false);
    
    // in reduce():
    if (deleteFailedParse && fetchDatum != null) {
      if (fetchDatum.getStatus() == CrawlDatum.STATUS_PARSE_FAILED
          || dbDatum != null && dbDatum.getStatus() == CrawlDatum.STATUS_DB_PARSE_FAILED) {
        deletedFailedParseCounter.increment(1);
        context.write(key, DELETE_ACTION);
        return;
      }
    }
  2. STATUS_PARSE_FAILED (0x45) needs to be recognized in the CrawlDatum classification
    loop (lines 318–333). Currently it would throw
    RuntimeException("Unexpected status: 69"). It should either be added as a skip
    (continue) alongside STATUS_LINKED / STATUS_SIGNATURE / STATUS_PARSE_META, or
    handled in a way that allows the delete check to use it.

  3. The test should pass a CrawlDatum with STATUS_PARSE_FAILED or
    STATUS_DB_PARSE_FAILED instead of (or in addition to) relying on ParseData with
    STATUS_FAILURE. The NUTCH-1732 logic at lines 360–368 checks the CrawlDatum status,
    not the ParseData status.

  4. The test should verify that a DELETE action was actually emitted, not just that doc is
    null. Currently the runIndexer helper filters out DELETE actions (line 217:
    if (e.getValue().action != NutchIndexAction.DELETE)) and returns null for them, which is
    indistinguishable from a silent skip. A more robust assertion would check that
    reduceResult contains a NutchIndexAction.DELETE entry.
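To make the ambiguity of assertNull(doc) concrete, here is a self-contained illustration. IndexAction is a stand-in for NutchIndexAction, and docFromResult mimics the filtering behaviour of the current runIndexer helper; all names are mine, not Nutch's:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration: why assertNull(doc) cannot distinguish a silent skip
// from an explicit DELETE. Names are stand-ins, not Nutch APIs.
public class DeleteAssertionSketch {
  enum IndexAction { ADD, DELETE }

  // Mimics runIndexer's current behaviour: DELETE entries are filtered
  // out, so the caller sees null either way.
  static Object docFromResult(Map<String, IndexAction> reduceResult, String key) {
    IndexAction action = reduceResult.get(key);
    if (action == null || action == IndexAction.DELETE) {
      return null; // indistinguishable: skipped vs. explicitly deleted
    }
    return new Object(); // stands in for the indexed NutchDocument
  }

  // The stronger assertion: inspect the raw reduce output for a DELETE entry.
  static boolean deleteWasEmitted(Map<String, IndexAction> reduceResult, String key) {
    return reduceResult.get(key) == IndexAction.DELETE;
  }

  public static void main(String[] args) {
    Map<String, IndexAction> silentSkip = new HashMap<>();     // nothing written
    Map<String, IndexAction> explicitDelete = new HashMap<>();
    explicitDelete.put("http://example.com/", IndexAction.DELETE);

    // assertNull(doc) passes for BOTH maps...
    System.out.println(docFromResult(silentSkip, "http://example.com/"));      // null
    System.out.println(docFromResult(explicitDelete, "http://example.com/"));  // null
    // ...but only the explicit delete satisfies the stronger check.
    System.out.println(deleteWasEmitted(silentSkip, "http://example.com/"));     // false
    System.out.println(deleteWasEmitted(explicitDelete, "http://example.com/")); // true
  }
}
```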

Additional comments

  1. I think the new configuration property should be renamed from parser.delete.failed to parser.delete.failed.parse, but this is minor in comparison to the above.
  2. A total side observation (unrelated to your PR): I noticed that the indexer.delete configuration property is absent from nutch-default.xml... is this intentional?
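If indexer.delete were to be documented, a minimal nutch-default.xml entry could look like the sketch below. The default of false matches the setup() call quoted earlier; the description wording is mine, not official:

```xml
<!-- Sketch only; description wording is not authoritative. -->
<property>
  <name>indexer.delete</name>
  <value>false</value>
  <description>Whether the indexing job deletes documents that are
  gone, redirected, or marked as duplicates, instead of skipping
  them.</description>
</property>
```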

@igiguere
Author

igiguere commented Feb 18, 2026

Hi @igiguere I had a think about this over the weekend and will share my thoughts on two issues
(...)

What Needs to Be Fixed

1. The reducer must read `parser.delete.failed` in `setup()` and use it as an independent
   guard for the parse-failure deletion block, instead of reusing the `delete` variable
   (which is tied to `indexer.delete`). 

(...)

2. `STATUS_PARSE_FAILED` (0x45) needs to be recognized in the `CrawlDatum` classification
   loop (lines 318–333).

(...)

3. The test should pass a `CrawlDatum` with `STATUS_PARSE_FAILED` or
   `STATUS_DB_PARSE_FAILED` instead of (or in addition to) relying on `ParseData` with
   `STATUS_FAILURE`. 

(...)

4. The test should verify that a DELETE action was actually emitted, not just that `doc` is
   null. 

(...)

Additional comments

1. I think the new configuration property should be changed from `parser.delete.failed` to `parser.delete.failed.parse` but this is minor in comparison to the above.

2. Total side observation (unrelated to your PR) I noticed that the `indexer.delete` configuration property is absent from `nutch-default.xml`... is this intentional?

Thanks for the review, @lewismc
And sorry.
Really. I feel like an idiot. This PR should not have been published in its current state.
I have to think back 10 years to remember any review on this scale. I don't know what happened. I should have seen at least 90% of all that myself.
I'm fixing this. I hope my brain activates this time.
Thanks for the tip about CrawlDBTestUtil.getServer(). I'll use that.

Note: the indexer.delete configuration property pre-dates this PR. So that, at least, is not on me. I'll add it to nutch-default

@lewismc
Member

lewismc commented Feb 18, 2026

This was a really good refresher for me as well. It touches parts of the codebase I hadn't looked at for a while... but I must admit, it took me a while. I was scratching my head in the debugger for ages.
@igiguere also, apologies my comment above was a bit short. No need to apologize at all. Solid peer review is the beauty of open source and definitely the part that's had the most impact on me personally. Please ping me if and when you want more peer review. Thank you.

Isabelle Giguere and others added 2 commits February 24, 2026 16:35
Fix IndexerMapReduce.

Improve test for failed parsing in TestCrawlDbStates and
TestIndexerMapReduce.

TestFetchWithParseFailure "should" test the fetcher, but it strangely
does not fully run most of the time.  Disabled while thinking of a
solution.
@igiguere
Author

igiguere commented Feb 24, 2026

This was a really good refresher for me as well. It touches parts of the codebase I hadn't looked at for a while... but I must admit, it took me a while. I was scratching my head in the debugger for ages. @igiguere also, apologies my comment above was a bit short. No need to apologize at all. Solid peer review is the beauty of open source and definitely the part thats had the most impact on me personally. Please ping me if and when you want more peer review. Thank you.

@lewismc

I fixed IndexerMapReduce, so it is now properly configured.

I improved test for failed parsing in TestCrawlDbStates and TestIndexerMapReduce.

The new test TestFetchWithParseFailure "should" test the fetcher, but it strangely does not fully run most of the time. It's disabled while thinking of a solution. Feel free to pitch in with ideas.

UPDATE: Silly me. The full log is logs/hadoop.log, not the output file under build/test/... That file looks like a half-run test. The content of logs/hadoop.log shows the full test run.

UPDATE 2: The default HTML parser (Neko) is a lot more tolerant than expected. Even random bytes (garbled enough that the log file won't open if they are written to it) do not produce a parsing error.

UPDATE 3 : Also tried the "Tagsoup" HTML parser, but that one does not produce a failed parse either. I also tried to see if I could extend HtmlParser, but plugin classes are not visible on the unit test classpath, so it does not even compile. Otherwise, there's no hook to easily insert a fake or mock plugin to override the behavior of HtmlParser.
I'm thinking the fetcher part is just not testable for this feature.

UPDATE 4: I found a way to test! Just set mime.type.magic=false, and let the ParserFactory throw an exception. At least it tests the fetcher logic with property parser.delete.failed.parse=true. However, the case where the HTML parser returns a non-null ParseResult with failed status is not tested. Ideally, I would have liked to inject a mock, if not a mock parser, as in my previous update, at least a mock ParseUtil. But that is constructed in FetcherThread.
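The mime.type.magic workaround described above could be expressed as test configuration entries along these lines. This is only an illustration (the description wording is mine); parser.delete.failed.parse is the renamed property discussed earlier in the review:

```xml
<!-- Sketch of the test configuration described above (not authoritative). -->
<property>
  <name>mime.type.magic</name>
  <value>false</value>
  <description>Disable content sniffing, so the ParserFactory must trust
  the served Content-Type and throws on unparsable input.</description>
</property>
<property>
  <name>parser.delete.failed.parse</name>
  <value>true</value>
  <description>Delete documents whose parse failed (the feature under
  test).</description>
</property>
```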

Isabelle Giguere and others added 2 commits February 24, 2026 17:27
Change the type of ExecutorService, to no avail.

Fix test conf to have just 1 fetcher thread.
@lewismc
Member

lewismc commented Mar 21, 2026

Hi @igiguere I'll look into this shortly. Thank you.
