Skip to content

Conversation

@dlmarion
Copy link
Contributor

In the MultiAccessEvaluatorImpl case it was calling AccessEvaluator.canAccess(byte[]) for each AccessEvaluator. canAccess(byte[]) would re-parse the AccessExpression each time. This change enables the MultiAccessEvaluatorImpl to get the ParsedAccessExpression and call the new method on each AccessEvaluator. The new method traverses the parse tree for evalation so that parsing is not performed multiple times. The downside is that there are two different methods in use, but the tests were already checking this and still pass.

Closes #78

In the MultiAccessEvaluatorImpl case it was calling
AccessEvaluator.canAccess(byte[]) for each AccessEvaluator.
canAccess(byte[]) would re-parse the AccessExpression each
time. This change enables the MultiAccessEvaluatorImpl
to get the ParsedAccessExpression and call the new
method on each AccessEvaluator. The new method traverses
the parse tree for evalation so that parsing is not
performed multiple times. The downside is that there are
two different methods in use, but the tests were already
checking this and still pass.

Closes apache#78
@dlmarion dlmarion added this to the 1.0.0 milestone Dec 19, 2025
@dlmarion dlmarion self-assigned this Dec 19, 2025

@Override
public boolean canAccess(AccessExpression accessExpression) {
return canAccess(accessExpression.parse());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you done any testing to confirm this is faster? The code that does not create the parse tree was a lot faster when benchmarking, but not sure by how much. Not sure what the numbers are but if it runs in 1/5 of the time then would need 5 auth sets before it makes sense to parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the suggested change above, this issue is OBE. AccessEvaluator will only evaluate using the parse tree if parse() has been called.

@dlmarion
Copy link
Contributor Author

Output of benchmark run from latest commit:

Benchmark                                                   Mode  Cnt   Score   Error   Units
AccessExpressionBenchmark.measureBytesValidation           thrpt   12  36.969 ± 0.088  ops/us
AccessExpressionBenchmark.measureStringValidation          thrpt   12  29.259 ± 0.012  ops/us
AccessExpressionBenchmark.measureBytesParseAndEvaluation   thrpt   12  16.858 ± 0.207  ops/us
AccessExpressionBenchmark.measureStringParseAndEvaluation  thrpt   12  14.497 ± 0.502  ops/us
AccessExpressionBenchmark.measureExpressionEvaluation      thrpt   12  14.055 ± 0.069  ops/us
AccessExpressionBenchmark.measureParsedEvaluation          thrpt   12  14.759 ± 0.037  ops/us
AccessExpressionBenchmark.measureLegacyParseAndEvaluation  thrpt   12   9.457 ± 0.124  ops/us
AccessExpressionBenchmark.measureLegacyEvaluationOnly      thrpt   12  22.226 ± 0.149  ops/us

@dlmarion
Copy link
Contributor Author

I think based on the benchmark output above, I can close this issue without merging the changes. It looks like reparsing the AccessExpression byte[] and evaluating it is faster than iterating over an existing parse tree. I think the corresponding issue can also be closed with a note that this is a reparsing in MultiAccessEvaluatorImpl, but it's faster than using the parse tree.

@keith-turner
Copy link
Contributor

Is there a benchmark that covers the case of evaluating w/ a parse tree but does not include the time to create the parse tree? In testing we did long ago using the Accumulo code it seems like it was really fast at evaluating if you happened to already have a parse tree and did not include the time to create the parse tree. Seems like those test created the parse trees outside the benchmarked code.

@dlmarion
Copy link
Contributor Author

dlmarion commented Jan 2, 2026

No, I don't think so. The benchmark currently contains methods that do the following:

  1. Validates an AccessExpression from a byte[]
  2. Validates an AccessExpression from a String
  3. Times AccessEvaluator.canAccess(byte[]) for different evaluators and expressions.
  4. Times VisibilityEvaluator.evaluate from a pre-constructed ColumnVisibility
  5. Times VisibilityEvaluator.evaluate from a newly constructed ColumnVisibility

Tests 1, 2, and 3 all end up calling ParserEvaluator.parseAccessExpression for each expression. If an expression is tested multiple times, then it will be re-parsed. I don't think the expressions are reused in the benchmark tests. Also, I think we recommend caching the result of the canAccess call for an expression outside of Accumulo-Access to avoid redundant work.

ParsedAccessExpression, and it's parse tree, are not used anywhere that I can tell.

@keith-turner
Copy link
Contributor

ParsedAccessExpression, and it's parse tree, are not used anywhere that I can tell.

Created #97 to time creating parse trees. It would be useful to look for changes in that timing, I would like to see the timing diff for that between main and #96.

Going to experiment w/ creating a slightly different benchmark for this PR.

@keith-turner
Copy link
Contributor

Actually the thing I was curious about is already done. The benchmarks measureParsedEvaluation and measureLegacyEvaluationOnly are conceptually doing the same thing w/ different code. Wonder why they have such a big diff in speed.


private boolean canAccessParsedAuthorization(String token) {
var bytesWrapper = ParserEvaluator.lookupWrappers.get();
var authToken = token.getBytes(UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is allocating objects and copying data, could be the cause of being slower than the accumulo code. Not really sure how to avoid this, and w/ the changes in #96 it may not be worth trying to avoid at this point.

@dlmarion
Copy link
Contributor Author

dlmarion commented Jan 7, 2026

Here's the latest benchmark results:

Benchmark                                                   Mode  Cnt   Score   Error   Units
AccessExpressionBenchmark.measureBytesParseAndEvaluation   thrpt   12  16.495 ± 0.412  ops/us
AccessExpressionBenchmark.measureBytesValidation           thrpt   12  36.508 ± 0.632  ops/us
AccessExpressionBenchmark.measureCreateParseTree           thrpt   12  14.204 ± 0.138  ops/us
AccessExpressionBenchmark.measureExpressionEvaluation      thrpt   12  16.820 ± 0.204  ops/us
AccessExpressionBenchmark.measureLegacyEvaluationOnly      thrpt   12  22.360 ± 0.077  ops/us
AccessExpressionBenchmark.measureLegacyParseAndEvaluation  thrpt   12   9.122 ± 0.056  ops/us
AccessExpressionBenchmark.measureParsedEvaluation          thrpt   12  17.762 ± 0.090  ops/us
AccessExpressionBenchmark.measureStringParseAndEvaluation  thrpt   12  14.527 ± 0.251  ops/us
AccessExpressionBenchmark.measureStringValidation          thrpt   12  29.087 ± 0.211  ops/us

@dlmarion
Copy link
Contributor Author

dlmarion commented Jan 8, 2026

Closing, will re-evaluate to see if #78 is still an issue after #96 is merged.

@dlmarion dlmarion closed this Jan 8, 2026
@dlmarion dlmarion deleted the 78-multi-access-evaluator-optimization branch January 8, 2026 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Possible re-validation of an AccessExpression

2 participants