Skip to content

Conversation

@dlmarion
Copy link
Contributor

@dlmarion dlmarion commented Dec 3, 2025

Modified ContextDefinition to implement KeywordExecutable so that the user can use accumulo create-context-definition to create the json required for the ContextDefinition file.

Added a Cleaner to call URLClassLoader.close on instances that are about to be garbage collected. The LocalCachingContextCleaner is used to register the URLClassLoader with the Cleaner and this class also keeps references to the URLClassLoaders. The list is accessed from a MXBean so that a user can use JMX to retrieve a list of files that are referenced in the local cache directory from the classloaders that have been created but not yet gc'd. This should aid users in determining which files can be removed from the local cache directory.

Modified ContextDefinition to implement KeywordExecutable so that
the user can use `accumulo create-context-definition` to create the
json required for the ContextDefinition file.

Added a Cleaner to call `URLClassLoader.close` on instances that are
about to be garbage collected. The LocalCachingContextCleaner is used
to register the URLClassLoader with the Cleaner and this class also
keeps references to the URLClassLoaders. The list is accessed from
a MXBean so that a user can use JMX to retrieve a list of files that
are referenced in the local cache directory from the classloaders
that have been created but not yet gc'd. This should aid users in
determining which files can be removed from the local cache directory.
@Override
public void configureMiniCluster(MiniAccumuloConfigImpl cfg,
org.apache.hadoop.conf.Configuration coreSite) {
cfg.removeJvmOption("-XX:+PerfDisableSharedMem");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires apache/accumulo#5999 to be merged. This JVM option was not allowing the test to attach to the locally running Accumulo JVM processes. Additionally, I think this is what causes jps not to report the processes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apache/accumulo#6062 probably affects this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on that now.

pom.xml Outdated
<banDuplicatePomDependencyVersions />
<dependencyConvergence />
<banDynamicVersions />
<!-- <banDynamicVersions />-->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can revert this when 2.1.5 is released and the version changed above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What in 2.1.5 does this PR depend on that is not in 2.1.4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could create a reminder issue for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What in 2.1.5 does this PR depend on that is not in 2.1.4?

apache/accumulo#5999

Could create a reminder issue for this.

I will do that when this PR is merged.

@dlmarion
Copy link
Contributor Author

dlmarion commented Dec 3, 2025

This won't build successfully until apache/accumulo#5999 is merged. Tests pass locally for me.

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite understand why this needs to depend on 2.1.5-SNAPSHOT that isn't available in 2.1.4.

and unused old files within a context cache directory.
and unused old files within a context cache directory. To aid in this task a JMX MXBean has been created to expose the
files that are still referenced by the classloaders that are created. For an example of how to use this MXBean, please
see the test method `MiniAccumuloClusterClassLoaderFactoryTest.getReferencedFiles`. This method attaches to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a link to the test class.

Suggested change
see the test method `MiniAccumuloClusterClassLoaderFactoryTest.getReferencedFiles`. This method attaches to the
see the test method [MiniAccumuloClusterClassLoaderFactoryTest.getReferencedFiles](src/main/test/.../MiniAccumuloClusterClassLoaderFactoryTest.java). This method attaches to the

pom.xml Outdated
<banDuplicatePomDependencyVersions />
<dependencyConvergence />
<banDynamicVersions />
<!-- <banDynamicVersions />-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could create a reminder issue for this.

@dlmarion
Copy link
Contributor Author

PR updated and re-tested locally

safe to delete, it is not recommended to do so for any that are still in use,
because they can be useful for troubleshooting.

To aid in this task a JMX MXBean has been created to expose the files that are still referenced
Copy link
Contributor

@keith-turner keith-turner Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This offers the building blocks to create a cleanup utility and leave that as an exercise to the reader. It would probably be best (in a follow on change) to write a cleanup utility in this project that uses these building blocks, for the following reasons.

  • Can document and test the cleanup code in one place. If cleanup code is written, seems suboptimal for it to live in another repo or for there to be multiple implementions.
  • Having a working end to end impl gives confidence that these building blocks provide everything that is needed to do cleanup.
  • The cleanup code will need to have some awareness of the directory layout of the cache dir which is an internal implementation detail.
  • The clean up utility will have to get the algorithm correct in order to avoid race conditions. Seems like its important to list the dir first prior to get refs from all processes, but not completely sure. Would need to experiment and see the code to be sure.
  • Could make the cleanup code more robust by failing properly if no JMX seen or mismatch in dir passed in and dirs seen from jmx.

We could make the new command line utility have subcommands to support this. Something like the following.

accumulo context-classloader --help
  # create context definition json and print to stdout
  accumulo context-classloader  create -i <interval> -f {files}
  # remove files not in use by any of the given pids
  accumulo context-classloader  cleanup <cache dir> {pids}     

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to create a tool that works in one deployment scenario, but it might be hard to make it work in many of them. More information on how to make JMX client connections and server options here: https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html.

My test is using the connection mechanism described in the example at https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html#gdhkq. it works because I'm using mini in the tests and I know that all of the processes are on the localhost and because I remove that one system property (If that system property is set then this mechanism doesn't find the processes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to create a tool that works in one deployment scenario, but it might be hard to make it work in many of them

Do you know if different JMX connections could require different java code? Or can the java JMX client APIs connect to any endpoint given correct config? If the user may have to write different java code for different JMX scenarios then the best we could probably do in the project is provide a good example of the java cleanup code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if different JMX connections could require different java code?

Yes. In the test I use

List<VirtualMachineDescriptor> vmdl = VirtualMachine.list();
and
          VirtualMachine vm = VirtualMachine.attach(vmd);
          String connectorAddress = vm.getAgentProperties()
              .getProperty("com.sun.management.jmxremote.localConnectorAddress");

This works to find VMs that are running locally, attachs to them, and gets the connector address from them.

What I'm unsure of, is whether or not this approach would work if any of the options listed here are set on the processes. So, it's very possible that a client would need to be coded specific to how the server is configured, but I'm not 100% sure.

In the test I remove the JVM Option -XX:+PerfDisableSharedMem from the processes started by MiniAccumulo. If I don't remove this, then VirtualMachine.list() does not find the VMs (jps probably doesn't either). In that case, I would have to construct the address (service:jmx:rmi:///jndi/rmi://hostName:portNum/jmxrmi). If the server process is using security options (TLS, authentication, etc.), then more code is likely needed.

Or can the java JMX client APIs connect to any endpoint given correct config?

We would need to write, or find, a generic JMX client library that can be configured with all options. I didn't look for that specifically, but it might exist. If we can't find one, then we could write one, but I'm not sure we should do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't find one, then we could write one, but I'm not sure we should do that.

I agree, we should not write that. If something like that is needed, then it seems like an example is the best we can do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #45 created for this.

* POM improvements
* Use var and multi-catch in getReferencedFiles
* Add newline to toJson()
* Check for nulls before calling user consumer
* Rename method in DeduplicationCache (values to forEach)
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my suggested changes have been merged in. I am a little concerned if there are any implications to changing the DeduplicationCache instance to static, but I've thought about it and haven't been able to come up with any reason why it would be a problem.

@keith-turner
Copy link
Contributor

What is the benefit of making the dedupe cache static?

@dlmarion
Copy link
Contributor Author

What is the benefit of making the dedupe cache static?

It had to be static to be called by the static getReferencedFiles method, which is called by the JMX MXBean.

@dlmarion
Copy link
Contributor Author

Test passes for me locally. Re-tested the accumulo create-context-definition command and it's working as expected. I think this is ready for merge.

@dlmarion dlmarion merged commit b4c3f7d into apache:main Jan 16, 2026
1 of 2 checks passed
@dlmarion dlmarion deleted the local-classloader-tooling branch January 16, 2026 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants