New HDFSContextClassLoaderFactory #28
Conversation
Creates SimpleHDFSClassLoaderFactory, which will replace the use of the (buggy) VFSClassLoader with URLClassLoader while still supporting a ClassLoader per context:

* SimpleHDFSClassLoaderFactory will be the replacement for ReloadingVFSContextClassLoaderFactory and uses URLClassLoader instead of VFSClassLoader
* Created SimpleHDFSClassLoaderFactoryTest
* Bumped accumulo-classloaders from accumulo 2.1.3 to 2.1.4
* Minor javadoc fix for ReloadingVFSContextClassLoaderFactory

Initial commit for apache#27
ddanielr left a comment:
Need to test the new classloader functionality.
I suggest pulling some changes out into a separate PR that we can merge in and just have this PR contain the new feature logic.
Resolved review threads on:
- .../java/org/apache/accumulo/classloader/vfs/context/ReloadingVFSContextClassLoaderFactory.java
- ...a/org/apache/accumulo/classloader/vfs/context/ReloadingVFSContextClassLoaderFactoryTest.java
- ...s-loader/src/main/java/org/apache/accumulo/classloader/vfs/SimpleHDFSClassLoaderFactory.java (multiple outdated threads)
```java
@Override
public ClassLoader getClassLoader(String contextName) throws ContextClassLoaderException {
  var expectedPath = userDefDir.resolve(contextName).resolve(MANIFEST_FILE_NAME);
  var classLoader = classLoaders.getIfPresent(expectedPath);
  if (classLoader == null) {
    throw new ContextClassLoaderException(contextName);
  }
  return classLoader;
}
```
This method should have a few steps:
- Construct an hdfs path to the manifest file from the context name (the context name could be the full path to the manifest file, or it could be the directory that contains the manifest; alternatively, it could be just a relative directory name that is appended to some base URL that is set as an environment variable)
- Use the path to download the manifest file and read its contents
- Use the manifest file contents to download the files to a uniquely named local path (set by the user as an environment variable or system property)
- Construct a URLClassLoader using the local directory where the files were downloaded to, add it to the cache, and return it from this method
Step 3 is the complicated one, because it should be able to detect if another process is downloading the same context, and wait a reasonable, but brief, time for it to finish, before proceeding to do it itself. If it does proceed, it should do so in such a way that it doesn't matter which one finishes, in case the other one is still working.
If the context files are already downloaded, then it should just do a verify, and construct the URLClassLoader (if one wasn't already cached).
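The steps above can be sketched roughly as follows, using the local filesystem in place of HDFS and a plain-text manifest (one jar name per line) to stay self-contained. `ContextLoaders`, `BASE_DIR`, and `DOWNLOAD_DIR` are illustrative names, not the PR's API, and the multi-process coordination described for step 3 is deliberately omitted here:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ContextLoaders {
    // stand-ins for the HDFS base URL and the user-set local download location
    static final Path BASE_DIR = Paths.get("/tmp/remote-contexts");
    static final Path DOWNLOAD_DIR = Paths.get("/tmp/local-contexts");
    static final ConcurrentMap<String, URLClassLoader> CACHE = new ConcurrentHashMap<>();

    static URLClassLoader getClassLoader(String contextName) {
        return CACHE.computeIfAbsent(contextName, ctx -> {
            try {
                // Step 1: context name -> manifest path under the base dir
                Path manifest = BASE_DIR.resolve(ctx).resolve("manifest.json");
                // Step 2: read the manifest contents (here: one jar name per line)
                List<String> jarNames = Files.readAllLines(manifest);
                // Step 3: copy the listed jars to a local per-context directory
                Path local = Files.createDirectories(DOWNLOAD_DIR.resolve(ctx));
                URL[] urls = new URL[jarNames.size()];
                for (int i = 0; i < jarNames.size(); i++) {
                    Path dst = local.resolve(jarNames.get(i));
                    Files.copy(BASE_DIR.resolve(ctx).resolve(jarNames.get(i)), dst,
                        StandardCopyOption.REPLACE_EXISTING);
                    urls[i] = dst.toUri().toURL();
                }
                // Step 4: build a URLClassLoader over the local copies; the
                // computeIfAbsent caches it and returns it
                return new URLClassLoader(urls, ContextLoaders.class.getClassLoader());
            } catch (Exception e) {
                throw new IllegalStateException("failed to load context " + ctx, e);
            }
        });
    }
}
```

`computeIfAbsent` gives per-key atomicity within one JVM; the cross-process waiting and verify-if-already-downloaded behavior from steps 3 and the note above would layer on top of this.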
Thanks for the detailed explanation. I believe this is fully addressed in 7dce18c
Resolved review threads (outdated) on:
- .../java/org/apache/accumulo/classloader/vfs/context/ReloadingVFSContextClassLoaderFactory.java
- ...-loader/src/main/java/org/apache/accumulo/classloader/vfs/HDFSContextClassLoaderFactory.java
```java
// Cache the class loaders for re-use.
// WeakReferences are used so that the class loaders can be cleaned up when no longer needed.
// Classes that are loaded contain a reference to the class loader used to load them,
// so the class loader will be garbage collected once no loaded classes reference it.
private final Cache<Path,URLClassLoader> classLoaders =
    CacheBuilder.newBuilder().weakValues().build();
```
May want in separate PR: support cleanup of local context directories when no longer needed, or may want to leave it up to user to perform this cleanup
The cache here is still needed so that the class loader instance is reused for the same context. Cleaning up the files on disk when no processes are using the context is a separate thing, but reusing an existing class loader for the same context is core behavior and should be here.
👍 I just put this comment here since it's somewhat related to cleanup
Added support for cleanup of local context directories: bbc67b6
Resolved review threads (outdated) on:
- ...s-loader/src/main/java/org/apache/accumulo/classloader/vfs/SimpleHDFSClassLoaderFactory.java
- ...-loader/src/main/java/org/apache/accumulo/classloader/vfs/HDFSContextClassLoaderFactory.java (two threads)
```java
var localManifestFile = localContextDownloadDir.resolve(MANIFEST_FILE_NAME);
fs.copyToLocalFile(HDFSManifestFile, new Path(localManifestFile.toUri()));
LOG.info("Copied manifest file from HDFS {} to local {}", HDFSManifestFile, localManifestFile);
```
I don't know that you need to write the manifest file locally. You could just read the contents of the file from HDFS and pass it to Gson.
I was thinking... one way to detect that the manifest has changed or not is to use a directory uniquely named based on the context name and the hash of the manifest (which itself contains hashes to the files in the manifest). So, if the contents of the context changes, you can detect it and treat it like a new context. To do that, though, you will probably have to check the remote manifest each time the context is requested, which could be expensive. Could cache the context -> Path.for(context-manifestHash) mapping for some short period of time to avoid looking for context changes too frequently.
In this way, you always have to read the contents of the manifest in memory (to hash it and create a unique local directory for it), so the value of adding the manifest to that directory would only be for troubleshooting/debugging, really.
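The naming scheme suggested above can be sketched in a few lines. This is a hypothetical helper, not code from the PR: the local context directory name is derived from the context name plus a SHA-256 hash of the manifest bytes, so a changed manifest maps to a brand-new directory:

```java
import java.security.MessageDigest;
import java.util.HexFormat;

public class ContextDirs {
    // e.g. "contextA-2d7116...4881"; a new manifest yields a new directory name
    static String localDirName(String contextName, byte[] manifestBytes) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(manifestBytes);
        return contextName + "-" + HexFormat.of().formatHex(digest);
    }
}
```

Since the manifest bytes must be read to compute the hash anyway, writing the manifest into that directory would indeed only be useful for troubleshooting.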
In my PR I approached this issue by:
- Fetching the manifest file, computing a hash for it, and deserializing the JSON into a Java object hierarchy that is kept in memory.
- The manifest file contains the user-specified interval for monitoring the manifest file location for changes.
- On that interval, refetching the manifest file from its location and computing the hash to see if the file has changed, then responding accordingly.
If the manifest file is changed (say for contextA), do we expect that change to be picked up soon e.g., on some interval? Or do we expect it to be picked up the next time we call getClassLoader(contextA)?
based on an interval. Checking the manifest file every time getClassLoader(contextA) is called could be too many HDFS calls (if the manifest is stored in HDFS).
Could the interval present a problem? example:
interval = 10s
manifest file = contextA, jar1, jar2
time=0s: getClassLoader(contextA) = class loader with jar1, jar2
time=1s: change manifest file = contextA, jar3, jar4
time=5s: getClassLoader(contextA) = class loader with jar1, jar2 <-- we expected to get classloader with jar3, jar4
time=10s: now we update the classloader for contextA (too late?)
the intention is to be eventually consistent. As long as it updates correctly and around the interval, then I think that is sufficient. Across the cluster the update can't be performed at exactly the same time.
Got it thank you
I went with this approach initially, but changed to checking the manifest on calls to getClassLoader instead of on an interval. The interval approach does not work when the cache only stores WeakReferences to the class loader: the value is not immediately returned to the user, so no strong reference exists and the loader may be GC'd at any time. Using the approach suggested by @jschmidt10 in #28 of first checking the modification time, this should be fine and not too expensive to do in getClassLoader.
I fully explain the problem and my fix in the bbc67b6 commit message.
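The "metadata first" check can be sketched like this. The PR does this against HDFS (reading the file status before the file contents); `java.nio` and the name `ManifestCheck` stand in here so the sketch is self-contained:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class ManifestCheck {
    // A stat-only call: no file contents are read. Only when the modification
    // time has advanced do we pay to re-read and re-hash the manifest.
    static boolean maybeChanged(Path manifest, FileTime lastSeen) throws Exception {
        return Files.getLastModifiedTime(manifest).compareTo(lastSeen) > 0;
    }
}
```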
- New thread which checks for changes to the manifest files (which have already been cached) on a configurable interval, updating the cache as necessary.
- For any context, multiple threads may try to download the JARs, create the URLClassLoader, and cache the class loader at the same time, but only one will succeed (whichever completes first). If another thread has already started this operation, a thread first waits a bit for it to complete; if it doesn't complete quickly, the thread tries itself. This avoids too many threads trying at the same time, but still safely allows for it if needed.
- Now only download JARs from HDFS to the local filesystem (the manifest file is no longer downloaded; it is just read from HDFS).
- JARs are now checked after download to ensure the checksum matches what is present in the manifest file for that jar.
- New tests, including a concurrency stress test.
```java
// verify downloaded jar checksum matches what is in the manifest file
var computedChecksumLocalJar = checksumLocalFile(localTempJarPath);
if (!computedChecksumLocalJar.equals(jar.getChecksum())) {
  throw new ContextClassLoaderException(contextName,
      new IllegalStateException(String.format(
          "checksum: %s of downloaded jar: %s (downloaded from %s to %s) did not match "
              + "checksum present: %s in manifest file: %s. Consider retrying and/or "
              + "updating the checksum for this jar in the manifest file",
          computedChecksumLocalJar, jar.getJarName(), HDFSJarPath, localTempJarPath,
          jar.getChecksum(), HDFSManifestFile)));
}
```
maybe want to retry download a few times in case the file got corrupted in download? (is this possible?)
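The retry idea floated here could look roughly like the following. This is a hedged sketch with illustrative names (`ChecksummedDownload`, a local-filesystem copy standing in for the HDFS download); the PR as written throws on the first mismatch:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.util.HexFormat;

public class ChecksummedDownload {
    // Re-copy the jar up to maxAttempts times; accept it only when its
    // SHA-256 matches the checksum recorded in the manifest entry.
    static void downloadVerified(Path src, Path dst, String expectedSha256, int maxAttempts)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            String actual = HexFormat.of().formatHex(
                MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(dst)));
            if (actual.equals(expectedSha256)) {
                return; // checksum matches; download accepted
            }
        }
        throw new IllegalStateException(
            "checksum mismatch after " + maxAttempts + " attempts: " + src);
    }
}
```

Note the bound matters: if the manifest itself records a stale checksum, no number of retries will help, so the loop must give up and surface the error.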
Resolved review thread (outdated) on:
- ...loader/src/main/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactory.java
```java
public class HDFSContextClassLoaderFactory implements ContextClassLoaderFactory {
  private static final Logger LOG = LoggerFactory.getLogger(HDFSContextClassLoaderFactory.class);
  public static final String MANIFEST_FILE_NAME = "manifest.json";
  private static final String HASH_ALG = "SHA-256";
```
do we want the hash alg to be configurable?
```java
public static final String HDFS_CONTEXTS_BASE_DIR = "hdfs.contexts.class.loader.base.dir";
public static final String LOCAL_CONTEXTS_DOWNLOAD_DIR =
    "local.contexts.class.loader.download.dir";
public static final String MANIFEST_FILE_CHECK_INTERVAL = "manifest.file.check.interval.sec";
```
Which is preferred: setting these as system properties or as (new) Accumulo properties? For now, they are system properties.
On larger systems, we've seen these periodic NameNode calls, like getting file info, put stress on it. These knobs might take a bit of tuning to figure out the sweet spot so I'd vote an Accumulo property so we don't have to bounce the tserver to try new values.
I'm good with an Accumulo property. I can change this later if others agree as well; system properties are fine for now for simpler reviews/testing.
Resolved review threads on:
- ...loader/src/main/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactory.java (one outdated, one current)
- ...er/src/test/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactoryTest.java
- New dir `accumulo-classloaders/modules/classloaders`
- Module moved from `accumulo-classloaders/modules/vfs-class-loader` to `accumulo-classloaders/modules/classloaders/vfs-class-loader`
- New module `accumulo-classloaders/modules/classloaders/hdfs-class-loader`
- New dir `accumulo-classloaders/modules/classloaders/jar-creation` with children `shell` (stores the scripts to make jars used in tests by both the vfs-class-loader and hdfs-class-loader modules; scripts were originally under vfs-class-loader) and `templates` (stores the template class files used in creating the jars; originally under vfs-class-loader)
Resolved review threads (outdated, two threads) on:
- ...loader/src/main/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactory.java
```java
// if the final path already exists, we are replacing it with temp path, so recursively delete
// the final path
if (Files.exists(tempState.finalPath)) {
  FileUtils.deleteDirectory(tempState.finalPath.toFile());
```
This feels a bit dubious to me off the top of my head. I'm picturing a situation where we have operations in progress and we blow away the old jars for a brief period. I guess the takeaway is you should be using new contexts instead of patching the existing one to be safe?
Yeah, it seems inherently risky to be changing e.g., contextA from using JAR1 to using JAR2 while operations are in progress on JAR1. A warning would probably be good in the documentation somewhere, but this seems more of a user consideration.
Thought about this some more, and I don't think it's a concern. Once the classes are loaded, the JARs can be deleted without issue. Wrote a quick test to confirm, and this works without issue:
```java
@Test
public void testing() throws Exception {
  HDFSContextClassLoaderFactory factory = new HDFSContextClassLoaderFactory();
  factory.init(null);
  writeManifestAndJarToHDFS(factory, HELLO_WORLD_JAR, HELLO_WORLD_JAR_RESOURCE_NAME, true);
  var classLoader = factory.getClassLoader(CONTEXT1);
  var clazz = classLoader.loadClass(HELLO_WORLD_CLASS);
  FileUtils.deleteDirectory(
      new File(System.getProperty(HDFSContextClassLoaderFactory.LOCAL_CONTEXTS_DOWNLOAD_DIR)));
  var helloWorldObj = clazz.getDeclaredConstructor().newInstance();
  var validate = clazz.getMethod("validate");
  validate.invoke(helloWorldObj);
}
```
Resolved review threads (outdated, two threads) on:
- ...loader/src/main/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactory.java
```java
}

var HDFSManifestFileBytes = HDFSManifestFileIn.readAllBytes();
try (var tempState = createTempState(contextName, HDFSManifestFile,
```
Given that createTempState() is always followed by createClassLoader(), can we combine those two methods into one? Just pass the args of createTempState() into createClassLoader()? That'll eliminate a bit of duplication between this building of the classloader and the code above where the FileChecker does it.
I split these into two to allow multiple threads to attempt the creation at the same time for calls to getClassLoader() with the same context. Note that we only perform synchronization on the cache for createClassLoader(); createTempState(), which does most of the work, is called outside of this:

```java
try (var tempState = createTempState(...)) {
  // atomically return stored value if it exists OR rename the temp dir to the final dir and
  // create and store the new class loader
  return classLoaders.computeIfAbsent(hdfsManifestFile, key -> {
    try {
      ...createClassLoader(tempState);...
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }).getClassLoader();
}
```
Resolved review threads (outdated) on:
- ...loader/src/main/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactory.java
- ...er/src/test/java/org/apache/accumulo/classloader/hdfs/HDFSContextClassLoaderFactoryTest.java
- Avoid readAllBytes() for JAR files.
- First check the manifest file metadata for the last modification time before checking the checksum (avoids reading the entire file unnecessarily).
- Delay opening files until needed.
- Avoid logging retry completion if a retry was never attempted.
- Removed incorrect use of weakValues() for the Cache. It was originally Cache<Path,URLClassLoader>, which made weakValues() appropriate at the time (since we return the value directly to the user, creating a strong reference on their end), but the value has since changed to a custom object with the class loader among other fields, so weakValues() is no longer applicable (we still only return the class loader to the user, but don't create a strong reference to the whole custom object). Will address a fix to support similar cleanup in another commit.
- Fixed logs not working: added the appropriate pom dependency and a log4j2-test.properties file to the hdfs-class-loader module.
- Reduced visibility of many methods/classes.
- Optimization to clone MessageDigest.
- Improved some variable names.
- Fixed a bug where the Cache was using weakValues() incorrectly. The Cache value was originally URLClassLoader (returned directly to the user, creating a strong reference). The Cache value had since changed to a custom object (FinalState), but the code was still using weakValues(). The value returned to the user was still only the URLClassLoader part of FinalState, so a strong reference was created for the URLClassLoader but not for the actual value in the Cache, leading to Cache values being unexpectedly GC'd. Fixed this by:
  - replacing the Cache with a ConcurrentMap (since the Cache would have just been `CacheBuilder.newBuilder().build()`, functioning the same as a regular ConcurrentMap)
  - using a WeakReference for the class loader in FinalState
  - ensuring a strong reference to the URLClassLoader exists until it is returned to the user
  - adding a new cleanup thread to clean up map entries whose URLClassLoader has been GC'd
- Removed the manifest file checker thread (which would check whether the manifest file for entries in the store had changed since being stored, updating the store and local context directory if needed). This periodic checker will not work when using a WeakReference for the class loader in FinalState, as a strong reference is not immediately made. Instead of the checker thread, check on every call to getClassLoader(). This should be fine as we now first read the metadata to see if the manifest file has been touched since being stored; only then do we read the whole file.
- The new cleanup thread additionally deletes local context directories when their associated map entry is removed.
- Test changes and a new test, testStoreCleanup().
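The WeakReference-plus-cleanup-thread arrangement described in that commit message can be sketched as follows. `LoaderStore` is an illustrative stand-in, and `FinalState` here only mirrors the idea of the PR's class of the same name (its real fields differ):

```java
import java.lang.ref.WeakReference;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LoaderStore {
    // The store's value holds only a weak reference to the loader, so the
    // loader can be GC'd once no loaded classes (or callers) reference it.
    record FinalState(WeakReference<URLClassLoader> loaderRef, Path localDir) {}

    static final ConcurrentMap<String, FinalState> STORE = new ConcurrentHashMap<>();

    // called by the cleanup thread on its interval
    static void sweep() {
        // entries whose class loader was GC'd are dropped; the PR additionally
        // deletes the entry's local context directory at this point
        STORE.entrySet().removeIf(e -> e.getValue().loaderRef().get() == null);
    }
}
```

The key subtlety the commit fixes is visible here: weak-reachability must apply to the class loader itself, not to the wrapper object, since only the loader is ever handed back to the caller.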
The context name did not need to be defined by both the context path and a key in the manifest JSON file. E.g., previously you could have "hdfs://test:1234/contexts/contextA" while the manifest file also contained "context":"contextA". Only one of these is needed.
```java
var storeCleaner = new Thread(new StoreCleaner(TimeUnit.SECONDS.toMillis(30)));
storeCleaner.start();
```
Can make this configurable or change the time later; it's just set to 30 seconds for now.
Creates HDFSContextClassLoaderFactory which will replace the use of the (buggy) VFSClassLoader with URLClassLoader while still supporting a ClassLoader per context.