Skip to content

Improve startup cache loading performance#9307

Open
mbien wants to merge 1 commit intoapache:masterfrom
mbien:cache-loading-perf
Open

Improve startup cache loading performance#9307
mbien wants to merge 1 commit intoapache:masterfrom
mbien:cache-loading-perf

Conversation

@mbien
Copy link
Copy Markdown
Member

@mbien mbien commented Mar 29, 2026

  • use Data*Streams instead of Object*Streams
  • simplify file format (no XML)
  • code renovation and minor smaller optimizations

result are about 4x faster cache loading times (writing probably too but its not relevant for UX)

minor:

  • List -> Set for field solely used for contains() in inner loop

in numbers:

readCache(): 113ms -> 28ms
calculateParents(): 128ms -> 30ms

or see the readCache() and calculateParents() marked in purple

before:
startup_cacheio_0

after:
startup_cacheio_1

third PR for some more startup optimizations pending. (first was #9303)

@mbien mbien added this to the NB30 milestone Mar 29, 2026
@mbien mbien added Code cleanup Label for cleanup done on the Netbeans IDE performance Platform [ci] enable platform tests (platform/*) ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) labels Mar 29, 2026
@mbien mbien force-pushed the cache-loading-perf branch from 0047b0a to 34d1d39 Compare March 29, 2026 17:08
@mbien mbien force-pushed the cache-loading-perf branch from 34d1d39 to e28b5f1 Compare April 1, 2026 22:09
@mbien
Copy link
Copy Markdown
Member Author

mbien commented Apr 2, 2026

tested a few failure modes. e.g using a corrupted cache or one from an older NB version and it got reset on start.

Details

e.g cache containing invalid data would show:

INFO [org.netbeans.core.startup.ModuleList]: Cannot read cache
java.io.UTFDataFormatException: malformed input around byte 6
	at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:641)
	at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:550)
	at org.netbeans.core.startup.ModuleList.readProps(ModuleList.java:652)
[catch] at org.netbeans.core.startup.ModuleList.readCache(ModuleList.java:626)
	at org.netbeans.core.startup.ModuleList$ReadInitial.run(ModuleList.java:1583)
	at org.openide.filesystems.EventControl.runAtomicAction(EventControl.java:102)
	at org.openide.filesystems.FileSystem.runAtomicAction(FileSystem.java:494)
	at org.netbeans.core.startup.ModuleList.readInitial(ModuleList.java:147)
	at org.netbeans.core.startup.ModuleSystem.readList(ModuleSystem.java:280)
	at org.netbeans.core.startup.Main.getModuleSystem(Main.java:171)
	at org.netbeans.core.startup.Main.getModuleSystem(Main.java:141)
	at org.netbeans.core.startup.Main.start(Main.java:307)
	at org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:99)
	at java.base/java.lang.Thread.run(Thread.java:1583)

but there are various checks before that so it would fail silently in most cases without even getting to the point where it is loaded.

@mbien mbien marked this pull request as ready for review April 2, 2026 06:00
 - use Data*Streams instead of Object*Streams
 - simplify file format (no XML)
 - code renovation and other minor optimizations

results in 4x faster cache loading times (readCache() and
calculateParents() methods)

minor:
 - ModuleManager.EnableContext: List -> Set for field solely used for
   contains() in inner loop
 - Module: data loading can synchronize on instance instead of class
   (DATA_LOCK no longer static)
@mbien mbien force-pushed the cache-loading-perf branch from e28b5f1 to 54c0790 Compare April 2, 2026 08:06
@BradWalker
Copy link
Copy Markdown
Member

Nice work!

return deps;
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an exported class - before this change an instance of Dependency could only be created through Dependency#create now there is a option to inject a DataInput. Is there an option to keep this an implementation detail?

The created format does not look like something I'd like people to potentially rely on.

Copy link
Copy Markdown
Member Author

@mbien mbien Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the class itself can be only constructed from the inside. The create(String body) factory method is essentially a parsing constructor and can be fed Strings like org.netbeans.bootstrap/1 to create dependencies (which is done in some cases).

Even Dependency.create(Dependency.TYPE_MODULE, "hugo"); would already return a dependency.

The first impl for performance measurements had the constructor made public and the new read/write() methods were outside. However, the code was also a bit less maintainable since read/write() are used from two different places:

org.netbeans.ModuleData and org.netbeans.core.startup.ModuleList both use read()/write()

I couldn't come up with a better solution, since the goal is to avoid using ObjectOutputStream. Using read/writeObject actually performed worse and produced larger files, Externalizable would require a public no-arg constructor so I discarded that thought early + I believe it would also have required to keep ObjectOutputStream. (both cases would also have public methods)

Having them package private would require accessor utilities in two modules using the same namespace. Also not pretty. Maybe a comment that the methods are only meant for short term storage would be sufficient?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package private methods in Dependency + pasting this into the two modules is probably worse than a comment on the methods?

package org.openide.modules;
//...
public class DepAccessor {
    public static Dependency read(DataInput is) throws IOException {
        return Dependency.read(is);
    }
    public static void write(Dependency dep, DataOutput os) throws IOException {
        dep.write(os);
    }
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume using Dependency#create would be out of the question? If so, I tend to agree, that adding comments seems like a valid approach. It mimics the "Serialization" warning that ca be found in the swing classes, that explicitly declares, that serialized versions will not be compatible with future versions. I'd add that there is no guarantee for the used data format.

Comment on lines +690 to +691
.map(e -> e.getKey() + "=" + e.getValue())
.collect(Collectors.joining(","));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break one either a key or a value contains = or ,.

Copy link
Copy Markdown
Member Author

@mbien mbien Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but can this happen? The set of keys is restricted to (see readProps):

                    case "name", "jar" -> entry[1];
                    case "enabled", "autoload", "eager", "reloadable" -> Boolean.valueOf(entry[1]);
                    case "startlevel" -> Integer.valueOf(entry[1]);
                    default -> throw new IOException("unknown key " + entry[0]);

the first two are String values, the rest are primitives.

This does not cache all properties, only a very limited set. (this was already done before this change)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with these ad-hoc encoding is, that they have the tendency to creep up later and hunt you when you least expect it.

name currently is the codename base, at this point in time to my knowledge CNDs can't contain both, but that is not set into stone.

Not sure that a naming convention for jar files is enforced for modules, but at its core it is a filename and that can hold both , and =.

Why not reuse an approach you are already using: don't invent a new serialization, but write the number of key/value pairs to the stream and then the keys and values as pure strings. That way there is no need to consider "special" characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) Code cleanup Label for cleanup done on the Netbeans IDE performance Platform [ci] enable platform tests (platform/*)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants