Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17674 +/- ##
=============================================
- Coverage 63.25% 34.05% -29.21%
+ Complexity 1499 780 -719
=============================================
Files 3174 3181 +7
Lines 190373 190794 +421
Branches 29089 29160 +71
=============================================
- Hits 120419 64970 -55449
- Misses 60610 120428 +59818
+ Partials 9344 5396 -3948
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
suvodeep-pyne
left a comment
There was a problem hiding this comment.
Hey @gortiz
This will be super useful and alerting on this should give us a much better idea of the state vs relying on string contains checks.
Q: Just to understand: the memory stats are per static instance of netty, as in per pool? I'm guessing 1 for SSE and 1 for MSE? any others?
| <!-- Solve NoClassDefFoundError. Borrowed from https://github.com/prometheus/jmx_exporter/issues/802 --> | ||
| <exclude>META-INF/versions/9/org/yaml/snakeyaml/internal/**</exclude> | ||
|
|
||
| <!-- Exclude NettyInstacne because it includes Netty package literals --> |
There was a problem hiding this comment.
This looks a bit fragile. Should we load the strings from somewhere instead? I understand the motivation but thing is 1 rename or package move and it might break.
There was a problem hiding this comment.
This is as fragile as the other exclusions, right?
Anyway, I'm going to modify the PR because I'm wrong. I was sure we were shading Netty, but we don't do that, so we are double-safe
We use Netty to implement SSE communication between brokers and servers and gRPC uses their own shade version of Netty under the hood. We use gRPC in 4 different places:
SSE, Client gRPC, direct client to serve gRPC and MSE mailbox protocols; expose their own metrics to report memory usage (note we don't have metrics for the plan protocol, but that is super simple and doesn't consume much). But each of these gRPC servers is also constrained by the max memory Netty can use (where SSE uses the limits of the unshaded Netty instance and all gRPC use the limits defined on the gRPC shaded Netty instance). So the metric for used memory by SSE (assuming the fix in #17667 is merged) should be similar to the used memory shown by the new metric for unshaded Netty, but for gRPC it will be the sum of the used memory by the MSE mailbox, MSE plan, and client gRPC protocols. |
|
|
||
| @Override | ||
| public boolean isExplicitTryReflectionSetAccessible() { | ||
| return io.netty.util.internal.ReflectionUtil.trySetAccessible(CONSTRUCTOR, true) == null; |
There was a problem hiding this comment.
This is still tricky: instead of calling a getter (which is private in Netty), we call the method they use to check whether they can call the private constructors of ByteBuffers. As long as Netty keeps using the same code (which seems very likely), this should be a safe, reflection-free way to know whether Netty can use offheap buffers or not.
13242b8 to
b13d346
Compare
|
I've rewritten |
This PR adds a new machinery to inspect, log and monitor the memory used by Netty. As is explained on the NettyInstance javadoc, a Pinot installation has between 2 copies of the Netty code:
Netty uses static attributes to keep some important information, like whether it can use off-heap memory or not, or how much memory is being allocated. Given we have 2 copies of Netty, we have 2 independent copies of these attributes. The applied shade doesn't just change the package of the classes, it also changes the system properties we need to set in order to customize Netty. We always need to set at least one JAVA_OPT in order to let Netty use off-heap memory in Java 21:
io.netty.tryReflectionSetAccessibleAs a corollary, for each JAVA_OPT we plan to use to customize Netty we need to provide that option 2 times with different prefixes.
Given that different copies of the same classes are involved here, this is an error-prone process. This is why we introduce a new class, NettyInstance, that offers clean access to the most important Netty properties. Two classes are provided, one for each known Netty copy we include in the classpath.
Another class, NettyInspector, is used to add some checks on the state of important properties of each known NettyInstance. Right now, the only check we have is whether it uses onheap of offheap memory, but more may be added in the future.
This PR also adds 2 new metrics per NettyInstance: memory usage and maximum memory. This is important **because by default each independent Netty copy will consume as much direct memory as specified by
XX:MaxDirectMemorySize, which means we may be consuming a maximum of:MaxDirectMemorySizebytes by normal ByteBuffersMaxDirectMemorySizebytes by unshade Netty (used in SSE)MaxDirectMemorySizebytes by grpc-shaded Netty (used in MSE)Finally, this PR also adds logs when Brokers and Servers start. Specifically: