
[BUG] Leaked Socket prevents CRaC checkpointing #1325

@jnd77

Describe the bug

This is a follow-up to issue #1233, in particular its last comment.
I still occasionally run into an open Socket that blocks the checkpoint, but it happens randomly and I haven't managed to come up with a simple reproducer.

To Reproduce

I investigated further, and here are my findings:

  1. DatabricksClientConfiguratorManager#instances is always empty after your fix, which is good.
  2. DatabricksHttpClientFactory#instances is not always empty: sometimes a DatabricksHttpClient of type TELEMETRY is still in it.
    I suspect a race condition: the connection closes and calls DatabricksHttpClientFactory#removeClient(context), and afterwards another thread calls DatabricksHttpClientFactory#getClient(context, TELEMETRY) with that same connection's context, creating a new client that nothing closes afterwards.
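The suspected interleaving in point 2 can be replayed deterministically with a small stand-in for the factory. This is only a sketch under assumptions: FakeClient, ClientType, NaiveFactory and GuardedFactory are hypothetical names loosely modelled on DatabricksHttpClientFactory#getClient/removeClient, not the driver's actual code, and the two threads are simulated by calling the methods in the problematic order. The guarded variant remembers closed contexts; because both of its methods lock the factory, a getClient that arrives after removeClient is rejected instead of silently re-creating a client.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TelemetryRaceSketch {
  enum ClientType { COMMON, TELEMETRY }

  /** Hypothetical stand-in for an HTTP client that owns a socket. */
  static final class FakeClient implements AutoCloseable {
    volatile boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Naive cache keyed by (contextId, type): nothing stops a late getClient from re-creating an entry. */
  static class NaiveFactory {
    final Map<SimpleEntry<String, ClientType>, FakeClient> instances = new ConcurrentHashMap<>();

    FakeClient getClient(String contextId, ClientType type) {
      return instances.computeIfAbsent(new SimpleEntry<>(contextId, type), k -> new FakeClient());
    }

    void removeClient(String contextId) {
      // Close and drop every client cached for this context
      instances.entrySet().removeIf(e -> {
        if (e.getKey().getKey().equals(contextId)) {
          e.getValue().close();
          return true;
        }
        return false;
      });
    }
  }

  /** Same cache, but remembers closed contexts and refuses to repopulate them. */
  static final class GuardedFactory extends NaiveFactory {
    private final Set<String> closedContexts = ConcurrentHashMap.newKeySet();

    @Override
    synchronized FakeClient getClient(String contextId, ClientType type) {
      if (closedContexts.contains(contextId)) {
        throw new IllegalStateException("context already closed: " + contextId);
      }
      return super.getClient(contextId, type);
    }

    @Override
    synchronized void removeClient(String contextId) {
      // Both methods lock the factory, so this check-and-remove is atomic w.r.t. getClient
      closedContexts.add(contextId);
      super.removeClient(contextId);
    }
  }

  public static void main(String[] args) {
    // Replay: the connection closes, then a late telemetry call arrives with the same context
    NaiveFactory naive = new NaiveFactory();
    naive.getClient("conn-1", ClientType.COMMON);
    naive.removeClient("conn-1");
    naive.getClient("conn-1", ClientType.TELEMETRY);
    System.out.println("naive leaked clients: " + naive.instances.size());

    GuardedFactory guarded = new GuardedFactory();
    guarded.getClient("conn-1", ClientType.COMMON);
    guarded.removeClient("conn-1");
    try {
      guarded.getClient("conn-1", ClientType.TELEMETRY);
    } catch (IllegalStateException e) {
      System.out.println("guarded rejected late call");
    }
    System.out.println("guarded leaked clients: " + guarded.instances.size());
  }
}
```

Running main prints naive leaked clients: 1, then guarded rejected late call, then guarded leaked clients: 0. The closedContexts set grows without bound in this sketch; a real fix might instead hand late callers an uncached client, or stop telemetry from reusing a closed connection's context.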

As a workaround, I had to use this reflection hack to close the leftover clients:

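    // Workaround: close every DatabricksHttpClient still cached in the factory's
    // private "instances" map, so that no socket stays open at checkpoint time.
    // Needs java.lang.reflect.Field, java.util.Map, java.util.AbstractMap.SimpleEntry, java.io.IOException.
    try {
      final Field instancesField = DatabricksHttpClientFactory.class.getDeclaredField("instances");
      instancesField.setAccessible(true);
      @SuppressWarnings("unchecked")
      final Map<SimpleEntry<String, HttpClientType>, DatabricksHttpClient> openedClients =
          (Map<SimpleEntry<String, HttpClientType>, DatabricksHttpClient>)
              instancesField.get(DatabricksHttpClientFactory.getInstance());
      openedClients
          .values()
          .forEach(
              databricksHttpClient -> {
                try {
                  databricksHttpClient.close();
                } catch (final IOException e) {
                  // Do not throw: keep closing the remaining clients
                }
              });
    } catch (final Exception e) {
      // Reflection failed: do nothing and hope the checkpoint can still happen
    }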
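Rather than reaching into the private instances field, the leftover clients could be drained through an explicit shutdown hook. The sketch below assumes a hypothetical ClientRegistry with getOrCreate and closeAll; these are not existing driver methods. Unlike the hack above, closeAll removes each entry from the map as it closes it, so nothing stale is left behind, and it reports the first close failure instead of swallowing it. In a CRaC application, such a method could be called from beforeCheckpoint of an org.crac.Resource registered via Core.getGlobalContext().register(...).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ClientRegistrySketch {
  /** Hypothetical stand-in for a factory's cached clients, with a public shutdown hook. */
  static final class ClientRegistry<K, C extends Closeable> {
    private final Map<K, C> instances = new ConcurrentHashMap<>();

    C getOrCreate(K key, Function<K, C> factory) {
      return instances.computeIfAbsent(key, factory);
    }

    int size() {
      return instances.size();
    }

    /** Removes and closes every cached client, then rethrows the first failure (if any). */
    void closeAll() throws IOException {
      IOException first = null;
      for (K key : new ArrayList<>(instances.keySet())) {
        C client = instances.remove(key);
        if (client == null) {
          continue; // already removed by another thread
        }
        try {
          client.close();
        } catch (IOException e) {
          if (first == null) {
            first = e;
          } else {
            first.addSuppressed(e);
          }
        }
      }
      if (first != null) {
        throw first;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger closed = new AtomicInteger();
    ClientRegistry<String, Closeable> registry = new ClientRegistry<>();
    registry.getOrCreate("conn-1/TELEMETRY", k -> () -> closed.incrementAndGet());
    registry.getOrCreate("conn-2/TELEMETRY", k -> () -> {
      throw new IOException("close failed");
    });
    try {
      registry.closeAll();
    } catch (IOException e) {
      System.out.println("closeAll reported: " + e.getMessage());
    }
    System.out.println("closed=" + closed.get() + ", remaining=" + registry.size());
  }
}
```

Running main prints closeAll reported: close failed followed by closed=1, remaining=0: the failing client does not stop the others from being closed, and the map ends up empty either way.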

Expected behavior

I would expect all sockets to be closed once all connections have been closed.
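One way to get this behavior is to reference-count a shared client per open connection and close its socket when the count drops to zero. The sketch uses a hypothetical RefCountedClient whose socketOpen flag stands in for a real socket; it is not how the driver is structured. Reference counting alone would not cover the race described under To Reproduce: a late acquire() after the last release() reopens the client, so it would still need a guard such as refusing to serve closed contexts.

```java
public class RefCountedClientSketch {
  /** Hypothetical shared client whose socket is closed when the last connection releases it. */
  static final class RefCountedClient {
    private int openConnections;
    private boolean socketOpen;

    synchronized void acquire() {
      openConnections++;
      socketOpen = true; // (re)open the shared socket on demand
    }

    synchronized void release() {
      if (openConnections == 0) {
        throw new IllegalStateException("release without matching acquire");
      }
      if (--openConnections == 0) {
        socketOpen = false; // last connection gone: close the socket
      }
    }

    synchronized boolean isSocketOpen() {
      return socketOpen;
    }
  }

  public static void main(String[] args) {
    RefCountedClient client = new RefCountedClient();
    client.acquire(); // connection A opens
    client.acquire(); // connection B opens
    client.release(); // A closes
    System.out.println("after A: " + client.isSocketOpen()); // prints "after A: true"
    client.release(); // B closes
    System.out.println("after B: " + client.isSocketOpen()); // prints "after B: false"
  }
}
```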

Client Environment:

  • OS: Ubuntu
  • Java version: Java 21
  • Java vendor: Azul
  • Driver Version: 3.3.1

