[BUG] Always select node from a small subset of all nodes for any random session_id #574

@zghong

Describe the bug

No matter which session_id is used, chproxy only ever selects a node from a small subset of the configured nodes.

To Reproduce

Run chproxy on 127.0.0.1:8090 with the following cluster configuration (YAML):

clusters:
  - name: test_cluster
    replicas:
      - name: "replica1"
        nodes:
          - 127.0.1.1:8123
          - 127.0.1.2:8123
      - name: "replica2"
        nodes:
          - 127.0.2.1:8123
          - 127.0.2.2:8123

Then run the following script, which sends one query per session_id from 0 to 1000:

#!/bin/bash

for session_id in $(seq 0 1000); do
    echo "select hostname();" | curl "http://default:xxx@127.0.0.1:8090?session_id=$session_id" -d @-
done

The script above only ever reaches 127.0.1.1 and 127.0.2.2; the other two nodes (127.0.1.2 and 127.0.2.1) are never selected.
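One possible explanation, not confirmed anywhere in this report, is that the same hash value is reduced modulo the replica count and then again modulo the node count. With 2 replicas of 2 nodes each, that would make only the "diagonal" nodes reachable, which matches the observation. The sketch below is a hypothetical model in Python, not chproxy's code: `session_hash`, `pick_correlated`, and `pick_flat` are invented names, and CRC32 is an assumed stand-in for whatever hash chproxy uses.

```python
import zlib

# Topology copied from the config in this report: 2 replicas, 2 nodes each.
REPLICAS = [
    ["127.0.1.1:8123", "127.0.1.2:8123"],
    ["127.0.2.1:8123", "127.0.2.2:8123"],
]


def session_hash(session_id: str) -> int:
    """Hypothetical hash; chproxy's real hash function may differ."""
    return zlib.crc32(session_id.encode())


def pick_correlated(session_id: str) -> str:
    """Assumed buggy scheme: reuse one hash for both selection levels."""
    h = session_hash(session_id)
    replica = REPLICAS[h % len(REPLICAS)]
    # Same h, same modulus: the node index always equals the replica index.
    return replica[h % len(replica)]


def pick_flat(session_id: str) -> str:
    """Alternative: index into the flattened node list with a single modulo."""
    h = session_hash(session_id)
    nodes = [node for replica in REPLICAS for node in replica]
    return nodes[h % len(nodes)]


# Same session IDs as the reproduction script (0..1000).
correlated = {pick_correlated(str(i)) for i in range(1001)}
flat = {pick_flat(str(i)) for i in range(1001)}

print("correlated:", sorted(correlated))
print("flat:      ", sorted(flat))
```

If the real selection logic has this shape, deriving the node index from a value independent of the replica index (or indexing a flat node list) would let every node be reached.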

Expected behavior

Across random session_id values, chproxy should be able to select any of the configured nodes.

Screenshots

No.

Environment information

  • chproxy version: 1.30.0.

Additional context

  • The getReplicaSticky and getHostSticky functions have performance bottlenecks and logical inconsistencies.
  • By design, a sticky session should route every request with the same session_id to the same node, regardless of whether that node is active. The current implementation breaks this guarantee when the selected node's active status changes during the session. Fixing this is difficult, especially in chproxy topologies with 2 or more replicas, and may require distributed storage such as Redis. I have added a TODO in the code and will open a separate issue for it.
    • Example 1: A sticky session initially routes requests to node 127.0.1.1 based on its session_id. If 127.0.1.1 later becomes inactive, subsequent requests with the same session_id are rerouted to another active node (e.g., 127.0.2.2) instead of staying on 127.0.1.1.
    • Example 2: A sticky session should route to node 127.0.1.1, but 127.0.1.1 is inactive at first, so 127.0.2.2 is selected. When 127.0.1.1 later becomes active, subsequent requests with the same session_id switch to 127.0.1.1, breaking session stickiness.
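The two examples can be reproduced with a small model. This is a simplified sketch, assuming the sticky selection hashes the session_id to a preferred node and falls back to the next active node when the preferred one is down. `pick_sticky` and the CRC32 hash are hypothetical and are not taken from chproxy's getReplicaSticky or getHostSticky.

```python
import zlib

# Flattened node list from the config in this report.
NODES = [
    "127.0.1.1:8123",
    "127.0.1.2:8123",
    "127.0.2.1:8123",
    "127.0.2.2:8123",
]


def pick_sticky(session_id: str, active: set) -> str:
    """Hypothetical model: preferred node by hash, else the next active node."""
    start = zlib.crc32(session_id.encode()) % len(NODES)
    for offset in range(len(NODES)):
        node = NODES[(start + offset) % len(NODES)]
        if node in active:
            return node
    raise RuntimeError("no active nodes")


session = "42"
all_active = set(NODES)

# Request 1: every node is active, so the session lands on its preferred node.
preferred = pick_sticky(session, all_active)

# Request 2: the preferred node went inactive; the session moves (Example 1).
during_outage = pick_sticky(session, all_active - {preferred})

# Request 3: the node recovered; the session moves back (Example 2).
after_recovery = pick_sticky(session, all_active)

print(preferred, during_outage, after_recovery)
```

Because the choice depends only on the hash and the current active set, nothing remembers where a session was actually served. That is why keeping strict stickiness across status changes would need some shared state, as noted above.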
