What steps will reproduce the problem?
1. Run a gossip group with 3 members and let their heartbeats count up past
20 or so, with all members gossiping successfully
2. Shut one down
3. Wait for all members in live group to decide the dead member is dead
4. Restart dead member
5. Dead member will try to seed but it will be ignored because its
heartbeat is not high enough (Client.java: line 329 rev 9509ef5052).
6. Dead member will then not get any membership lists from the live group
and will conclude that the live members are dead too.
What is the expected output? What do you see instead?
A dead member, when restarted, should be able to rejoin the live gossip group.
Please provide any additional information below.
This can be fixed by adding a kind of zombie state between dead and alive:
if you notice that a zombie's heartbeat is increasing, the member must have
restarted, with its heartbeat reset to zero and counting up from there:
// Most recent heartbeat seen from each member while it is marked dead.
private Map<Member, Long> zombieHeartbeats = new Hashtable<Member, Long>();
...
...
} else if (deadMembers.contains(remoteMember)) {
    Member deadMember = deadMembers.get(deadMembers.indexOf(remoteMember));
    if (remoteMember.getHeartBeat() > deadMember.getHeartBeat()) {
        // Heartbeat is past the last known value: the member is alive again.
        deadMembers.remove(remoteMember);
        healthyMembers.add(deadMember);
        deadMember.setHeartBeat(remoteMember.getHeartBeat());
        deadMember.resetTimeoutTimer();
    } else if (zombieHeartbeats.containsKey(remoteMember) &&
               remoteMember.getHeartBeat() > zombieHeartbeats.get(remoteMember)) {
        // Heartbeat is low but increasing: the member restarted from zero.
        deadMembers.remove(remoteMember);
        healthyMembers.add(deadMember);
        deadMember.setHeartBeat(remoteMember.getHeartBeat());
        deadMember.resetTimeoutTimer();
        zombieHeartbeats.remove(remoteMember);
    } else {
        // First sighting since death (or no progress yet): remember the value.
        zombieHeartbeats.put(remoteMember, remoteMember.getHeartBeat());
    }
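To make the behaviour easier to try in isolation, here is a minimal standalone sketch of the same zombie-state idea. It is not the project's code: the class ZombieRejoinSketch, its methods, and the use of string ids in place of Member objects are all hypothetical, and it assumes the existing check simply compares the incoming heartbeat against the last one recorded for the dead member.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the membership bookkeeping; not the real Client.java.
class ZombieRejoinSketch {
    enum State { HEALTHY, DEAD }

    // Last accepted heartbeat per member id.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, State> state = new HashMap<>();
    // Most recent heartbeat seen from a member while it is marked dead.
    private final Map<String, Long> zombieHeartbeats = new HashMap<>();

    void addHealthy(String id, long heartbeat) {
        lastHeartbeat.put(id, heartbeat);
        state.put(id, State.HEALTHY);
    }

    void markDead(String id) {
        lastHeartbeat.putIfAbsent(id, 0L);
        state.put(id, State.DEAD);
    }

    // Processes one gossiped (id, heartbeat) pair and returns the member's state afterwards.
    State onGossip(String id, long remoteHeartbeat) {
        State current = state.get(id);
        if (current == null) {
            addHealthy(id, remoteHeartbeat);
            return State.HEALTHY;
        }
        if (current == State.HEALTHY) {
            if (remoteHeartbeat > lastHeartbeat.get(id)) {
                lastHeartbeat.put(id, remoteHeartbeat);
            }
            return State.HEALTHY;
        }
        // Member is marked dead: revive it if its heartbeat passed the last known
        // value, or if it is increasing relative to the previous zombie sighting.
        Long zombie = zombieHeartbeats.get(id);
        boolean aheadOfLastKnown = remoteHeartbeat > lastHeartbeat.get(id);
        boolean zombieAdvancing = zombie != null && remoteHeartbeat > zombie;
        if (aheadOfLastKnown || zombieAdvancing) {
            zombieHeartbeats.remove(id);
            addHealthy(id, remoteHeartbeat);
            return State.HEALTHY;
        }
        zombieHeartbeats.put(id, remoteHeartbeat);
        return State.DEAD;
    }

    public static void main(String[] args) {
        ZombieRejoinSketch live = new ZombieRejoinSketch();
        live.addHealthy("node-c", 25);   // heartbeat reached 25 before shutdown
        live.markDead("node-c");         // live group declares it dead
        System.out.println(live.onGossip("node-c", 1)); // prints DEAD (first sighting is only recorded)
        System.out.println(live.onGossip("node-c", 2)); // prints HEALTHY (zombie heartbeat increased)
    }
}
```

In this sketch the restarted member is accepted on its second gossip message rather than its first, since one sighting alone cannot show that the heartbeat is increasing.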
Hope that helps.
Original issue reported on code.google.com by simon.l...@gmail.com on 8 Apr 2010 at 3:54