Conversation
Restored highly available service in order to support multiple managers. This change should also avoid clients that connect to a non-primary manager from getting stuck. Made all thrift methods throw ThriftNotActiveServiceException, if the method does not declare this it will cause a server side exception. There was a single one way thrift method that could not throw this, so it would still cause server side exceptions if its called. Delaying advertising should prevent this in most cases.
|
Now that the compaction coordinator can throw ThriftNotActiveServiceException the compactors should retry if they see this exception. Looked at the compactors and it seems like they will retry as all of their calls use RetryableThriftCall. |
server/base/src/main/java/org/apache/accumulo/server/rpc/HighlyAvailableServiceWrapper.java
Show resolved
Hide resolved
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java
Outdated
Show resolved
Hide resolved
|
I wonder if there is a way to automate a check that all Thrift RPC methods handled by the Manager throws the ThriftNotActiveServiceException |
Not sure this functionality has any tests. There is the server side aspect of throwing the exception, if we could automate test for that it would be nice. Then there is the client side code that retries when it sees this exception, that code is sprinkled all over the place. The server side code seems easier to test. For the client side code, maybe the best way to test is to run random walk tests while starting/stopping managers. |
|
For the client side code, maybe if we refactored to use more common code on the client side then we could focus on testing that common code. I noticed the compactor code did not need to change because it uses common code for retry. That common code retries on TException though which may be too broad for the general case, like we probably should not retry on a thrift security exception. |
|
I looked at trying to write a test for this, but could not figure it out quickly. Another option would be to fail at runtime on the server side, which should be caught during testing. To do this, we could include the following method in the ThriftProcessorTypes class, then call it for each interface passed to the getManagerTProcessor method. |
|
Re using reflection to look for absence of the exception, thrift one way methods can not throw an exception. Not sure if we can detect if a method is one way. If we could that would be nice in the HAService code, it could just ignore oneway thrift calls when not the primary instead of throwing an exception. |
|
It looks like the information that's needed is in the Processor class, specifically a subclass that has the same name as the method, then you call For example, for the |
| if (onewayMethods.contains(method.getName())) { | ||
| // if thrift one way method throws an exception it will just log an error | ||
| LOG.debug("Ignoring one way thrift call because not active : {} {}", method.getName(), | ||
| Arrays.asList(args)); |
There was a problem hiding this comment.
I think you need to return on this line, or it will fall through and invoke the method.
| if (onewayMethods.contains(method.getName())) { | ||
| // if thrift one way method throws an exception it will just log an error | ||
| LOG.debug("Ignoring one way thrift call during upgrade : {} {}", method.getName(), | ||
| Arrays.asList(args)); |
There was a problem hiding this comment.
I think you need to return on this line, or it will fall through and invoke the method.
There was a problem hiding this comment.
Good catch, that would have spammed the logs w/ errors. Fixed in d5670e6
| try { | ||
| method = ProcessFunction.class.getDeclaredMethod("isOneway"); | ||
| } catch (NoSuchMethodException e) { | ||
| throw new IllegalStateException(e); |
There was a problem hiding this comment.
We probably want to log the class name for debugging purposes to find out why it doesn't have the declared method.
Restored highly available service in order to support multiple managers #6139 . This change should also avoid clients that connect to a non-primary manager from getting stuck. Made all thrift methods throw ThriftNotActiveServiceException, if the method does not declare this it will cause a server side exception. There was a single one way thrift method that could not throw this, so it would still cause server side exceptions if its called. Delaying advertising should prevent this in most cases.