We probably need to have multiple replay buffers, one for each class of action.
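
A rough sketch of what that could look like, assuming each transition is tagged with its action class when it is stored. The class name `PerClassReplayBuffer`, the per-class capacity, and the even split across classes at sampling time are all illustrative assumptions, not a settled design:

```python
import random
from collections import deque


class PerClassReplayBuffer:
    """Hypothetical sketch: one bounded FIFO replay buffer per action class."""

    def __init__(self, action_classes, capacity_per_class=10_000, seed=None):
        # Each class gets its own deque; the oldest transitions are dropped
        # once a class reaches capacity_per_class.
        self.buffers = {c: deque(maxlen=capacity_per_class) for c in action_classes}
        self.rng = random.Random(seed)

    def add(self, action_class, transition):
        self.buffers[action_class].append(transition)

    def sample(self, batch_size):
        # Assumption: split the batch as evenly as possible across the
        # classes that currently hold data, so rare classes are not drowned
        # out by frequent ones.
        non_empty = [c for c, buf in self.buffers.items() if buf]
        if not non_empty:
            return []
        per_class, remainder = divmod(batch_size, len(non_empty))
        batch = []
        for i, c in enumerate(non_empty):
            k = per_class + (1 if i < remainder else 0)
            # Sample with replacement so a small buffer can still fill its share.
            batch.extend(self.rng.choices(self.buffers[c], k=k))
        return batch

    def __len__(self):
        return sum(len(buf) for buf in self.buffers.values())
```

With a single shared buffer, a frequent action class would tend to evict and outnumber the rare ones; keeping them separate lets each class retain its own history, and the sampling ratio becomes an explicit choice rather than a side effect of how often each action occurs.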