Is the architecture training different policies for different tasks? If not how are we specifying the task at test time?