Need to revise definitions/implementation of examples_only, test_mode, etc. #12

@DannyWeitekamp

There are various situations where we don't want AL to go through a full feedback loop because we don't want AL to produce actions, receive feedback, etc.

From the perspective of AL_Train, typically:
1(A). An act() request is sent to AL for an action (i.e. AL needs to self-explain the next step)
2(F). AL's next action(s) are received and feedback is sent back to AL
or
3(D). AL has no next actions and an example is sent back to AL
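
To make the three events concrete, here is a minimal sketch of one step of the default loop. Everything in it is hypothetical: `StubAgent`, its `act`/`feedback`/`demonstrate` methods, and `train_step` are illustration names, not the actual AL or AL_Train API, and the +1/-1 reward is an assumption.

```python
class StubAgent:
    """Hypothetical stand-in for AL: it only knows actions it was shown."""

    def __init__(self):
        self.known = {}   # state -> action learned from demonstrations
        self.log = []     # record of what the trainer sent back

    def act(self, state):
        # A: return an action, or None if the agent has no next action
        return self.known.get(state)

    def feedback(self, state, action, reward):
        # F: receive correctness feedback on a proposed action
        self.log.append(("feedback", state, action, reward))

    def demonstrate(self, state, action):
        # D: receive a worked example for this state
        self.known[state] = action
        self.log.append(("example", state, action))


def train_step(agent, state, correct_action):
    """Run one step of the default loop and return the events that occurred."""
    events = ["A"]
    action = agent.act(state)                      # A: request an action
    if action is not None:
        reward = 1 if action == correct_action else -1
        agent.feedback(state, action, reward)      # F: send feedback
        events.append("F")
    else:
        agent.demonstrate(state, correct_action)   # D: send an example
        events.append("D")
    return events


agent = StubAgent()
first = train_step(agent, "step1", "add")    # agent has no action yet
second = train_step(agent, "step1", "add")   # agent now proposes "add"
```

In this sketch the first call takes the A then D path and the second takes the A then F path.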

Right now:
-examples_only: causes A to happen, then D to happen regardless of AL's response
-test_mode: causes A to happen, but not F or D

But we would also like a way for D to happen without A.

These cases (at least) should be possible; let's call them "feedback_modes". Potentially the user could just choose among these mutually exclusive options, instead of setting separate flags and having to rule out the three illegal combinations:
-full/default:    A, F, D <- Normal ITS training loop
-no_hints:        A, F, _ <- Warning: infinite loop (really only works if AL has a finite action space and tries random things)
-predict_observe: A, _, D <- Demonstrations are always given
-observe_only:    _, _, D <- Demonstrations are always given
-test:            A, _, _ <- Moves to next item on first incorrect
-stepwise_test:   A, _, _ <- Moves to next step (without sending demonstration) on incorrect

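(_, _, _), (_, F, D), and (_, F, _) are impossible: F needs an action from A to give feedback on, and the empty combination does nothing.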
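
One way to picture the proposal is a single lookup table of modes instead of independent flags. The sketch below is an assumption about how that could look, not existing AL_Train code: `FeedbackMode`, `FEEDBACK_MODES`, the `on_incorrect` field, and `is_legal` are all hypothetical names. The `on_incorrect` field is added only because test and stepwise_test share the same A/F/D triple and differ in what happens after an incorrect action.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FeedbackMode:
    """Hypothetical description of one feedback mode."""
    act: bool                       # A: send an act() request
    feedback: bool                  # F: send feedback on AL's action
    demonstrate: bool               # D: send an example
    on_incorrect: str = "continue"  # hypothetical: how to advance after an incorrect action


FEEDBACK_MODES = {
    "full": FeedbackMode(True, True, True),
    "no_hints": FeedbackMode(True, True, False),
    "predict_observe": FeedbackMode(True, False, True),
    "observe_only": FeedbackMode(False, False, True),
    "test": FeedbackMode(True, False, False, on_incorrect="next_item"),
    "stepwise_test": FeedbackMode(True, False, False, on_incorrect="next_step"),
}


def is_legal(act, feedback, demonstrate):
    """Legal if something happens at all and F never occurs without A."""
    return (act or demonstrate) and (act or not feedback)


# Enumerate all eight (A, F, D) combinations and collect the illegal ones.
illegal = [combo for combo in product([False, True], repeat=3)
           if not is_legal(*combo)]
```

Under this rule, enumerating all eight combinations leaves exactly the three impossible ones listed above, and every named mode passes the check.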

There is added complexity if we incorporate other levels of hint beyond bottom-out hints. Additionally, no_hints would probably require some kind of empty hint response to be given to AL that would prompt the agent to guess.

Labels: discussion, easy