1 change: 1 addition & 0 deletions docs/arxiv/empirical/main.tex
@@ -171,6 +171,7 @@

\section{Introduction}
\todo[inline]{Introduction to DDPG and recent advances in deep RL. }
[INSERT OPENING SENTENCE HERE] The current state of the art in deep reinforcement learning is the Deep Deterministic Policy Gradient (DDPG) algorithm \cite{lillicrap2015ddpg}, which extended the deterministic policy gradient (DPG) algorithm \cite{silver2014dpg} to continuous, high-dimensional action spaces with considerable success. DDPG is an actor-critic algorithm built on DPG: the critic $Q(s, a)$ is learned in a model-free fashion as in deep Q-network learning \cite{mnih2013dqn}, and the actor $\mu(s)$ is updated by sampling the deterministic policy gradient of \cite{silver2014dpg}. The algorithm achieved performance comparable to planning-based solvers on many physical control problems.
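As a minimal sketch of the two updates described above (notation assumed here: transitions $(s, a, r, s')$ drawn from the replay buffer, target networks $Q'$ and $\mu'$, and discount factor $\gamma$), the critic is regressed onto a bootstrapped target and the actor follows the sampled deterministic policy gradient:
\begin{align}
  L(\theta^{Q}) &= \mathbb{E}\!\left[\big(r + \gamma\, Q'(s', \mu'(s')) - Q(s, a)\big)^{2}\right], \\
  \nabla_{\theta^{\mu}} J &\approx \mathbb{E}\!\left[\nabla_{a} Q(s, a)\big|_{a = \mu(s)}\, \nabla_{\theta^{\mu}} \mu(s)\right].
\end{align}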
This looks good, but I would then add a possible second paragraph describing the downsides of this algorithm. *We need to motivate the rest of the paper!* It could cover:

1. Divergence.
2. Hyperparameter instability (some $\gamma$s work and others do not; in practice the method requires a lot of tuning, and you will obviously need to cite evidence for this argument).
3. The replay buffer is hacky; try to deconstruct the reasons why its use is essential for the DDPG algorithm.


\todo[inline]{Biological diffusion of dopamine in the brain $\implies$ error backpropagation is not biologically feasible.}
