Related articles: Machine learning