
StackOverflow Point

Asked by Alex Hales (Teacher) on August 10, 2022
In: Machine Learning, markov-decision-process, reinforcement-learning, terminology

terminology – What is a policy in reinforcement learning?


The definition is correct, though not instantly obvious if you see it for the first time. Let me put it this way: a policy is an agent’s strategy.

For example, imagine a world where a robot moves across the room and the task is to get to the target point (x, y), where it gets a reward. Here:

  • the robot's current position is the state;
  • its possible moves are the actions;
  • a policy tells the robot which move to make from each position.
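The "strategy" view of a policy can be sketched as a lookup table from states (positions) to actions (moves). The grid, target, and move names below are made up purely for illustration:

```python
# A minimal sketch: a deterministic policy for the robot example,
# mapping each grid position (state) to a move (action).
# The 2x2 grid and target (1, 1) are hypothetical.

ACTIONS = {"right": (1, 0), "up": (0, 1)}

# The policy: for each state, the action the robot takes there.
policy = {
    (0, 0): "right",
    (1, 0): "up",
    (0, 1): "right",
}

def step(state, action):
    """Apply a move to a position."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)

# Follow the policy from the start until the target is reached.
state, target = (0, 0), (1, 1)
trajectory = [state]
while state != target:
    state = step(state, policy[state])
    trajectory.append(state)

print(trajectory)  # [(0, 0), (1, 0), (1, 1)]
```

A different policy (a different table) would trace a different path; that is exactly the sense in which some policies are better than others.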

Obviously, some policies are better than others, and there are several ways to assess them, namely the state-value function and the action-value function. The goal of RL is to learn the best policy. Now the definition should make more sense (note that in this context, "time" is better understood as "state"):

A policy defines the learning agent’s way of behaving at a given time.

Formally

More formally, we should first define a Markov Decision Process (MDP) as a tuple (S, A, P, R, γ), where:

  • S is a finite set of states
  • A is a finite set of actions
  • P is a state transition probability matrix (the probability of ending up in each next state, for each current state and each action)
  • R is a reward function, given a state and an action
  • γ is a discount factor, between 0 and 1
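The tuple above can be written down directly as a data structure. This is only a sketch with a made-up two-state example, not a standard library API:

```python
from dataclasses import dataclass

# A minimal sketch of the MDP tuple (S, A, P, R, gamma) from the text.
# P maps (state, action) to a distribution over next states;
# R maps (state, action) to a reward. The example MDP is hypothetical.

@dataclass
class MDP:
    states: set
    actions: set
    P: dict       # (state, action) -> {next_state: probability}
    R: dict       # (state, action) -> reward
    gamma: float  # discount factor in [0, 1]

mdp = MDP(
    states={"s0", "s1"},
    actions={"stay", "go"},
    P={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "go"):   {"s0": 0.2, "s1": 0.8},  # transitions may be stochastic
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"):   {"s1": 1.0},
    },
    R={("s0", "stay"): 0.0, ("s0", "go"): 0.0,
       ("s1", "stay"): 1.0, ("s1", "go"): 1.0},
    gamma=0.9,
)

# Each row of P must be a valid probability distribution.
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in mdp.P.values())
```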

Then, a policy π is a probability distribution over actions given states: the likelihood of each action when the agent is in a particular state (I'm skipping a lot of details here, of course). This corresponds to the second part of your definition.
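A stochastic policy π(a | s) can be sketched as one probability distribution over actions per state. The states, actions, and probabilities below are invented for illustration:

```python
import random

# A minimal sketch of a stochastic policy pi(a | s): for each state,
# a probability distribution over actions. The numbers are made up.

policy = {
    "s0": {"go": 0.8, "stay": 0.2},
    "s1": {"go": 0.1, "stay": 0.9},
}

def sample_action(pi, state, rng=random):
    """Draw an action according to pi(. | state)."""
    actions, probs = zip(*pi[state].items())
    return rng.choices(actions, weights=probs, k=1)[0]

# Each pi(. | s) must sum to 1 to be a valid distribution.
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in policy.values())

print(sample_action(policy, "s0"))  # "go" (usually) or "stay"
```

A deterministic policy is the special case where one action per state has probability 1.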

I highly recommend David Silver’s RL course available on YouTube. The first two lectures focus particularly on MDPs and policies.



© 2022 Stackoverflow Point. All Rights Reserved.
