
Course listing: Prediction and Control with Function Approximation Training
4401 followers
(78637/99817)
Course outline:

    Prediction and Control with Function Approximation Training

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization: Prediction and Control with Function Approximation, brought to you by the University of Alberta, Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy, when the number of states is much larger than the memory available to the agent. You will learn how to specify a parametric form of the value function, how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
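
The syllabus itself contains no code, but as an informal illustration of these ideas, here is a minimal sketch of semi-gradient TD(0) prediction with a linear parametric value function whose weights are updated by gradient descent from interaction with the world. The `env` and `features` interfaces and all constants are placeholder assumptions for the example, not course materials.

```python
import numpy as np

def semi_gradient_td0(env, policy, features, num_features,
                      alpha=0.01, gamma=0.99, num_episodes=100):
    """Estimate v_pi with a linear value function v(s, w) = w . x(s).

    Assumed placeholder interfaces: env.reset() -> state,
    env.step(action) -> (next_state, reward, done), and
    features(state) -> length-num_features vector x(s).
    """
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            x = features(state)
            v = w @ x
            v_next = 0.0 if done else w @ features(next_state)
            # Semi-gradient TD(0): the bootstrapped target is treated as a
            # constant, so the update direction is the TD error times x(s).
            td_error = reward + gamma * v_next - v
            w += alpha * td_error * x
            state = next_state
    return w
```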

Constructing Features for Prediction

The features used to construct the agent's value estimates are perhaps the most crucial part of a successful learning system. In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input, and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation. In this week's graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and TD learning.
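
As a rough illustration of the first strategy, the sketch below builds fixed features by state aggregation: a one-hot encoding over an exhaustive partition of a one-dimensional input. The interval bounds and group count are arbitrary values chosen for the example, not values from the course.

```python
import numpy as np

def state_aggregation_features(state, low, high, num_groups):
    """One-hot feature vector from a fixed partition of [low, high).

    Each group of states shares a single feature, so every state activates
    exactly one component: an exhaustive partition of the input.
    """
    x = np.zeros(num_groups)
    # Clip so states on the upper boundary still map to a valid group.
    fraction = np.clip((state - low) / (high - low), 0.0, 1.0 - 1e-12)
    x[int(fraction * num_groups)] = 1.0
    return x

# Example: a 1-D state space [0, 1000) split into 10 groups.
x = state_aggregation_features(437.0, low=0.0, high=1000.0, num_groups=10)
# x has a single 1 at index 4, so a linear value function w @ x assigns the
# same estimate to every state in that group.
```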

Control with Approximation

This week, you will see that the concepts and tools introduced in modules two and three allow a straightforward extension of classic TD control methods to the function approximation setting. In particular, you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa. We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly be used in many applications of RL in the future.
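
To give a flavour of how semi-gradient TD methods combine with generalized policy iteration, here is a minimal sketch of episodic semi-gradient Sarsa with linear action-value approximation and an epsilon-greedy policy. The `env` and `features(state, action)` interfaces are assumed placeholders, not the course's environments.

```python
import numpy as np

def semi_gradient_sarsa(env, features, num_features, num_actions,
                        alpha=0.1, gamma=1.0, epsilon=0.1,
                        num_episodes=500, rng=None):
    """Episodic semi-gradient Sarsa with q(s, a, w) = w . x(s, a)."""
    if rng is None:
        rng = np.random.default_rng(0)
    w = np.zeros(num_features)

    def q(state, action):
        return w @ features(state, action)

    def epsilon_greedy(state):
        # Generalized policy iteration: act greedily (with exploration)
        # with respect to the current approximate action values.
        if rng.random() < epsilon:
            return int(rng.integers(num_actions))
        return int(np.argmax([q(state, a) for a in range(num_actions)]))

    for _ in range(num_episodes):
        state = env.reset()
        action = epsilon_greedy(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            x = features(state, action)
            if done:
                target = reward
            else:
                next_action = epsilon_greedy(next_state)
                target = reward + gamma * q(next_state, next_action)
            # Semi-gradient update: differentiate only through q(s, a, w).
            w += alpha * (target - q(state, action)) * x
            if not done:
                state, action = next_state, next_action
    return w
```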

Policy Gradient

Every algorithm you have learned about so far estimates a value function as an intermediate step towards the goal of finding an optimal policy. An alternative strategy is to directly learn the parameters of the policy. This week you will learn about these policy gradient methods, and their advantages over value-function-based methods. You will also learn how policy gradient methods can be used to find the optimal policy in tasks with both continuous state and action spaces.
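
As one illustrative policy gradient method, the sketch below implements REINFORCE with a Gaussian policy over a one-dimensional continuous action, learning the policy parameters directly without an intermediate value function. The parameterization and the `env`/`features` interfaces are assumptions made for this example, not the specific algorithm taught in the course.

```python
import numpy as np

def reinforce_gaussian(env, features, num_features,
                       alpha=1e-4, gamma=0.99, num_episodes=200, rng=None):
    """REINFORCE with a Gaussian policy N(mu(s), sigma(s)^2) for a 1-D action.

    mu(s) = theta_mu . x(s) and log sigma(s) = theta_sigma . x(s);
    env and features are assumed placeholder interfaces.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    theta_mu = np.zeros(num_features)
    theta_sigma = np.zeros(num_features)

    for _ in range(num_episodes):
        # Generate one episode under the current stochastic policy.
        trajectory = []
        state = env.reset()
        done = False
        while not done:
            x = features(state)
            mu = theta_mu @ x
            sigma = np.exp(theta_sigma @ x)
            action = rng.normal(mu, sigma)
            next_state, reward, done = env.step(action)
            trajectory.append((x, action, mu, sigma, reward))
            state = next_state

        # Monte Carlo returns, computed backwards through the episode.
        G = 0.0
        returns = []
        for (_, _, _, _, reward) in reversed(trajectory):
            G = reward + gamma * G
            returns.append(G)
        returns.reverse()

        for t, ((x, action, mu, sigma, _), G_t) in enumerate(zip(trajectory, returns)):
            # Score function (gradient of log pi) for the Gaussian policy.
            grad_mu = (action - mu) / sigma ** 2 * x
            grad_sigma = ((action - mu) ** 2 / sigma ** 2 - 1.0) * x
            theta_mu += alpha * (gamma ** t) * G_t * grad_mu
            theta_sigma += alpha * (gamma ** t) * G_t * grad_sigma

    return theta_mu, theta_sigma
```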
