
Course catalog: Sample-Based Learning Methods Training
4,401 followers
(78637/99817)
Course outline:

    Sample-Based Learning Methods Training


Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also be reintroduced to the exploration problem,
now in the general RL setting rather than only in bandits.
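To make prediction from sampled returns concrete, here is a minimal first-visit Monte Carlo sketch in Python. The `env.reset()`/`env.step()` interface and the `policy` callable are illustrative assumptions, not interfaces from the course materials.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=0.99):
    # First-visit Monte Carlo prediction: average sampled returns per state.
    returns = defaultdict(list)  # state -> list of observed returns
    V = defaultdict(float)       # state -> current value estimate

    for _ in range(num_episodes):
        # Generate one full episode by following the policy.
        episode = []
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)  # assumed interface
            episode.append((state, reward))
            state = next_state

        # Walk the episode backwards, accumulating the return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = gamma * G + r
            # First-visit check: only record G at s's first occurrence.
            if s not in (e[0] for e in episode[:t]):
                returns[s].append(G)
                V[s] = sum(returns[s]) / len(returns[s])
    return V
```

Note that no value estimate can be updated until the episode terminates, since the return G is only known at the end; this is exactly the limitation TD learning removes in the next module.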
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
For this module, we first focus on TD for prediction, and discuss TD for control in the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
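As a sketch of the idea, the tabular TD(0) update below moves V(s) toward the bootstrapped target r + gamma * V(s') after every step, with no need to wait for the episode to end. The environment and policy interfaces match the Monte Carlo sketch above and are likewise assumptions.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes, alpha=0.1, gamma=0.99):
    # Tabular TD(0): update V(s) toward the bootstrapped target r + gamma*V(s').
    V = defaultdict(float)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)  # assumed interface
            # Terminal states are defined to have value 0.
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])  # online, per-step update
            state = next_state
    return V
```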
Temporal Difference Learning Methods for Control
This week,
you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning, and Expected Sarsa. You will examine some of the differences between
the methods for on-policy and off-policy control, and see that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning on Cliff World.
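As a rough sketch of how these methods relate, here is Expected Sarsa with an epsilon-greedy behaviour policy: the target averages the next-state action values under the policy's probabilities, and substituting a greedy target policy in that expectation recovers Q-learning. The env/actions interfaces are assumptions carried over from the earlier sketches.

```python
import random
from collections import defaultdict

def expected_sarsa(env, actions, num_episodes, alpha=0.5, gamma=1.0, eps=0.1):
    # Expected Sarsa: the target uses the expected next action value
    # under the current (epsilon-greedy) policy instead of a sampled action.
    Q = defaultdict(float)

    def policy_probs(state):
        # Action probabilities under the epsilon-greedy policy.
        qs = [Q[(state, a)] for a in actions]
        probs = [eps / len(actions)] * len(actions)
        probs[qs.index(max(qs))] += 1.0 - eps
        return probs

    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = random.choices(actions, weights=policy_probs(state))[0]
            next_state, reward, done = env.step(action)  # assumed interface
            if done:
                target = reward
            else:
                expected_q = sum(p * Q[(next_state, a)]
                                 for p, a in zip(policy_probs(next_state), actions))
                target = reward + gamma * expected_q
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```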
Planning, Learning & Acting
Up until now,
you might think that learning with and without a model are two distinct,
and in some ways competing, strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
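To make the Dyna idea concrete, here is a minimal tabular Dyna-Q sketch: each real step performs one direct Q-learning update, records the transition in a learned model, and then replays n_planning simulated transitions from that model. All interfaces are assumptions consistent with the sketches above, and the deterministic tabular model is the simplest case; stochastic or changing environments would need a richer model.

```python
import random
from collections import defaultdict

def dyna_q(env, actions, num_steps, alpha=0.1, gamma=0.95, eps=0.1, n_planning=10):
    Q = defaultdict(float)
    model = {}  # (state, action) -> (reward, next_state, done); deterministic

    def eps_greedy(state):
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    state, done = env.reset(), False
    for _ in range(num_steps):
        if done:
            state, done = env.reset(), False
        action = eps_greedy(state)
        next_state, reward, done = env.step(action)  # assumed interface

        # (a) Direct RL: one-step Q-learning update from real experience.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])

        # (b) Model learning: remember the last observed transition.
        model[(state, action)] = (reward, next_state, done)

        # (c) Planning: replay hypothetical experience drawn from the model.
        for _ in range(n_planning):
            (s, a), (r, s2, d) = random.choice(list(model.items()))
            best = 0.0 if d else max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

        state = next_state
    return Q
```

The planning loop is where the sample-efficiency gain comes from: each real environment step is amortized over many value updates.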
