Learning with Uncertainty



The ideas behind modern reinforcement learning are built from theories of trial-and-error learning and computational adaptive control. The general goal of these approaches is to build an agent that maximizes the rewards for its behaviour as it interacts with a stochastic Environment in a feedback loop. The agent updates its policy, or strategy for making decisions in the face of uncertainty, based on the responses it receives from the Environment.
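The feedback loop described above can be sketched as a short, generic program. This is a minimal illustration, not a reference implementation: the function names (`run_episode`, `env_step`, `choose_action`, `update_policy`) and the toy environment are hypothetical, chosen only to make the agent–Environment interaction concrete.

```python
import random

def run_episode(env_step, choose_action, update_policy, initial_state, steps=10):
    """Generic agent-Environment feedback loop: the agent picks an action,
    the Environment responds with a next state and a reward, and the agent
    updates its decision strategy based on that response."""
    state = initial_state
    total_reward = 0.0
    for _ in range(steps):
        action = choose_action(state)
        next_state, reward = env_step(state, action)
        update_policy(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward

# Toy deterministic Environment: the state is an integer, actions are +1/-1,
# and the agent is rewarded whenever it moves closer to the goal state 3.
def env_step(state, action):
    next_state = state + action
    reward = 1.0 if abs(3 - next_state) < abs(3 - state) else 0.0
    return next_state, reward
```

A fixed policy that always moves right, started at state 0 for three steps, collects a reward at every step; a learning agent would replace the no-op `update_policy` with an actual update rule.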


General reinforcement learning framework. An agent interacts with an Environment in a feedback configuration and updates its strategy for choosing an action based on the responses it gets from the Environment.

A trial-and-error search is an approach to learning behaviour that originates in animal psychology. Thorndike, Pavlov and Skinner were major proponents of this field of learning. The theory of trial-and-error learning concerns itself with how agents learn behaviour through the strengthening or weakening of mental bonds, based on the satisfaction or discomfort the agent perceives from the Environment after carrying out a particular action (Thorndike, 1898). Thorndike called this the “law of effect”: “satisfaction” reinforces an accompanying action through a “reward”, while “discomfort” leads to the discontinuation of an action through a “penalty”. These ideas of rewards and penalties were explored further in B. F. Skinner’s work on operant conditioning, which posits that the agent voluntarily reinforces its behaviour based on the stimuli or actions that elicit a response from the Environment (Skinner, 1938). Pavlov’s classical conditioning, on the other hand, argues that the pairing of stimuli (the first of which is the unconditioned stimulus) creates an involuntary behavioural response in the agent (Pavlov, 1927). Both behavioural theories of learning involve some form of associative pairing of stimulus and response, whereby an agent’s behaviour is conditioned by the repetition of actions in a feedback loop.


T-maze. The T-maze is used in conditioning experiments to examine the behaviour of rodents as they learn to find food from successive trials using different schedules.

Trial-and-error learning and the law of effect have two distinct properties that have influenced modern reinforcement learning techniques: they are selectional and associative. Modern RL is selectional in that, for a particular state of the Environment, an action is sampled from a set of actions, and it is associative in that favourable actions are remembered (i.e. stored in memory) together with their associated states (Sutton and Barto, 1998).
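These two properties can be made concrete with a small sketch. The code below is an illustrative epsilon-greedy agent, not taken from any cited source: the factory name `epsilon_greedy_agent` and its parameters are assumptions for the example. Sampling an action from the action set is the selectional part; the table `q` mapping state–action pairs to remembered values is the associative part.

```python
import random

def epsilon_greedy_agent(actions, epsilon=0.1, alpha=0.5):
    """Returns (choose, update, q) for a simple epsilon-greedy learner."""
    q = {}  # associative: remembers a value for each (state, action) pair

    def choose(state):
        # selectional: sample from the action set, usually preferring the
        # action with the highest remembered value for this state
        if random.random() < epsilon or not any((state, a) in q for a in actions):
            return random.choice(actions)
        return max(actions, key=lambda a: q.get((state, a), 0.0))

    def update(state, action, reward):
        # strengthen or weaken the remembered value toward the observed reward
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward - old)

    return choose, update, q
```

After a single rewarded trial of action `'R'` in state `'s'`, the agent (with exploration disabled) prefers `'R'` whenever it sees `'s'` again, which is the law of effect in miniature.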

The field of adaptive control is concerned with learning the behaviour of a controller (or an agent) in a complex dynamical system where uncertainty exists in the parameters of the controlled system. Bellman (1961) categorized control problems as deterministic, stochastic and adaptive. In an adaptive control system, a considerable level of uncertainty exists: little is known about the structure of the Environment or the distribution of its parameters. While experimentation may be used to obtain some information about the system, the time it takes makes such an approach infeasible; hence the behaviour of the controller must be learned in an “online” configuration.

Bellman (1957a) introduced the Bellman equation, which relates the states of a dynamical system to their value function, and dynamic programming as a class of methods for finding the optimal controller for an adaptive control problem. Bellman (1957b) formulated the Markov Decision Process (MDP) as a discrete-time stochastic control process for modelling the reinforcement learning framework in which the agent interacts with the Environment in a feedback loop. The Markov property assumes that the current state captures all the information necessary to predict the next state and its expected response, without relying on the previous sequence of states. In other words, under the Markov property the probability distribution over future states of the Environment depends only on the current state; the future is conditionally independent of the past states given the current state. The MDP is based on the theoretical assumption that the states of the Environment possess the Markov property.
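Dynamic programming on an MDP can be sketched with value iteration, which repeatedly applies the Bellman optimality update V(s) = max_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ V(s')] until the value function stops changing. The function and the two-state MDP below are illustrative assumptions, not from the cited works.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Find the optimal state values of a small MDP by repeatedly applying
    the Bellman optimality update until the largest change falls below tol."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for s2, p in transition(s, a).items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# A two-state MDP: 'go' moves A to the absorbing state B; everything else stays put.
states = ['A', 'B']
actions = ['stay', 'go']

def transition(s, a):
    # returns a dict of next_state -> probability (deterministic here)
    if s == 'A' and a == 'go':
        return {'B': 1.0}
    return {s: 1.0}

def reward(s, a, s2):
    # the only reward is earned by entering B from A
    return 1.0 if s == 'A' and s2 == 'B' else 0.0

V = value_iteration(states, actions, transition, reward, gamma=0.9)
```

Because B is absorbing and rewardless, its value converges to 0, and the value of A converges to the one-step reward of 1 for choosing 'go'. Note that the transitions here depend only on the current state, which is exactly the Markov property the MDP formulation assumes.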

Bibliography

  • Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex (G. V. Anrep, Trans.). Oxford University Press.
  • Skinner, B. F. (1938). The behaviour of organisms: An experimental analysis. Appleton-Century.
  • Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i–109. https://doi.org/10.1037/h0092987.
  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
  • Bellman, R. E. (1961). Adaptive control processes: A guided tour. Princeton University Press.
  • Bellman, R. E. (1957a). Dynamic programming. Princeton University Press.
  • Bellman, R. E. (1957b). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684. www.jstor.org/stable/24900506.
