Learning with Uncertainty


The ideas behind modern reinforcement learning are built from theories of trial-and-error learning and computational adaptive control. The general goal of these approaches is to build an agent that maximizes the rewards it receives for its behaviour as it interacts with a stochastic Environment in a feedback loop. The agent updates its policy, or strategy for making decisions in the face of uncertainty, based on the responses it receives from the Environment.
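The feedback loop described above can be sketched in a few lines of code. This is a minimal illustration, not any particular algorithm from the literature: the "Environment" is a hypothetical two-armed bandit (action 1 pays off more often than action 0), and the agent's "policy update" is a running average of the rewards each action has produced.

```python
import random

def environment_step(action):
    """The Environment's stochastic response: action 1 is the better arm."""
    p_reward = 0.8 if action == 1 else 0.2
    return 1.0 if random.random() < p_reward else 0.0

def run(episodes=5000, epsilon=0.1, seed=0):
    random.seed(seed)
    values = [0.0, 0.0]   # the agent's value estimate for each action
    counts = [0, 0]       # how often each action has been tried
    for _ in range(episodes):
        # Policy: explore a random action with probability epsilon,
        # otherwise exploit the action currently believed to be best.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if values[0] >= values[1] else 1
        reward = environment_step(action)   # the Environment responds
        counts[action] += 1
        # Update the strategy from the response (incremental average).
        values[action] += (reward - values[action]) / counts[action]
    return values

print(run())
```

After enough interactions, the agent's value estimates approach the true payoff rates of the two arms, and its greedy choices concentrate on the better action.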

Figure: General reinforcement learning framework. An agent interacts with an Environment in a feedback configuration and updates its strategy for choosing an action based on the responses it gets from the Environment.

A trial-and-error search is an approach to learning behaviour that originates in animal psychology, where Thorndike, Pavlov and Skinner were major proponents. The theory of trial-and-error learning concerns how agents learn behaviour through the strengthening or weakening of mental bonds, based on the satisfaction or discomfort the agent perceives from the Environment after carrying out a particular action (Thorndike, 1898). Thorndike called this the "law of effect": "satisfaction" reinforces an accompanying action through a "reward", while "discomfort" leads to the discontinuation of an action through a "penalty". These ideas of rewards and penalties were explored further by B. F. Skinner in his work on operant conditioning, which posits that the agent voluntarily reinforces its behaviour based on the stimulus or action that elicits a response from the Environment (Skinner, 1938). Pavlov's classical conditioning, on the other hand, argues that the pairing of stimuli (the first of which is the unconditioned stimulus) creates an involuntary behavioural response in the agent (Pavlov, 1927). Both behavioural theories of learning involve some form of associative pairing of stimulus and response, whereby an agent's behaviour is conditioned by the repetition of actions in a feedback loop.

Figure: T-maze. The T-maze is used in conditioning experiments to examine the behaviour of rodents as they learn to find food over successive trials under different schedules.

Trial-and-error learning and the law of effect have two distinct properties that have influenced modern reinforcement learning techniques: they are selectional and associative. Modern RL is selectional in that, for a particular state of the Environment, an action is sampled from a set of actions, and it is associative in that favourable actions are remembered (i.e. stored in memory) together with their associated states (Sutton and Barto, 1998).
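These two properties can be made concrete with a tabular agent in a hypothetical T-maze like the one pictured above. The maze, the state name, and the learning rate below are all illustrative assumptions: at the junction the agent selects an action from {left, right} (selectional), and it stores a value for each state-action pair in a table (associative), strengthening or weakening that bond after each trial in the spirit of the law of effect.

```python
import random

ACTIONS = ["left", "right"]

def t_maze_reward(action):
    """Food sits in the right arm of this toy maze."""
    return 1.0 if action == "right" else 0.0

def train(trials=200, epsilon=0.1, alpha=0.1, seed=0):
    random.seed(seed)
    # Associative memory: a value per (state, action) pair.
    q = {("junction", a): 0.0 for a in ACTIONS}
    for _ in range(trials):
        # Selection: sample from the action set, mostly greedily.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[("junction", a)])
        reward = t_maze_reward(action)
        # Strengthen or weaken the state-action bond (law of effect).
        q[("junction", action)] += alpha * (reward - q[("junction", action)])
    return q

print(train())
```

After repeated trials the table entry for ("junction", "right") dominates, so the learned policy turns toward the rewarded arm — the computational analogue of the rodent's conditioning.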

The field of adaptive control is concerned with learning the behaviour of a controller (or an agent) in a complex dynamical system where uncertainty exists in the parameters of the controlled system. Bellman (1961) categorized control problems as deterministic, stochastic and adaptive. In an adaptive control system, a considerable level of uncertainty exists: little is known about the structure of the Environment or the distribution of its parameters. While experimentation may be used to obtain some information about the system, the time required often makes such an approach infeasible; hence the need to learn the behaviour of the controller in an "online" configuration. Bellman (1957a) introduced the Bellman equation, which relates the states of a dynamical system to their value function, and introduced dynamic programming as a class of methods for finding the optimal controller in an adaptive control problem. Bellman (1957b) formulated the Markov Decision Process (MDP) as a discrete-time stochastic control process for modelling the reinforcement learning framework in which the agent interacts with the Environment in a feedback loop.

The Markov property assumes that the current state captures all the information necessary to predict the next state and its expected response, without relying on the previous sequence of states. In other words, under the Markov property the future states of the Environment depend only on the current state: they are conditionally independent of the past states given the current state. The MDP is based on the theoretical assumption that the states of the Environment possess the Markov property.
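Dynamic programming and the Bellman equation can be sketched on a toy MDP. The two-state transition model below is an invented example, not from any of the cited works; the code applies value iteration, repeatedly sweeping the Bellman optimality update V(s) = max_a Σ_{s'} P(s'|s,a) [R(s,a,s') + γ·V(s')]. Note how the Markov property is baked in: each update depends only on the current state s, never on how the agent reached it.

```python
# Transition model: P[state][action] -> list of (prob, next_state, reward).
# A hypothetical two-state MDP used purely for illustration.
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.9, "s1", 1.0), (0.1, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 0.5)],
           "go":   [(1.0, "s0", 0.0)]},
}

def value_iteration(gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality update: best expected one-step
            # return plus the discounted value of the next state.
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                    for outcomes in P[s].values())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:   # stop once the sweep changes nothing
            return V

print(value_iteration())
```

Because the update is a contraction for γ < 1, the sweeps converge to the unique optimal value function, from which the optimal controller can be read off by picking the maximizing action in each state.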

Bibliography

  • Bellman, R. E. (1957a). Dynamic Programming. Princeton: Princeton University Press.
  • Bellman, R. E. (1957b). A Markovian Decision Process. Journal of Mathematics and Mechanics, 6(5), 679–684. www.jstor.org/stable/24900506.
  • Bellman, R. E. (1961). Adaptive Control Processes: A Guided Tour. Princeton: Princeton University Press.
  • Pavlov, I. P. (1927). Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex (G. V. Anrep, Trans.). Oxford University Press.
  • Skinner, B. F. (1938). The Behavior of Organisms: An Experimental Analysis. Appleton-Century.
  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge: MIT Press.
  • Thorndike, E. L. (1898). Animal Intelligence: An Experimental Study of the Associative Processes in Animals. The Psychological Review: Monograph Supplements, 2(4), i–109. https://doi.org/10.1037/h0092987.
