Q-Learning By Examples

In this tutorial, you will discover step by step how an agent learns through training without a teacher (unsupervised) in an unknown environment. You will learn about a part of reinforcement learning called the Q-learning algorithm. Reinforcement learning algorithms have been widely used in many applications such as robotics, multi-agent systems, and games. Instead of presenting the theory of reinforcement learning, which you can read in many books and on other web sites (see Resources for more references), this tutorial introduces the concept through a simple but comprehensive numerical example.
You may also download the Matlab code or MS Excel spreadsheet for free.

Suppose we have 5 rooms in a building connected by certain doors, as shown in the figure below. We give each room a name, A to E. We can consider the outside of the building as one big room that covers the building, and name it F. Notice that two doors lead into the building from F, namely through room B and room E. We can represent the rooms as a graph, with each room as a vertex (or node) and each door as an edge (or link). Refer to my other tutorial on Graph if you are not sure what a graph is.
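To make the graph concrete, here is a minimal sketch in Python (the tutorial's own downloadable code is in Matlab; this illustration simply encodes the doors described in the text and figure as an adjacency list):

# Doors of the building, as described in the text and figure:
# A-E, B-D, B-F, C-D, D-E, E-F (F is the outside of the building).
doors = [("A", "E"), ("B", "D"), ("B", "F"),
         ("C", "D"), ("D", "E"), ("E", "F")]

graph = {room: [] for room in "ABCDEF"}
for u, v in doors:
    graph[u].append(v)  # each door is two-way,
    graph[v].append(u)  # so record both directions

print(graph["F"])  # ['B', 'E'] -- the two doors into the building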
We want to set the target room. If we put an agent in any room, we want the agent to go outside the building. In other words, the goal room is node F. To set this kind of goal, we give a reward value to each door (i.e. each edge of the graph). The doors that lead immediately to the goal have an instant reward of 100 (see the diagram below; they are marked with red arrows). Other doors that have no direct connection to the target room have zero reward. Because each door is two-way (from A you can go to E, and from E you can go back to A), we assign two arrows to each edge of the previous graph. Each arrow carries an instant reward value. The graph thus becomes a state diagram, as shown below.

An additional loop with the highest reward (100) is given to the goal room (F back to F) so that if the agent arrives at the goal, it will remain there forever. This type of goal is called an absorbing goal because when the agent reaches the goal state, it will stay in the goal state.
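Continuing the Python sketch above, the instant rewards can be attached to each directed arrow. This is only an illustration of the assignment just described, under the assumption that only arrows pointing into F (the red arrows, plus the absorbing F-to-F loop) earn 100, while every other arrow earns zero:

doors = [("A", "E"), ("B", "D"), ("B", "F"),
         ("C", "D"), ("D", "E"), ("E", "F")]
GOAL = "F"

# Instant reward of each arrow: 100 if it enters the goal, else 0.
reward = {}
for u, v in doors:
    reward[(u, v)] = 100 if v == GOAL else 0
    reward[(v, u)] = 100 if u == GOAL else 0
reward[(GOAL, GOAL)] = 100  # absorbing loop: the agent stays at F

print(reward[("E", "F")], reward[("F", "E")])  # 100 0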
Ladies and gentlemen, now is the time to introduce our superstar agent. Imagine our agent as a dumb virtual robot that can learn through experience. The agent can pass from one room to another but has no knowledge of the environment; it does not know which sequence of doors it must pass through to go outside the building. Suppose we want to model some kind of simple evacuation of an agent from any room in the building. Now suppose we have an agent in room C, and we want the agent to learn to reach the outside of the building (F) (see the diagram below). How do we make our agent learn from experience?
Before we discuss how the agent will learn (using Q-learning) in the next section, let us first discuss two terms: state and action. We call each room (including the outside of the building) a state. The agent's movement from one room to another is called an action.
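In the sketch below, the states are simply the six rooms, and a hypothetical helper actions(state) (not part of the original tutorial) lists the moves available from a given state, i.e. the arrows leaving that node in the state diagram:

doors = [("A", "E"), ("B", "D"), ("B", "F"),
         ("C", "D"), ("D", "E"), ("E", "F")]

def actions(state):
    # The actions available in a state are the arrows leaving it.
    out = [v for u, v in doors if u == state]
    out += [u for u, v in doors if v == state]
    if state == "F":
        out.append("F")  # the absorbing loop at the goal
    return sorted(out)

print(actions("C"))  # ['D'] -- from C the agent can only go to D
print(actions("D"))  # ['B', 'C', 'E']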
Let us look again at our state diagram. A state is depicted as a node in the state diagram, while an action is represented by an arrow. Suppose the agent is now in state C. From state C, the agent can go to state D because state C is connected to D. From state C, however, the agent cannot go directly to state B because there is no direct door connecting rooms B and C (thus, no arrow). From state D, the agent can go either to state B or to state E, or back to state C (look at the