

RL-Glue Matlab Codec 1.03 Manual

Brian Tanner :: brian@tannerpages.com


Introduction

This document describes how to use the Matlab RL-Glue Codec, a software library that provides socket compatibility with the RL-Glue Reinforcement Learning software library. The Matlab codec is brand new, created specifically for the RL-Glue 3.0 release. Special thanks (or pokes) should go to Dale, Doina, and Yaki, who gave voices to the countless others who have probably scorned us for not supporting Matlab earlier. It turned out not to be that difficult :)

For general information and motivation about the RL-Glue [1] project, please refer to the documentation provided with that project.

This codec will allow you to create agents, environments, and experiment programs in Matlab.

This software project is licensed under the Apache-2.0 [2] license. We're not lawyers, but our intention is that this code should be used however it is useful. We'd appreciate hearing what you're using it for, and getting credit if appropriate.

This project has a home here:
http://glue.rl-community.org/Home/Extensions/matlab-codec

Software Requirements

To run agents, environments, and experiments created with this codec, you will need to have the RL-Glue executable socket server (rl_glue(.exe)) installed on your computer. It is available in several packages at:
http://code.google.com/p/rl-glue-ext/wiki/RLGlueCore

Running components with this codec requires Matlab. The codec was developed on Matlab 7.6.0.x; it has not been tested extensively on other versions. Reports from the community suggest that Matlab 7.5 or higher is required to run this codec.

This Matlab codec uses the RL-Glue Java Extension, which means that Matlab needs to be running with the Java Virtual Machine enabled (it is by default). The Java extension does not need to be installed independently.
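
Both requirements can be checked from the Matlab prompt. The following is a minimal sketch that uses only the built-in functions version, verLessThan, and usejava. The 7.5 threshold is the community-reported minimum mentioned above, not a verified limit.

```matlab
% Report the running Matlab version.
fprintf('Matlab version: %s\n', version);

% verLessThan returns true when the installed release is older than the
% given version string. 7.5 is the community-reported minimum (an assumption).
tooOld = verLessThan('matlab', '7.5');
if tooOld
    disp('Warning: this Matlab may be too old for the codec.');
end

% usejava('jvm') is true when the Java Virtual Machine is available.
jvmEnabled = usejava('jvm');
if ~jvmEnabled
    disp('The JVM is disabled; restart Matlab without the -nojvm option.');
end
```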

Possible Contribution: Someone with Matlab experience could help us find out what exact version of Matlab is required to use this codec, and could help us update the codec to be as robust as possible to older versions.

Getting the Codec

The codec can be downloaded either as a .tar.gz or can be checked out of the subversion repository where it is hosted.

The .tar.gz distribution can be found here:
http://code.google.com/p/rl-glue-ext/wiki/Matlab

To check the code out of subversion:
svn checkout http://rl-glue-ext.googlecode.com/svn/trunk/projects/codecs/Matlab Matlab-Codec

Installing the Codec

This codec is packaged with a Matlab "installer" which copies the codec source files and the RL-Glue Java Extension JAR file to a user-configurable location and adds them to your Matlab path and Matlab Java classpath. This is the recommended way of using the codec. Alternatively, you can skip the installer and set up these paths on your own. This manual will assume that you are using the installer.

To run the installer, call the installRLGlue() function from within Matlab in the main directory of the package you downloaded. This will suggest installing the codec to your home directory at ~/rl-glue/codecs/matlab. If you would prefer an alternate location, you can call the function with a path, like:

	>>installRLGlue('~/desired/path/to/codec')
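
After installing, you may want to confirm that Matlab can see both halves of the codec. The sketch below is one way to do that with the built-in exist function. It assumes that runAgent (used later in this manual) lives in a plain .m file, and it uses the Observation class name that appears in the Types section.

```matlab
% exist(name, 'file') returns 2 when name is an M-file on the Matlab path.
codecOnPath = (exist('runAgent', 'file') == 2);

% exist(name, 'class') returns 8 when name is a class Matlab can load,
% which includes Java classes on the Java classpath.
javaTypesFound = ...
    (exist('org.rlcommunity.rlglue.codec.types.Observation', 'class') == 8);

if codecOnPath && javaTypesFound
    disp('The codec functions and the Java types are both visible.');
else
    disp('Something is missing; check the Matlab path and Java classpath.');
end
```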

Removing the Codec

The Matlab codec can be uninstalled by deleting the directory that you installed it to (e.g., ~/rl-glue/codecs/matlab), and using the Matlab path editor to remove the associated directories from your path.
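
If you prefer the command line to the path editor, something like the following sketch can remove the path entries. It assumes the default install location under your home directory and that the HOME environment variable is set. It only touches the Matlab path; the JAR entry on the Java classpath is not handled here.

```matlab
% Default install location (an assumption; change it if you chose another).
codecDir = fullfile(getenv('HOME'), 'rl-glue', 'codecs', 'matlab');

% Split the Matlab path into its individual directories.
entries = regexp(path, pathsep, 'split');

% Mark every entry that starts with the codec directory.
isCodec = strncmp(entries, codecDir, length(codecDir));

% Remove the marked entries from the path for this session.
for k = find(isCodec)
    rmpath(entries{k});
end

% Call savepath afterwards if you want the change to persist.
fprintf('Removed %d path entries.\n', sum(isCodec));
```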

Sample Project

We have included two example projects with this codec, located in the examples directory. Each project contains an agent, environment, and experiment written for this Matlab codec. The two projects are skeleton and mines-sarsa-sample.

The skeleton contains all of the bare-bones plumbing that is required to create an agent / environment / experiment with this codec and might be a good starting point for creating your own components.

The mines-sarsa-sample contains a fully functional tabular Sarsa learning algorithm, a discrete-observation grid world problem, and an experiment program that can run these together and gather results. More details below in Section 2.6.

In the following sections, we will describe the skeleton project. Running and using the mines-sarsa-sample is analogous.


Skeleton Agent

We have provided a skeleton agent with the codec that is a good starting point for agents that you may write in the future. It implements all the required functions and provides a good example of how to create and run a simple agent.

There are several functions that need to be written. They are all contained in a single Matlab source file:

	examples/skeleton/skeleton_agent.m

The skeleton_agent.m file has a public function that returns a structure with function pointers to all of the required RL-Glue functions. The structure looks like this:

>> theAgent=skeleton_agent()

theAgent = 

       agent_init: @skeleton_agent_init
      agent_start: @skeleton_agent_start
       agent_step: @skeleton_agent_step
        agent_end: @skeleton_agent_end
    agent_cleanup: @skeleton_agent_cleanup
    agent_message: @skeleton_agent_message

Alternatively, these different functions could each be in their own skeleton_agent_{init, start, step, end, cleanup, message}.m files. This is a personal choice.
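
To make the idea of a struct of function pointers concrete, here is a small stand-in that runs without the codec. The field names match the listing above; everything else is hypothetical. In particular, the handles are anonymous functions and the actions are plain numbers, whereas the real skeleton agent uses named functions and works with the Java types described later in this manual.

```matlab
% Hypothetical stand-in agent: same field names as skeleton_agent, but the
% handles are anonymous functions so the snippet runs on its own.
demoAgent.agent_init    = @(taskSpec) fprintf('init with: %s\n', taskSpec);
demoAgent.agent_start   = @(theObservation) floor(2 * rand());   % 0 or 1
demoAgent.agent_step    = @(theReward, theObservation) floor(2 * rand());
demoAgent.agent_end     = @(theReward) fprintf('last reward: %g\n', theReward);
demoAgent.agent_cleanup = @() disp('cleanup');
demoAgent.agent_message = @(theMessage) ['echo: ' theMessage];

% The caller only needs the struct; it reaches each function through a field.
demoAgent.agent_init('some task spec');
firstAction = demoAgent.agent_start(10);
nextAction  = demoAgent.agent_step(0, 11);
reply       = demoAgent.agent_message('hello');
demoAgent.agent_end(-1);
demoAgent.agent_cleanup();
```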

This agent does not learn anything and randomly chooses integer action 0 or 1.

You can run the agent like this:

	>> cd examples/skeleton
	>> theAgent=skeleton_agent();
	>> runAgent(theAgent);

skeleton_agent() creates a struct with function pointers to the other skeleton_agent methods. runAgent(theAgent) then connects to RL-Glue and runs one step at a time until RL-Glue disconnects.

Alternatively, for a more interactive experience, you can run the agent manually one step at a time:

	>> cd examples/skeleton/
	>> theAgent=skeleton_agent()
	>> connectAgent(theAgent);
	>> runAgentLoop(theAgent);    %run one step
	>> runAgentLoop(theAgent);    %run one step
	>> runAgentLoop(theAgent);    %run one step
	   ...

Using this method, you can stop and examine what your agent is learning, and potentially modify, visualize, or analyze it however you like.

You will see something like:

	>> runAgent(theAgent)
	RL-Glue Matlab Agent Codec Version: 1.0 ($Revision: 688 $)
	    Connecting to rl_glue at host: localhost on port 4096

This means that the skeleton_agent is running, and trying to connect to the rl_glue executable server on the local machine through port 4096!

You can kill the process by pressing CTRL-c on your keyboard.

The Skeleton agent is very simple and well documented, so we won't spend any more time talking about it in these instructions. Please open it up and take a look.


Skeleton Environment

We have provided a skeleton environment with the codec that is a good starting point for environments that you may write in the future. It implements all the required functions and provides a good example of how to create and run a simple environment. This section follows the same pattern as the agent version (Section 2.1), but it is less detailed because many ideas are similar or identical.

The pertinent file is:

	examples/skeleton/skeleton_environment.m

This environment is episodic, with 21 states, labeled {0, 1, ..., 19, 20}. States {0, 20} are terminal and return rewards of {-1, +1} respectively. The other states return a reward of 0. There are two actions, {0, 1}. Action 0 decrements the state number, and action 1 increments it. The environment starts in state 10.
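
The dynamics described above are simple enough to simulate in a few lines of plain Matlab. The sketch below is not the code from skeleton_environment.m; it is a standalone illustration of one episode under a random policy, with variable names chosen for this example.

```matlab
state = 10;            % start state from the description above
totalReward = 0;
numSteps = 0;
terminal = false;

while ~terminal
    action = floor(2 * rand());   % random action, 0 or 1
    if action == 0
        state = state - 1;        % action 0 decrements the state
    else
        state = state + 1;        % action 1 increments the state
    end
    numSteps = numSteps + 1;

    if state == 0                 % left terminal state
        reward = -1;
        terminal = true;
    elseif state == 20            % right terminal state
        reward = 1;
        terminal = true;
    else
        reward = 0;
    end
    totalReward = totalReward + reward;
end

fprintf('Ended in state %d after %d steps, total reward %g\n', ...
    state, numSteps, totalReward);
```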

You can run the environment like this:

	>> cd examples/skeleton
	>> theEnv=skeleton_environment();
	>> runEnvironment(theEnv);

You will see something like:

	RL-Glue Matlab Environment Codec Version: 1.0 ($Revision: 688 $)
	    Connecting to rl_glue at host: localhost on port 4096

This means that the skeleton_environment is running, and trying to connect to the rl_glue executable server on the local machine through port 4096!

You can kill the process by pressing CTRL-c on your keyboard.

The Skeleton environment is very simple and well documented, so we won't spend any more time talking about it in these instructions. Please open it up and take a look.

Skeleton Experiment

We have provided a skeleton experiment with the codec that is a good starting point for experiments that you may write in the future. It implements all the required functions and provides a good example of how to create and run a simple experiment. This section follows the same pattern as the agent version (Section 2.1), but it is less detailed because many ideas are similar or identical.

The pertinent file is:

	examples/skeleton/skeleton_experiment.m

This experiment runs RL_episode a few times, sends some messages to the agent and environment, and then steps through one episode using RL_step. You can run it like this:

	>> cd examples/skeleton
	>> skeleton_experiment();

You will see something like:

	Experiment starting up!
	RL-Glue Matlab Experiment Codec Version: 1.0 ($Revision: 688 $)
	    Connecting to rl_glue at host: 127.0.0.1 on port 4096

This means that the skeleton_experiment is running, and trying to connect to the rl_glue executable server on the local machine through port 4096!

You can kill the process by pressing CTRL-c on your keyboard.

The Skeleton experiment is very simple and well documented, so we won't spend any more time talking about it in these instructions. Please open it up and take a look.

Running All Three Components Together

At this point, we've run each of the three components on its own. Now it's time to run them with the rl_glue executable server. As of version 1.03, the Matlab codec supports running any combination of agent, environment, and experiment, all from a single Matlab instance. Note that although they are all in Matlab, they still talk to the rl_glue executable socket server over a local network connection. In a future version, we may provide a local RL-Glue implementation so that they can work together without needing sockets. This capability already exists in both the RLGlueCore project for C/C++ and the RL-Glue Java Extension codec.

The following will work if the rl_glue socket server is installed at /usr/local/bin/rl_glue.

We will run all of the components in the Matlab interpreter. Alternatively, it may be convenient to run rl_glue in a terminal window of its own. In the Matlab interpreter:

	>> cd examples/skeleton
	%Start the rl_glue socket server as a background process
	>> !/usr/local/bin/rl_glue &
	>> runAllTogether();

If RL-Glue is not installed in the default location, you'll have to start the rl_glue executable server using its full path (unless it's in your PATH environment variable):

	>> !/path/to/rl-glue/bin/rl_glue &
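
If you would rather check that the server binary exists before launching it, the built-in system function can be used in place of the ! shell escape. This is only a sketch: the path is the default location assumed above, and the trailing & (run in the background) is specific to Unix-like shells.

```matlab
glueBinary = '/usr/local/bin/rl_glue';   % default location assumed above

if exist(glueBinary, 'file') == 2
    % Launch the server in the background so the Matlab prompt stays usable.
    status = system([glueBinary ' &']);
else
    status = -1;   % sentinel used by this sketch only
    fprintf('rl_glue not found at %s\n', glueBinary);
end
```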

In the Matlab window, you should see the following if it worked:

	>> runAllTogether
	RL-Glue Matlab Agent Codec Version: 1.0 ($Revision: 688 $)
		Connecting to rl_glue at host: 127.0.0.1 on port 4096
		Agent Codec Connected
	RL-Glue Matlab Environment Codec Version: 1.0 ($Revision: 688 $)
		Connecting to rl_glue at host: 127.0.0.1 on port 4096
		Environment Codec Connected
	RL-Glue Matlab Experiment Codec Version: 1.0 ($Revision: 688 $)
		Connecting to rl_glue at host: 127.0.0.1 on port 4096
		Experiment Codec Connected
	Experiment starting up!
	RL_init called, the environment sent task spec: VERSION RL-Glue-3.0 
	PROBLEMTYPE episodic DISCOUNTFACTOR 1.0 OBSERVATIONS INTS (0 20)  
	ACTIONS INTS (0 1)  REWARDS (-1.0 1.0)  
	EXTRA skeleton_environment(Matlab) by Brian Tanner.


	----------Sending some sample messages----------
	Agent responded to 'what is your name?' with: 
	my name is skeleton_agent, Matlab edition!
	Agent responded to 'If at first you don't succeed; call it version 1.0  ' 
	with: I don\'t know how to respond to your message
	Environment responded to 'what is your name?' with: 
	my name is skeleton_environment, Matlab edition!
	Environment responded to 'If at first you don't succeed; 
	call it version 1.0  ' with: 
	I don\'t know how to respond to your message


	----------Running a few episodes----------
	Episode 0	 48 steps 	 -1.000000 total reward	 natural end 1
	Episode 1	 100 steps 	 0.000000 total reward	 natural end 0
	Episode 2	 74 steps 	 -1.000000 total reward	 natural end 1
	Episode 3	 34 steps 	 -1.000000 total reward	 natural end 1
	Episode 4	 100 steps 	 0.000000 total reward	 natural end 0
	Episode 5	 1 steps 	 0.000000 total reward	 natural end 0
	Episode 6	 70 steps 	 1.000000 total reward	 natural end 1


	----------Stepping through an episode----------
	First observation and action were: 10 and: 1


	----------Summary----------
	It ran for 86, total reward was: -1.000000

Congratulations, you have run an RL-Glue agent / environment / experiment together, all using the Matlab codec!

Notes on Running Agents, Environments, and Experiments Together

There is a function called runRLGlueMultiExperiment that facilitates running any combination of agent, environment, and experiment together within Matlab.

This function takes a struct as a parameter, and the struct can have any combination of the following three fields set (at least one should be set):

agent
	An agent struct, like the one that is created by skeleton_agent.
environment
	An environment struct, like the one that is created by skeleton_environment.
experiment
	An experiment function pointer, like @skeleton_experiment.

As long as you set one or more of these fields, you can call runRLGlueMultiExperiment and the Matlab codec will make sure that everything gets called when it needs to. Don't forget to run any components that you don't specify with another codec or another Matlab instance!
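
For example, to run only the skeleton agent and experiment inside Matlab (leaving the environment to another process), the struct could be built as in the sketch below. The variable names toRun and codecAvailable are ours. The guard simply lets the snippet run harmlessly when the codec is not on the path; with the codec installed, the call connects to rl_glue and waits for the remaining component.

```matlab
% Only attempt the run when the codec and the skeleton files are visible.
codecAvailable = (exist('runRLGlueMultiExperiment', 'file') == 2) && ...
                 (exist('skeleton_agent', 'file') == 2);

if codecAvailable
    toRun = struct();
    toRun.agent = skeleton_agent();           % agent struct
    toRun.experiment = @skeleton_experiment;  % experiment function pointer
    % No environment field: that component must be started elsewhere.
    runRLGlueMultiExperiment(toRun);
else
    disp('Codec or skeleton example not found on the Matlab path.');
end
```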


Going Further - Mines Sarsa Example Project

The skeleton sample project is extremely limited and only shows the mechanics of how RL-Glue components are structured using the Matlab codec. The mines-sarsa sample project is much richer.

More details about the mines-sarsa sample project can be found at its RL-Library home:
http://library.rl-community.org/packages/mines-sarsa-sample


Sample-Mines-Environment

The mines environment is internally a two-dimensional, discrete grid world where the agent receives a penalty per step until reaching a goal state, hopefully without stepping on any exploding land-mines along the way. The (x,y) state is flattened into a discrete, scalar observation for the agent. This environment can receive special messages from the experiment program to print the current state to the screen, and also to toggle between random starting states and a fixed starting-state specified by the experiment.
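
Flattening an (x,y) position into one integer can be done in several ways. The sketch below shows a common row-major scheme with zero-based coordinates; the grid size and the formula are assumptions made for illustration and may not match what the sample environment actually does.

```matlab
numCols = 18;    % hypothetical grid width
numRows = 6;     % hypothetical grid height

x = 3;           % zero-based column
y = 2;           % zero-based row

% Row-major flattening: each full row contributes numCols observations.
flatObservation = y * numCols + x;

% The mapping can be inverted to recover the coordinates.
xRecovered = mod(flatObservation, numCols);
yRecovered = floor(flatObservation / numCols);

fprintf('(%d,%d) <-> %d\n', x, y, flatObservation);
```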

The task specification string [3] is created in a semi-automated way using the Java RL-Glue Extension task spec parser/builder.


Sample-Sarsa-Agent

The SARSA agent is a tabular learning agent that uses epsilon-greedy exploration as described in Reinforcement Learning: An Introduction by Sutton and Barto.

The SARSA agent parses the task specification string using the Java RL-Glue Extension task spec parser. This agent can receive special messages from the experiment program to pause/unpause learning, pause/unpause exploring, save the current value function to a file, and load the value function from a file.
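
For readers who have not seen tabular Sarsa before, a single update can be sketched as follows. This is the textbook rule, not an excerpt from the sample agent: the table size, the step size, epsilon, the discount factor, and the example transition are all made-up values, and indices are one-based to suit Matlab arrays.

```matlab
numStates  = 108;                      % hypothetical table size
numActions = 4;
valueTable = zeros(numStates, numActions);

epsilon = 0.1;    % exploration rate (assumed)
alpha   = 0.1;    % step size (assumed)
gamma   = 1.0;    % discount factor (assumed)

% One made-up transition: took action a in state s, got reward r, saw sNext.
s = 5;  a = 2;  r = -1;  sNext = 6;

% Epsilon-greedy choice of the next action.
if rand() < epsilon
    aNext = ceil(numActions * rand());         % uniform over 1..numActions
else
    [bestValue, aNext] = max(valueTable(sNext, :));
end

% Sarsa update: move Q(s,a) toward r + gamma * Q(sNext,aNext).
tdError = r + gamma * valueTable(sNext, aNext) - valueTable(s, a);
valueTable(s, a) = valueTable(s, a) + alpha * tdError;
```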

Sample-Experiment

The sample experiment program runs the show. First, it alternates running the agent in the environment for a number of episodes, and telling the agent to pause learning so that the current performance can be evaluated. These results are plotted with error-bars in a Matlab figure.

The sample experiment then tells the agent to save the value function to a file, and then resets the experiment (and agent) to initial conditions. After verifying that the agent's initial policy is bad, the experiment tells the agent to load the value function from the file. The agent is evaluated again using this previously-learned value function, and performance is dramatically better.

Finally, the experiment sends a message to specify that the environment should use a fixed (instead of random) starting state, and runs the agent from that fixed start state for a while.

Who creates and frees memory?

The RL-Glue technical manual has a section called "Who creates and frees memory?". The general approach recommended there is to make a copy of any data you want to keep beyond the method in which it was given to you. The same rules of thumb from that manual should be followed when using the Matlab codec. The observations and actions can be copied using the same techniques as the RL-Glue Java Extension.
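
One reason the copy matters in Matlab: Java objects are handled by reference, so assigning one to a second variable does not duplicate it. The sketch below demonstrates this with java.util.ArrayList as a stand-in, because it ships with every JVM. The copying methods offered by the RL-Glue types themselves are described in the Java codec manual and are not shown here.

```matlab
original = java.util.ArrayList();
original.add(1);

alias = original;                  % no copy: both names share one Java object
alias.add(2);
sizeSeenThroughOriginal = original.size();   % the alias changed the original

snapshot = original.clone();       % a separate list with the same elements
snapshot.add(3);
sizeAfterSnapshotEdit = original.size();     % the original is unaffected

fprintf('original: %d items, snapshot: %d items\n', ...
    sizeAfterSnapshotEdit, snapshot.size());
```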

Advanced Features

Task Specification Parser

As of fall 2008, we've updated the task specification language:
http://glue.rl-community.org/Home/rl-glue/task-spec-language

The Matlab codec uses the task spec parser implementation from the RL-Glue Java Extension. This task spec parser/builder can be used in environments to create task specification strings for env_init. The sample mines environment in Section 2.6.1 provides an example of creating a task spec in this way. There are also several advanced examples of this in the RL-Library [4]. The task spec parser/builder can also be used by agents to decode the task spec string for agent_init. The sample sarsa agent in Section 2.6.2 demonstrates how to do this.

Connecting on custom ports to custom hosts

This section explains how to set custom target IP addresses (to connect over the network) and custom ports. Sometimes you will want to run the rl_glue server on a port other than the default (4096), either because of firewall issues or because you want to run multiple experiments on the same machine.

In these cases, you can tell your Matlab agent, environment, or experiment program to connect on a custom port and/or to a custom host using the RL_set_port() and RL_set_host() Matlab functions.

For example, consider the following commands:

>> RL_set_port(4097);
>> RL_set_host('yahoo.ca')
>> cd examples/skeleton
>> skeleton_experiment();

Those commands could give output like:

	RL-Glue Matlab Experiment Codec Version: 1.0 ($Revision: 688 $)
	     Connecting to rl_glue at host: yahoo.ca on port 4097

This works for agents, environments, and experiments. In practice though, remember that yahoo.ca probably isn't running an RL-Glue server.

You can specify the port, the host, neither, or both. Ports must be numbers; hosts can be hostnames or IP addresses. The default port is 4096 and the default host is 127.0.0.1.

Remember, on most *nix systems, you need superuser privileges to listen on ports lower than 1024, so you probably want to pick one higher than that.

Codec Specification Reference

This section will explain how the RL-Glue types and functions are defined for the Matlab codec. This isn't meant to be the most exciting section of this document, but it will be handy.

Since the Matlab codec is built on top of the Java codec, many of the underlying data structures are from the Java codec. We recommend checking out the Java codec documentation (available as PDF, HTML, and Javadoc) for more information.

Types

Simple Types

Unlike the C/C++ codec, we will not be using typedef statements to create special labels for the types. Since Matlab is loosely typed, these things aren't so hard and fast:


Structure Types

All of the major structure types (observations, actions) come off the network as the appropriate object from the Java codec. The Java codec manual should have all the information required to understand those objects.

So in a given Matlab method, like
function theAction=skeleton_agent_step(theReward, theObservation), theObservation is actually of type: org.rlcommunity.rlglue.codec.types.Observation.

Java and Matlab play very well together, so you can do things like:

	>> testObs=org.rlcommunity.rlglue.codec.types.Observation();
	>> testObs.intArray=[1 2 3 4];
	>> testObs.doubleArray=[0.1 0.5];
	>> testObs.charArray='fun things!';
	>> testObs.toString()

	ans =

	numInts: 4
	numDoubles: 2
	numChars: 11
	 1 2 3 4 0.1 0.5 f u n   t h i n g s !

Functions

Agent Functions

All agent constructor functions should set the same functions as our Skeleton agent.

Useful utility methods for connecting, disconnecting, and running with the rl_glue executable server are in the agent directory of the Matlab codec source.

Environment Functions

All environment constructor functions should set the same functions as our Skeleton environment.

Useful utility methods for connecting, disconnecting, and running with the rl_glue executable server are in the environment directory of the Matlab codec source.

Experiments Functions

All experiments can call the methods in the glue directory. In this case we'll include their prototypes, because the source file is full of implementation details.

Frequently Asked Questions

We're waiting to hear your questions!

Where can I get more help?

Online FAQ

We suggest checking out the online RL-Glue Matlab Codec FAQ:
http://glue.rl-community.org/Home/Extensions/matlab-codec#TOC-Frequently-Asked-Questions

The online FAQ may be more current than this document, which may have been distributed some time ago.

Google Group / Mailing List

First, you should join the RL-Glue Google Group Mailing List:
http://groups.google.com/group/rl-glue

We're happy to answer any questions about RL-Glue. Of course, try to search through previous messages first in case your question has been answered before.

Credits and Acknowledgements

Brian Tanner wrote the Matlab codec. He is also responsible for creating the installer, which is pretty nifty. Yay Brian.

Contributing

If you would like to become a member of this project and contribute updates/changes to the code, please send a message to rl-glue@googlegroups.com.

Document Information

Revision Number: $Rev: 688 $
Last Updated By: $Author: brian@tannerpages.com $
Last Updated   : $Date: 2009-02-09 12:13:44 -0700 (Mon, 09 Feb 2009) $
$URL: https://rl-glue-ext.googlecode.com/svn/trunk/projects/codecs/Matlab/docs/MatlabCodec.tex $

This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)

The translation was initiated by Brian Tanner on 2009-02-09


Footnotes

[1] RL-Glue: http://glue.rl-community.org/
[2] Apache-2.0: http://www.apache.org/licenses/LICENSE-2.0.html
[3] Task specification string: http://glue.rl-community.org/Home/rl-glue/task-spec-language
[4] RL-Library: http://library.rl-community.org

Brian Tanner 2009-02-09