Michael Pilosov ae89e4d318 adding RL demo		2022-05-26 15:29:41 +00:00
data.pkl	adding RL demo	2022-05-26 15:29:41 +00:00
DemoGym.ipynb	adding RL demo	2022-05-26 15:29:41 +00:00
DemoMUD.ipynb	adding RL demo	2022-05-26 15:29:41 +00:00
headless.sh	adding RL demo	2022-05-26 15:29:41 +00:00
main.py	adding RL demo	2022-05-26 15:29:41 +00:00
README.md	adding RL demo	2022-05-26 15:29:41 +00:00
requirements.txt	adding RL demo	2022-05-26 15:29:41 +00:00
sample.py	adding RL demo	2022-05-26 15:29:41 +00:00

README.md

PREFACE

This is a direct migration (with git history stripped) of mud-games (as of commit 1a2259827f). It shows an actual research-oriented experiment involving a novel method of "training" (the MUD approach) and "testing" (visually). The intent was to explore gym, a utility library that provides a consistent interface for training reinforcement-learning algorithms, and to try to "learn to win" one of its most basic games (CartPole-v1).

Takeaways from this example:

much more friendly for reproducibility
runs on desktop AND in a notebook (handling visual output is tricky; leverage the patterns here if you need to move interactive outputs into the cloud)
functions defined in main.py are "clean" but still not "clear"
notice the lack of documentation: where would it be helpful to have it?
data is not only supplied (perhaps it should not be committed), but a method to generate it is also provided (this takes some time)
notice the comprehensive README below

mud-games

control systems with MUD points

installation

pip install -r requirements.txt

usage

A data.pkl file is provided for your convenience with input / output samples.

python main.py

Alternatively, you can use the included Jupyter notebooks.

info

The inputs are the four entries of a 1x4 matrix, which is multiplied against the observed state to decide the next action (push left or right). The resulting inner product is binarized by comparing it to a threshold of zero.
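The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the code in main.py: the function name `decide` is hypothetical, and mapping a positive inner product to "push right" (action 1) rather than "push left" (action 0) is an assumption. Plain Python is used here to keep the sketch dependency-free.

```python
from typing import Sequence


def decide(params: Sequence[float], observation: Sequence[float]) -> int:
    """Binarize the inner product of params and observation at zero.

    Returns 1 (push right) if the inner product is positive, else 0 (push left).
    The sign-to-action mapping is an assumption made for this sketch.
    """
    if len(params) != 4 or len(observation) != 4:
        raise ValueError("expected four parameters and four observation values")
    score = sum(p * o for p, o in zip(params, observation))
    return 1 if score > 0 else 0


# Observation order in CartPole:
# cart position, cart velocity, pole angle, pole angular velocity.
params = [0.0, 0.0, 1.0, 1.0]
observation = [0.01, -0.02, 0.03, 0.04]
action = decide(params, observation)
```

With these example parameters only the pole angle and angular velocity influence the decision, so the cart is pushed toward the side the pole is falling to.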

The parameter space is standard normal. There is no assumed error in observations; the "data variance" is designed to reflect the acceptable ranges for the observations:

The cart x-position (index 0) can take values between (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.
The pole angle can be observed between (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) radians (or ±12°).
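The two termination ranges above can be written as a small check. This is a sketch only: the names `X_LIMIT`, `THETA_LIMIT`, and `within_bounds` are hypothetical, and the pole angle is assumed to sit at index 2 of the observation, as in the standard CartPole observation layout.

```python
# Termination limits quoted in the text above.
X_LIMIT = 2.4         # cart position limit
THETA_LIMIT = 0.2095  # pole angle limit in radians (about 12 degrees)


def within_bounds(observation):
    """Return True while the cart position and pole angle are inside the limits."""
    x, _cart_velocity, theta, _pole_velocity = observation
    return abs(x) < X_LIMIT and abs(theta) < THETA_LIMIT
```

An episode would be considered failed as soon as `within_bounds` returns False for an observation.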

Therefore, since our objective is to stabilize the cart, the target "time series signal" is zero for all four dimensions of the observation space. The presumed "data variance" should actually correspond to the acceptable bands of signal (WIP).
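One way to read "data variance as an acceptable band" is to scale each observed component by a presumed standard deviation and aggregate the residuals against the zero target over an episode. The sketch below is an assumption about how this could be done, not a description of main.py: the name `scaled_misfit`, the choice of the termination limits as standard deviations, the placeholder value of 1.0 for the two velocity components, and the division by the square root of the number of time steps are all illustrative.

```python
import math

# Presumed standard deviation per observation component (assumption):
# position and angle use the termination limits, velocities are placeholders.
SIGMA = (2.4, 1.0, 0.2095, 1.0)
# Target signal: zero in all four dimensions, as stated in the text.
TARGET = (0.0, 0.0, 0.0, 0.0)


def scaled_misfit(trajectory):
    """Sum the scaled residuals per component and divide by sqrt(number of steps)."""
    n = len(trajectory)
    if n == 0:
        raise ValueError("trajectory must contain at least one observation")
    totals = [0.0, 0.0, 0.0, 0.0]
    for observation in trajectory:
        for i, (value, target, sigma) in enumerate(zip(observation, TARGET, SIGMA)):
            totals[i] += (value - target) / sigma
    return [total / math.sqrt(n) for total in totals]
```

A trajectory that stays at the target gives a misfit of zero in every component; larger bands (larger entries in `SIGMA`) make the same deviation count for less.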

generate data

You can generate your own data with:

python sample.py

Note: if you change the presumed sample space in sample.py, you should make the corresponding changes to the initial distribution in main.py.
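One way to avoid the two scripts drifting apart is to define the distribution once and use it both for sampling and for evaluating the initial density. The sketch below assumes the standard normal parameter space described earlier; the names `PRIOR_MEAN`, `PRIOR_STD`, `sample_params`, and `initial_log_density` are hypothetical and do not come from the repository.

```python
import math
import random

# Single source of truth for the parameter distribution (standard normal).
PRIOR_MEAN = 0.0
PRIOR_STD = 1.0
NUM_PARAMS = 4


def sample_params(rng):
    """Draw one parameter vector from the presumed sample space."""
    return [rng.gauss(PRIOR_MEAN, PRIOR_STD) for _ in range(NUM_PARAMS)]


def initial_log_density(params):
    """Log density of independent normals with the same mean and std as the sampler."""
    normalizer = math.log(PRIOR_STD * math.sqrt(2.0 * math.pi))
    return sum(
        -0.5 * ((p - PRIOR_MEAN) / PRIOR_STD) ** 2 - normalizer for p in params
    )


rng = random.Random(0)  # fixed seed for reproducibility
params = sample_params(rng)
```

Changing `PRIOR_MEAN` or `PRIOR_STD` then updates both the generated samples and the density they are weighted against.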

improvements

Using the following presumptions, we can establish better values for the "data variance":

The angular velocity of the pole is the most important thing to stabilize.

headless mode / notebook demos

Run ./headless.sh (requires sudo) to install virtual displays so you can use the included Jupyter notebooks.