
improve docs

feature/notebook
Michael Pilosov 3 years ago
parent
commit
45ce364166
  1. 17
      README.md


@ -13,21 +13,24 @@ pip install -r requirements.txt
A `data.pkl` file is provided for your convenience with input / output samples.
The inputs are the parameters to a `4x1` matrix which is multiplied against the observations of the state in order to make a decision for the next action (push left or right). The output of the vector inner-product is binarized by comparison to zero as a threshold value.
```bash
python main.py
```
# info
The inputs are the parameters to a `1x4` matrix which is multiplied against the observations of the state in order to make a decision for the next action (push left or right). The output of the vector inner-product is binarized by comparing it to zero as a threshold value.
The parameter space is standard normal.
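As a rough illustration of the decision rule described above, the sketch below draws one parameter vector from a standard normal distribution and thresholds its inner product with an observation at zero. The function name `choose_action`, the example observation, and the mapping of a positive product to "push right" (action `1`) are assumptions made for illustration and are not taken from `main.py`.

```python
import numpy as np


def choose_action(weights, observation):
    """Binarize the inner product of a 1x4 weight vector and a 4-element observation.

    Assumed convention (hypothetical): 1 = push right, 0 = push left.
    """
    weights = np.asarray(weights, dtype=float).reshape(4)
    observation = np.asarray(observation, dtype=float).reshape(4)
    # Compare the scalar inner product against the zero threshold.
    return int(weights @ observation > 0)


# One draw from the standard normal parameter space.
rng = np.random.default_rng(0)
weights = rng.standard_normal(4)

# Hypothetical observation: cart centered and at rest, pole tilted slightly.
observation = np.array([0.0, 0.0, 0.05, 0.0])
action = choose_action(weights, observation)
```

Because only the sign of the inner product matters, rescaling the weight vector by any positive constant leaves the resulting action unchanged.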
There is no assumed error in observations, so the "data variance" is designed to reflect the acceptable ranges for the parameters:
There is no assumed error in observations; the "data variance" is designed to reflect the acceptable ranges for the parameters:
From [gym](https://www.gymlibrary.ml/pages/environments/classic_control/cart_pole):
- The cart x-position (index 0) can take values between (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.
- The pole angle can be observed between (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) (or ±12°).
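The two termination ranges listed above could be checked with a small helper like the one below. This is a minimal sketch: the name `within_bounds` and the constants are hypothetical, it assumes the usual cart-pole observation ordering of (cart position, cart velocity, pole angle, pole angular velocity), and it treats the ranges as open intervals, as written above.

```python
X_LIMIT = 2.4          # cart position bound from the list above
THETA_LIMIT = 0.2095   # pole angle bound in radians (about 12 degrees)


def within_bounds(observation):
    """Return True while the cart position and pole angle stay in the acceptable ranges."""
    # Velocities are unpacked but not checked; only position and angle have bounds here.
    x, _x_dot, theta, _theta_dot = observation
    return abs(x) < X_LIMIT and abs(theta) < THETA_LIMIT
```

For example, `within_bounds([0.0, 0.0, 0.3, 0.0])` is `False` because the pole angle exceeds the ±0.2095 radian band.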
The target "signal" is zero for all four dimensions of the observation space. The presumed "data variance" should actually correspond to the acceptable bands of signal (WIP).
```bash
python main.py
```
Therefore, since our objective is to stabilize the cart, the target "time series signal" is zero for all four dimensions of the observation space. The presumed "data variance" should actually correspond to the acceptable bands of signal (WIP).
# generate data
