update sd assumption

update docs
2022-03-20 20:00:00 -06:00 · 2022-03-20 19:59:52 -06:00
2 changed files with 47 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -1,3 +1,46 @@
 # mud-games

-control systems with MUD points
+control systems with MUD points
+
+# installation
+
+```bash
+pip install -r requirements.txt
+```
+
+
+# usage
+
+A `data.pkl` file is provided for your convenience with input / output samples.
+
+The inputs are the four entries of a `4x1` parameter vector, which is multiplied against the observed state to decide the next action (push left or right). The resulting inner product is binarized by thresholding at zero.
+
+The parameter space is standard normal.
+There is no assumed error in observations, so the "data variance" is designed to reflect the acceptable ranges for the parameters:
+
+From [gym](https://www.gymlibrary.ml/pages/environments/classic_control/cart_pole):
+- The cart x-position (index 0) can take values between (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.
+- The pole angle can be observed between (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) (or ±12°).
+
+
+The target "signal" is zero for all four dimensions of the observation space. The presumed "data variance" should actually correspond to the acceptable bands of signal (WIP).
+
+```bash
+python main.py
+```
+
+# generate data
+
+You can generate your own data with:
+```bash
+python data.py
+```
+
+Note: if you change the presumed sample space in `data.py`, you should make the corresponding changes to the initial distribution in `main.py`.
+
+
+# improvements
+
+Using the following presumption, we can establish better values for the "data variance":
+The angular velocity of the pole is the most important thing to stabilize.
+
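The decision rule described in the README's usage section (a `4x1` parameter vector multiplied against the observation, then thresholded at zero) can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the function name `decide`, the observation ordering (cart position, cart velocity, pole angle, pole angular velocity), and the choice to map a positive inner product to "push right" are assumptions made here.

```python
import numpy as np

# Action encoding used by gym's CartPole: 0 pushes left, 1 pushes right.
PUSH_LEFT, PUSH_RIGHT = 0, 1


def decide(lam, obs):
    """Binarize the inner product of parameters and observation at zero.

    Mapping positive values to "push right" is an assumption of this sketch.
    """
    lam = np.asarray(lam, dtype=float).reshape(4)
    obs = np.asarray(obs, dtype=float).reshape(4)
    return PUSH_RIGHT if lam @ obs > 0 else PUSH_LEFT


# Default decision vector from test() in main.py, applied to a made-up
# observation (hypothetical values, chosen only for illustration).
lam = np.array([-0.09, -0.71, -0.43, -0.74])
obs = np.array([0.01, 0.02, 0.03, 0.04])
action = decide(lam, obs)
```

Because every entry of the default vector is negative, any observation with all-positive entries yields a negative inner product, so this sketch pushes left for the example above.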
--- a/main.py
+++ b/main.py
@@ -10,7 +10,8 @@ from scipy.stats import norm

 def train(data):
    D = pd.DataFrame(data)
-    D["qoi"] = D["obs"].apply(lambda o: np.sum(o, axis=0) / np.sqrt(len(o)))
+    sd = np.array([1.0, 0.25, 0.5, 0.1])
+    D["qoi"] = D["obs"].apply(lambda o: np.sum(o, axis=0) / sd / np.sqrt(len(o)))
    D["i"] = D["lam"].apply(lambda l: norm.pdf(l).prod())
    D["o"] = D["qoi"].apply(lambda q: norm.pdf(q).prod())
    Q = np.array(D["qoi"].to_list()).reshape(-1, 4)
@@ -19,6 +20,7 @@ def train(data):
    D["u"] = D["i"] * D["o"] / D["p"]
    mud_point_idx = D["u"].argmax()
    mud_point = D["lam"].iloc[mud_point_idx]
+    print(f"MUD Point ({mud_point_idx}): {mud_point}")
    return mud_point


@@ -44,5 +46,4 @@ def test(decision=np.array([-0.09, -0.71, -0.43, -0.74]), seed=1992):
 if __name__ == "__main__":
    data = pickle.load(open("data.pkl", "rb"))
    mud_point = train(data)
-    print(f"MUD Point: {mud_point}")
    test(mud_point)
Author	SHA1	Message	Date
Michael Pilosov	a8c0ad14ac	update sd assumption	2022-03-20 20:00:00 -06:00
Michael Pilosov	9d6cc4c15e	update docs	2022-03-20 19:59:52 -06:00
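
The effect of the new `sd` vector in `train()` can be seen in isolation with the sketch below. It reproduces the scaled quantity of interest from the diff (sum over time steps, divide by `sd`, divide by the square root of the episode length) and the product of standard normal densities applied to it. The helper names `scaled_qoi` and `observed_density` and the example episode are hypothetical; only the `sd` values and the arithmetic come from the change above.

```python
import numpy as np

# Per-dimension standard deviations taken from the updated train() in main.py.
SD = np.array([1.0, 0.25, 0.5, 0.1])


def scaled_qoi(obs, sd=SD):
    """Sum observations over time, scale by sd, and normalize by sqrt(T)."""
    obs = np.asarray(obs, dtype=float)  # expected shape: (T, 4)
    return obs.sum(axis=0) / sd / np.sqrt(len(obs))


def observed_density(q):
    """Product of standard normal densities, equivalent to norm.pdf(q).prod()."""
    q = np.asarray(q, dtype=float)
    return float(np.prod(np.exp(-0.5 * q**2) / np.sqrt(2.0 * np.pi)))


# Made-up episode: four time steps with every observation equal to 0.1.
episode = np.full((4, 4), 0.1)
q = scaled_qoi(episode)                         # with the new sd vector
q_unscaled = scaled_qoi(episode, np.ones(4))    # behavior before this change
```

Dividing by a smaller `sd` inflates the corresponding entry of the quantity of interest, so the same deviation from the zero target is penalized more heavily in the last dimension (`sd = 0.1`) than in the first (`sd = 1.0`). In this example the scaled episode therefore receives a lower observed density than the unscaled one.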