Compare commits
No commits in common. "a8c0ad14acfef82bdd50c207092e14f1b19eb0c4" and "fe8a5ee1d7621bf8cc21dcd59fc60d1d7516b971" have entirely different histories.
a8c0ad14ac...fe8a5ee1d7
43
README.md
@@ -1,46 +1,3 @@
 # mud-games
 
 control systems with MUD points
-
-# installation
-
-```bash
-pip install -r requirements.txt
-```
-
-
-# usage
-
-A `data.pkl` file is provided for your convenience with input / output samples.
-
-The inputs are the parameters to a `4x1` matrix which is multiplied against the observations of the state in order to make a decision for the next action (push left or right). The output of the vector inner-product is binarized by comparison to zero as a threshold value.
-
-The parameter space is standard normal.
-There is no assumed error in observations, so the "data variance" is designed to reflect the acceptable ranges for the parameters:
-
-From [gym](https://www.gymlibrary.ml/pages/environments/classic_control/cart_pole):
-- The cart x-position (index 0) can be take values between (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.
-- The pole angle can be observed between (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) (or ±12°)
-
-
-The target "signal" is zero for all four dimensions of the observation space. The presumed "data variance" should actually correspond to the acceptable bands of signal (WIP).
-
-```bash
-python main.py
-```
-
-# generate data
-
-You can generate your own data with:
-```bash
-python data.py
-```
-
-Note: if you change the presumed sample space in `data.py`, you should make the corresponding changes to the initial distribution in `main.py`.
-
-
-# improvements
-
-Using the following presumptions, we can establish better values for the "data variance":
-The angular momentum of the pole is the most important thing to stabilize.
-
|
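The removed README describes the controller as a `4x1` parameter vector whose inner product with the observed state is compared against zero to choose the next action. A minimal sketch of that rule follows. It assumes the usual CartPole convention that action 0 pushes left and action 1 pushes right (the README does not state the mapping), and `decide` is a hypothetical helper name, not a function from this repository.

```python
import numpy as np


def decide(lam, obs):
    """Hypothetical helper: threshold the inner product of parameters and state at zero."""
    lam = np.asarray(lam, dtype=float).reshape(-1)
    obs = np.asarray(obs, dtype=float).reshape(-1)
    # Assumption: a positive score maps to action 1 (push right), anything else to 0 (push left).
    return int(lam @ obs > 0)


# Default decision vector taken from the signature of test() in main.py.
lam = np.array([-0.09, -0.71, -0.43, -0.74])
# Illustrative state: [cart position, cart velocity, pole angle, pole angular velocity].
obs = np.array([0.0, 0.1, 0.05, -0.2])
action = decide(lam, obs)
```

Because the rule only looks at the sign of the score, rescaling `lam` by any positive constant leaves every decision unchanged.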
5
main.py
@@ -10,8 +10,7 @@ from scipy.stats import norm
 
 def train(data):
     D = pd.DataFrame(data)
-    sd = np.array([1.0, 0.25, 0.5, 0.1])
-    D["qoi"] = D["obs"].apply(lambda o: np.sum(o, axis=0) / sd / np.sqrt(len(o)))
+    D["qoi"] = D["obs"].apply(lambda o: np.sum(o, axis=0) / np.sqrt(len(o)))
     D["i"] = D["lam"].apply(lambda l: norm.pdf(l).prod())
     D["o"] = D["qoi"].apply(lambda q: norm.pdf(q).prod())
     Q = np.array(D["qoi"].to_list()).reshape(-1, 4)
@@ -20,7 +19,6 @@ def train(data):
     D["u"] = D["i"] * D["o"] / D["p"]
     mud_point_idx = D["u"].argmax()
     mud_point = D["lam"].iloc[mud_point_idx]
-    print(f"MUD Point ({mud_point_idx}: {mud_point}")
     return mud_point
 
 
@@ -46,4 +44,5 @@ def test(decision=np.array([-0.09, -0.71, -0.43, -0.74]), seed=1992):
 if __name__ == "__main__":
     data = pickle.load(open("data.pkl", "rb"))
     mud_point = train(data)
+    print(f"MUD Point: {mud_point}")
     test(mud_point)
|
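The first hunk of `main.py` changes how the quantity of interest is computed: one side divides the summed observations by a per-dimension `sd` vector before the `sqrt(N)` normalization, and the other side omits that division. The sketch below compares the two forms. It assumes each `obs` entry is an `(N, 4)` array of states from one episode (the diff does not show the data layout), and `qoi` is a hypothetical function name introduced only for illustration.

```python
import numpy as np


def qoi(obs, sd=None):
    """Hypothetical helper: sum states over time and normalize by sqrt(N).

    If sd is given, each dimension is also divided by its presumed
    standard deviation, as in the variant that defines `sd`.
    """
    obs = np.asarray(obs, dtype=float)
    q = np.sum(obs, axis=0) / np.sqrt(len(obs))
    if sd is not None:
        q = q / np.asarray(sd, dtype=float)
    return q


# Per-dimension values copied from the diff.
sd = np.array([1.0, 0.25, 0.5, 0.1])
# Illustrative episode: four identical states, every component equal to 0.1.
obs = np.full((4, 4), 0.1)

unscaled = qoi(obs)      # variant without sd
scaled = qoi(obs, sd)    # variant with sd
```

With the `sd` values from the diff, the same raw deviation counts ten times more in the last dimension than in the first, so that variant penalizes drift in the fourth observation component most heavily.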