occupancy example

Michael Pilosov 2022-05-26 14:25:42 +00:00
parent d3ac4f575d
commit dc8848b895
4 changed files with 3877 additions and 0 deletions

588
occupancy/Draft.ipynb Normal file

File diff suppressed because one or more lines are too long

588
occupancy/Edit.ipynb Normal file

File diff suppressed because one or more lines are too long

35
occupancy/README.md Normal file

@@ -0,0 +1,35 @@
# Occupancy Classifier
The goal here is to predict a binary outcome: whether a room is occupied by people or not.
To do this, we will use two time-series based features: humidity and light measurements.
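As a rough illustration of the task (not the method used in the notebooks), the sketch below fits a nearest-centroid classifier on the two features. The readings are invented for this example and do not come from `datatest.csv`; the function names are hypothetical.

```python
from statistics import mean

# Hypothetical readings, invented for illustration:
# (humidity in %, light level, occupied flag)
readings = [
    (27.2, 426.0, 1),
    (27.1, 430.0, 1),
    (26.9, 419.0, 1),
    (25.7, 0.0, 0),
    (25.6, 0.0, 0),
    (25.8, 12.0, 0),
]


def fit_centroids(rows):
    """Return the mean (humidity, light) point for each class label."""
    centroids = {}
    for label in {row[2] for row in rows}:
        members = [row for row in rows if row[2] == label]
        centroids[label] = (
            mean(row[0] for row in members),
            mean(row[1] for row in members),
        )
    return centroids


def predict(centroids, humidity, light):
    """Assign the label whose centroid is closest in squared Euclidean distance."""

    def squared_distance(label):
        centre_humidity, centre_light = centroids[label]
        return (humidity - centre_humidity) ** 2 + (light - centre_light) ** 2

    return min(centroids, key=squared_distance)


centroids = fit_centroids(readings)
bright_room = predict(centroids, humidity=27.0, light=400.0)
dark_room = predict(centroids, humidity=25.5, light=5.0)
```

Because the features are not rescaled here, the light reading dominates the distance; a real model would likely normalize both features first.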
## Learning Outcome
We present here a very "messy" notebook with almost no documentation (one that was actually written by the author and re-visited for the purposes of this pedagogical example) based on a very limited dataset found online.
The goal of this demonstration is to show an example of "un-shareable code" and how a very simple refactor can dramatically improve the presentation of the content.
The author of the [code in the] notebook is often the first person to benefit from such refactors, since revisiting the content in the future can largely feel like reading something for the first time, especially with enough time elapsed.
By making small changes, such as renaming variables, adding titles and labels to plots, and writing some brief comments, what started out as a rough proof-of-concept can become a foundation for future work building a predictive product.
For example, imagine a smart thermostat, equipped with an inexpensive photoresistor to measure light and a moisture meter to measure humidity, that needs to "learn" when people are home so that it can conserve energy while they are away.
A notebook like the one presented here could very much be how someone began investigating how to implement such a feature.
## How to Read this Example
First start by opening the [Draft.ipynb](./Draft.ipynb) notebook and running it.
- Notice how the environment isn't specified. The kernel the notebook is expecting may or may not exist on your machine.
- You may run into import errors because certain packages don't exist. How do you handle it?
- Do you open a terminal up and run `pip install <missing-package>` each time you encounter an import error?
- Do you create a new cell at the top of the notebook with lines such as `# pip install <missing-package-list> > /dev/null && echo "installed dependencies"` to handle installation?
- Do you create a `requirements.txt` file and list the dependencies you will need in there so you can at least run `pip install -r requirements.txt`?
- Do you specify versions for any of the above?
- Notice that variable names are often a single letter. Is this easy to read?
- Notice that graphs are missing labels.
- Notice the lack of annotations: it is unclear which cells are visualization vs. data exploration vs. training vs. testing, which complicates reading the notebook from top to bottom.
- The data is committed to `git` (granted, it is a small file). Do you see any potential problems that could arise if this becomes the norm in a mature shared repository?
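One way to approach the dependency questions above is to check for missing packages up front and to record the versions that are actually installed. The following is a minimal sketch using only the standard library; the helper names and the example package list are hypothetical, and the notebook's real imports may differ.

```python
import importlib.util
from importlib import metadata


def missing_packages(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def pinned_requirements(distribution_names):
    """Build 'name==version' lines for the distributions that are installed."""
    lines = []
    for name in distribution_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue  # not installed, so there is no version to pin
    return lines


# Hypothetical list for illustration: one stdlib module and one name
# that should not exist in any environment.
wanted = ["json", "surely_not_a_real_module_xyz"]
not_found = missing_packages(wanted)
```

Note that an import name and its distribution name can differ (for example, `sklearn` is installed as `scikit-learn`), so the two helpers take separate lists. The output of `pinned_requirements` could be written to a `requirements.txt` file.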
Then open [Cleanup.ipynb](./Cleanup.ipynb), observe the changes made, and reflect on the following:
- What has been addressed and what is still missing? (Is what is missing worth doing?)
- How much more readable is it?
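To make the readability question concrete, here is a small before-and-after sketch of the kind of renaming discussed above. Both functions are hypothetical and are not taken from either notebook; the sample values are made up.

```python
# Before: terse names, as in a quick draft.
def f(d, t):
    return [1 if x > t else 0 for x in d]


# After: the same logic with descriptive names and a docstring.
def flag_bright_readings(light_levels, threshold):
    """Return 1 for each light reading above the threshold, else 0."""
    return [1 if level > threshold else 0 for level in light_levels]


sample_light_levels = [0.0, 12.0, 430.0]  # made-up values
draft_flags = f(sample_light_levels, 100.0)
clean_flags = flag_bright_readings(sample_light_levels, threshold=100.0)
```

The behavior is identical; only the second version tells a reader what the inputs mean without tracing the call site.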

2666
occupancy/datatest.csv Normal file

File diff suppressed because it is too large