details in readme

mm 2023-05-04 10:22:17 +00:00
parent fab8952d59
commit 47e440b859


@@ -13,4 +13,7 @@ A particularly useful addition to the dataset here:
- airports: they (more or less) have unique codes, and this semantic understanding would be helpful for search engines.
- aliases for cities: the dataset used for city data (lat/lon) contains a fairly exhaustive list of aliases for each city. It would be good to generate examples pairing these aliases with a distance of 0 and train the model on this knowledge.
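One way to build the distance-0 alias examples described above is sketched below. It assumes the alias data can be loaded into a dict that maps each canonical city name to its list of aliases, and that a training record is a `(name_a, name_b, distance)` tuple. `alias_examples` and that record layout are hypothetical and are not taken from this repository.

```python
from itertools import combinations

# Hypothetical record layout (assumption): (name_a, name_b, distance).
Record = tuple[str, str, float]


def alias_examples(aliases_by_city: dict[str, list[str]]) -> list[Record]:
    """Emit a distance-0 training pair for every two names of the same city.

    The canonical name is included, so each alias is also paired with it.
    """
    records: list[Record] = []
    for canonical, aliases in aliases_by_city.items():
        # dict.fromkeys drops repeated names while preserving their order.
        names = list(dict.fromkeys([canonical, *aliases]))
        for a, b in combinations(names, 2):
            records.append((a, b, 0.0))
    return records


if __name__ == "__main__":
    sample = {"New York, NY": ["NYC", "New York City"]}
    for rec in alias_examples(sample):
        print(rec)
```

Pairing every name with every other name (not only alias-to-canonical) gives the model more equivalent pairs per city. The trade-off is that cities with long alias lists grow quadratically.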

# notes
- See the `Makefile` for instructions.
- Generating the data took about 13 minutes for 3,269 US cities on an 8-core Intel i7-9700K, yielding 2,720,279 records (combinations of cities).
- Training on an NVIDIA RTX 3090 Founders Edition takes about an hour per epoch with an 80/20 train/test split.
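The notes above describe records built from combinations of cities with lat/lon coordinates, plus an 80/20 split for training. A minimal sketch of that kind of pipeline follows. This commit does not show the actual generation script, the distance formula, or its units, so this sketch makes three assumptions:

- Distances are great-circle (haversine) distances in kilometres.
- Pairs are unordered combinations.
- 80% of the records go to training.

`haversine_km`, `city_pairs`, and `train_test_split` are hypothetical names. The real pipeline may also filter or sample pairs rather than emit all of them.

```python
import math
import random
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed unit: kilometres)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def city_pairs(cities):
    """Yield (name_a, name_b, distance_km) for every unordered pair of cities.

    `cities` is a list of (name, lat, lon) tuples.
    """
    for (na, lat_a, lon_a), (nb, lat_b, lon_b) in combinations(cities, 2):
        yield na, nb, haversine_km(lat_a, lon_a, lat_b, lon_b)


def train_test_split(records, train_frac=0.8, seed=0):
    """Shuffle a copy of `records` with a fixed seed and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    cities = [
        ("New York, NY", 40.7128, -74.0060),
        ("Los Angeles, CA", 34.0522, -118.2437),
        ("Chicago, IL", 41.8781, -87.6298),
    ]
    records = list(city_pairs(cities))
    train, test = train_test_split(records)
    print(len(records), len(train), len(test))  # prints "3 2 1"
```

Because `city_pairs` is a generator, a large city list can be streamed to disk instead of being held in memory. The fixed seed keeps the split reproducible across runs.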