mm
2 years ago
1 changed files with 16 additions and 0 deletions
@ -0,0 +1,16 @@ |
# city-transformers

Generates a dataset of cities (US only for now) and the geodesic distances between them.
Uses that dataset to fine-tune a neural net so that cities closer to one another are treated as more similar.
Distances become `labels` through the formula `1 - distance/MAX_DISTANCE`, where `MAX_DISTANCE=20_037.5 # km` represents half of the Earth's circumference.

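As a rough illustration of the label formula, here is a minimal sketch. It assumes a haversine (great-circle) distance as a stand-in for a true geodesic calculation; the names `haversine_km`, `distance_to_label`, and `EARTH_RADIUS_KM` are hypothetical and not taken from this repo.

```python
import math

MAX_DISTANCE = 20_037.5    # km, half of the Earth's circumference (from the README)
EARTH_RADIUS_KM = 6_371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Guard against floating-point overshoot before taking asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_to_label(distance_km):
    """Map a distance to a similarity label: 1.0 for identical, 0.0 for antipodal."""
    label = 1 - distance_km / MAX_DISTANCE
    return min(1.0, max(0.0, label))  # clamp to [0, 1]


# New York City -> Los Angeles (coordinates are approximate)
d = haversine_km(40.7128, -74.0060, 34.0522, -118.2437)
label = distance_to_label(d)
```

With this mapping, nearby cities get labels close to 1 and cities on opposite sides of the globe get labels close to 0.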
Other factors, such as political borders, can make cities that are "close together" on the globe "far apart" in reality.
This model does not account for such factors; it considers geography only.

However, for use-cases that involve different measures of distance (perhaps just time-zones, or something that considers the reality of travel), the general principles shown here should still apply (pick a metric, generate data, train).

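The "pick a metric, generate data" steps could be sketched as below, using a time-zone gap as the alternative metric. This is only an illustration: `make_examples`, `utc_offset_gap`, and the `UTC_OFFSETS` table are hypothetical names introduced here, and the training step is left out.

```python
from itertools import combinations


def make_examples(items, metric, max_value):
    """Yield (name_a, name_b, label) for every unordered pair of items.

    `items` maps a name to whatever the metric needs (coordinates, offsets, ...).
    """
    for (name_a, a), (name_b, b) in combinations(items.items(), 2):
        label = 1 - metric(a, b) / max_value
        yield name_a, name_b, min(1.0, max(0.0, label))  # clamp to [0, 1]


def utc_offset_gap(a, b):
    """Hypothetical metric: gap between two UTC offsets in hours, wrapped around the clock."""
    gap = abs(a - b) % 24
    return min(gap, 24 - gap)


# Standard-time UTC offsets, used purely as sample input.
UTC_OFFSETS = {"New York": -5, "Chicago": -6, "Los Angeles": -8}

# The largest possible wrapped gap is 12 hours, so that plays the role of MAX_DISTANCE.
examples = list(make_examples(UTC_OFFSETS, utc_offset_gap, max_value=12))
```

Swapping in a different `metric` and `max_value` is the only change needed to produce labels for another notion of distance.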
Particularly useful additions to the dataset would be:
- airports: they have (more or less) unique codes, and this semantic understanding would be helpful for search engines.
- aliases for cities: the dataset used for city data (lat/lon) contains a pretty exhaustive list of aliases for the cities. It would be good to generate examples of these with a distance of 0 and train the model on this knowledge.

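The alias idea could look something like the sketch below: each alias is paired with its city at distance 0, which the label formula maps to 1.0. The function name `alias_examples` and the sample aliases are hypothetical, not drawn from the actual dataset.

```python
MAX_DISTANCE = 20_037.5  # km, same constant as in the label formula


def alias_examples(name, aliases):
    """Return (name, alias, label) rows, treating each alias as distance 0."""
    label = 1 - 0.0 / MAX_DISTANCE  # distance 0 -> label 1.0
    seen = {name}                   # skip the canonical name and repeated aliases
    rows = []
    for alias in aliases:
        if alias in seen:
            continue
        seen.add(alias)
        rows.append((name, alias, label))
    return rows


# Illustrative input only; real aliases would come from the city dataset.
rows = alias_examples("New York City", ["NYC", "New York", "NYC"])
```

These rows could then be appended to the distance-based examples before training.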
See `Makefile` for instructions.