-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #78 from hotosm/docs/algorithm
Add docs about splitting algorithm currently + future
- Loading branch information
Showing
2 changed files
with
286 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,285 @@ | ||
# The AOI Splitting Algorithm | ||
|
||
For now (as of 19-12-2024), this algorithm is entirely focused on splitting | ||
polygon features such as buildings. | ||
|
||
## How It Works | ||
|
||
!!! note | ||
|
||
For ease of understanding, I will replace the word 'feature' | ||
with 'building' in the following description. But the word | ||
'building' could in theory be substituted by any feature type. | ||
|
||
### 1. Split AOI By Linear Features | ||
|
||
- First we divide up the AOI into polygons, based on the provided | ||
bisecting features such as roads, rivers, and railways. | ||
- To do this we: | ||
- Polygonize the linear features. | ||
- Centroid the features to make sure they only get counted in one | ||
splitpolygon. | ||
- Clip by the AOI polygon. | ||
- We get the database table `polygonsnocount`. | ||
- Polygons with zero or too few features are merged into neighbours. | ||
|
||
**Output**: Polygons covering the AOI area. | ||
|
||
[image here] | ||
|
||
### 2. Group Buildings By Polygon | ||
|
||
- Next we take our buildings, insert into db table `buildings`, then find | ||
the polygon each building centroid is located within. | ||
- Now we create the next table `splitpolygons` where we add the count of | ||
the total number of buildings within each polygon, plus the polygon | ||
area. | ||
|
||
**Output**: Building dataset tagged with their containing polygon's ID. | ||
|
||
[image here] | ||
|
||
### 3. Cluster The Buildings | ||
|
||
- For each polygon containing buildings, we pass through a K-Means clustering | ||
algorithm, to output X number of clusters. | ||
- X is calculated as: | ||
|
||
```bash | ||
(T / A) + 1 | ||
T - Total building count | ||
A - Average number of buildings desired per cluster | ||
``` | ||
|
||
!!! info | ||
|
||
K-Means will group buildings based on their spatial proximity, ideally | ||
grouping together buildings that are close together (reducing walking | ||
distance for mappers in the field). | ||
|
||
- We create a table `clusteredbuildings` where we have the original buildings, | ||
plus their assigned cluster ID from K-Means. | ||
- The field `clusteruid` is a composite of `polygonid '-' clusterid`. | ||
- The `clusterid` is specific to the containing polygon (starting from 0). | ||
- For example polygon `377` will have IDs `377-0`, `377-1`, `377-2`. | ||
|
||
!!! tip | ||
|
||
Using K-Means, we should be aware: | ||
|
||
- Edge cases: sparse areas may create clusters with few buildings, | ||
and dense areas could result in many overlapping clusters. | ||
- Trial and error: the clustering quality depends on fine-tuning the | ||
average number of buildings per cluster. | ||
|
||
**Output**: Building dataset tagged with their containing polygon's ID, | ||
plus a cluster ID specific to the polygon. | ||
|
||
[image here] | ||
|
||
### 4. Enclose The Buildings | ||
|
||
!!! info | ||
|
||
We previously used a Voronoi based approach: | ||
|
||
1. Densify the buildings to reduce the impact of long edges | ||
(maximum edge 0.00004 degrees). | ||
2. Dump the building polygons into points. | ||
3. Create a Voronoi diagram (a technique to divide up the points within | ||
an area into polygons, where each final polygon contains the closest | ||
'neighbour' points from the clusters in the previous step). | ||
This approach had some flaws, so we have attempted other approaches, below. | ||
|
||
- Divide up each cluster into polygons using convex hulls. | ||
- Here we essentially form small 'islands' of buildings. | ||
- Fixing polygon overlaps: | ||
- We may have a few polygon overlaps, where a building could fall between | ||
two polygon areas. | ||
- To solve this, we find all of the overlapping 'shards', and subtract | ||
from the polygon area. | ||
- We then union all de-overlapped hulls with their buildings (the | ||
de-overlapping will have left some feature polygons partially and | ||
maybe wholly outside of their home polygons, this should restore | ||
them without creating new overlaps unless the features themselves | ||
are overlapping, in which case there isn't a great solution). | ||
|
||
**Output**: Task polygons that don't overlap and fully enclose all features, | ||
that don't have jagged / complex edges. | ||
|
||
[image here] | ||
|
||
### 5. Filling The Negative Space | ||
|
||
- Now we have our 'islands' of buildings, this would be fine if we knew | ||
for certain that they contained every possible building in the field. | ||
- However, in the real world, we may have missed some buildings, so the | ||
task areas should be expanded out to cover the entire AOI footprint. | ||
- We can create a 'negative' space multipolygon by subtracting the hulls | ||
from the AOI. | ||
- The 'negative' space multipolygon can be filled using the | ||
'straight skeleton' algorithm: | ||
- This is essentially a Voronoi algorithm, but for polygons | ||
instead of points! (not exactly, but it's an analogy) | ||
- The algorithm will work on the edges and corners of the 'hull' | ||
polygons, to generate bounding 'filler' polygons between them. | ||
- It will perfectly bisect between buildings or polygon areas, | ||
instead of creating wavy / zig-zag boundaries. | ||
- Finally, we identify the edge-sharing neighbor hull of each element | ||
of the polygonized skeleton, dissolve them into those neighbors. | ||
|
||
!!! info | ||
|
||
- Voronoi diagrams divide space based on distances to points or | ||
polygons, creating regions with perpendicular bisectors. | ||
- Straight skeletons shrink polygon edges inward at equal speed | ||
to create a network of lines (skeleton) and subdivided polygons. | ||
It’s more about preserving the shape of polygons rather than | ||
distance-based partitioning. | ||
|
||
**Output**: Split task area polygons. | ||
|
||
> The original buildings, including tag data, can be superimposed on | ||
to these task polygons, to assign them to each task area. | ||
|
||
[image here] | ||
|
||
### 6. Alignment Of Task Areas (Optional) | ||
|
||
- The final problem here is aligning the polygon areas back with the | ||
linear features, as they may have shifted slightly during all the | ||
processing! | ||
- For example the task boundaries should ideally align in the | ||
center of a highway polylin. | ||
- Using a window function, we can essentially run the same steps | ||
as above, but for each specific cluster area, instead of the | ||
whole AOI, reducing the drift from the linear features. | ||
|
||
**Output**: Split task area polygons, better aligned to linear features. | ||
|
||
[image here] | ||
|
||
## Requirements & Future Plans | ||
|
||
Input from Ivan Gayton @ 18/12/2024 | ||
|
||
### Input Datasets | ||
|
||
#### AOI | ||
|
||
- Do some GeoJSON cleanup, CRS checking, and normalise to a specific format. | ||
- Ideally we just have a Polygon GeoJSON. | ||
|
||
#### Splitting Lines (Linear Features) | ||
|
||
- Allow polyline input from sources other than OSM. | ||
- From OSM: | ||
- Polylines: default all, but user configurable (major vs minor highways, etc). | ||
- Polygons: filter tags for traffic circles, water bodies, etc, then split into | ||
polylines. | ||
|
||
In both cases, we likely only need the geometry, no tags. | ||
|
||
#### Map Features | ||
|
||
- **Points**: | ||
- Geometries, plus tags. | ||
- Centroids of polygons. | ||
- Midpoint of polylines. | ||
|
||
- **Polylines**: | ||
- Geometries, plus tags. | ||
- Convert relevant polygons such as traffic circles / water bodies into | ||
polylines. | ||
- Split roads at all intersections, so that every polyline constitutes an | ||
edge in a graph. | ||
|
||
- **Polygons**: | ||
- Geometries, plus tags. | ||
- Convert multipolygons (like OSM buildings with holes) into simple polygons for | ||
the purpose of splitting (maybe we want the multipolygons to send to the data | ||
collection app later, but for splitting we definitely don't want holes). | ||
- Do some checking/cleaning for invalid geometries. | ||
|
||
### Output Datasets | ||
|
||
We need the following datasets of geometries, but probably not any tags | ||
associated: | ||
|
||
- AOI | ||
- Splitlines | ||
- Features | ||
|
||
The original features should probably be retained for later use in the | ||
actual data collection (e.g. conflation), but for splitting purposes we | ||
don't need all the tags and fields, and we definitely don't want any | ||
complex geometry. | ||
|
||
### Extra Work | ||
|
||
Large task polygons: | ||
|
||
- We might need to check for task polygons that are too big. | ||
- If there are regions with sparse features, they might end | ||
up with giant task polygons, in which case we might want to | ||
sub-split them (maybe into squares). | ||
|
||
## Testing The Workflow In QGIS | ||
|
||
### Running Bleeding-Edge PostGIS Locally | ||
|
||
First you may need to set up a local Postgres database, including the latest | ||
bleeding-edge version of PostGIS and SFCGAL. | ||
|
||
The easiest way is via Docker (single command): | ||
|
||
```bash | ||
docker run --name aoi-splitting-db --detach \ | ||
-p 5432:5432 -v ./db_data:/var/lib/postgresql/data/ \ | ||
-e POSTGRES_USER=hotosm -e POSTGRES_PASSWORD=hotosm -e POSTGRES_DB=splitter \ | ||
docker.io/postgis/postgis:17-master \ | ||
&& sleep 5 \ | ||
&& docker exec aoi-splitting-db psql -d splitter -U hotosm -c \ | ||
'CREATE EXTENSION IF NOT EXISTS postgis_sfcgal WITH SCHEMA public;' | ||
``` | ||
|
||
The instance will be available: | ||
|
||
- Host: `localhost` | ||
- Port: `5432` | ||
- Database: `splitter` | ||
- User `hotosm` | ||
- Password `hotosm` | ||
|
||
!!! NOTE | ||
|
||
Changing the port on the left side in the command `8888:5432`, | ||
will make Postgres available on a different port for you. | ||
|
||
### Importing OSM Data | ||
|
||
Get the raw-data-api Lua OSM import script: | ||
|
||
```bash | ||
curl -LO https://raw.githubusercontent.com/hotosm/osm-rawdata/refs/heads/main/osm_rawdata/import/raw.lua | ||
``` | ||
|
||
Download some data from GeoFabrik: | ||
|
||
<https://download.geofabrik.de> | ||
|
||
Import into Postgres: | ||
|
||
```bash | ||
osm2pgsql --create -H localhost -U hotosm -P 5432 -d splitter \ | ||
-W --extra-attributes --output=flex --style ./raw.lua \ | ||
/your-geofabrik-file.osm.pbf | ||
``` | ||
|
||
### QGIS DB Manager | ||
|
||
- Install the QGIS DB Manager plugin. | ||
- View the geoms you just added. | ||
- Run each step of the splitting algorithm SQL sequentially. | ||
- View the intermediary tables and geometries created. | ||
- More detailed instructions to come! |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters