Memory usage during phase 2 flex processing #1535

@mboeringa

Based on @pnorman's good work for the flex version of the openstreetmap-carto style (openstreetmap-carto/openstreetmap-carto#4431), I have been experimenting with planet size data to see the results and what potential issues might come up with the new options.

I have done two tests:

  • Process a 67 GB planet PBF file based on the Facebook "Daylight" distribution (v1.2), with the Microsoft ML buildings (as made available by Facebook) and the "administrative boundaries" added. The combined PBF file was made using osmium per the instructions on the Facebook website.
  • Process the official 58 GB planet OSM file.

For the first test, involving the Facebook Daylight data, I used an early version of the openstreetmap-carto flex file, to which Paul had not yet added the 'planet-osm-admin', 'planet-osm-transport-line' and 'planet-osm-transport-polygon' tables. It was only enhanced with a non-spatial 'planet-osm-route' table that can be used to display routes based on database joins.

My VM was configured with 100 GB RAM and 50 GB swap in Ubuntu. Peak memory usage during this first test was about 95 GB RAM + 9 GB swap (with "-C 75000" set on the command line), so this first run completed successfully despite the large PBF.

I then ran the second test with the smaller 58 GB official planet PBF file. This time, I used the latest state of Paul's work on the flex style, which adds the new 'planet-osm-admin', 'planet-osm-transport-line' and 'planet-osm-transport-polygon' spatial tables alongside the non-spatial 'planet-osm-route' table.

With the same VM configuration, the osm2pgsql process was killed once all RAM and swap were consumed. I then retried with smaller cache settings, but even with "-C 10000" the process was killed. Switching to slim mode with a flat nodes file allowed the processing to succeed.

Now my question:

As far as I now understand Paul's code in the flex style, only the 'planet-osm-admin' table actually requires phase 2 processing, and all the other tables are just created using phase 1 processing.

As documented on the "Osm2pgsql Manual" pages, phase 2 processing can add a considerable amount of extra memory usage, due to the need to store in main memory all data from phase 1 needed in phase 2 ("All data stored in stage 1 for use in stage 2 in your Lua script will use main memory.").

Clearly, with the "administrative boundaries", we have a kind of "worst case" scenario here, as the admin boundary relations are some of the most complex and largest ones in the whole of OpenStreetMap. So it is probably not a surprise to see a big jump in memory usage.

Yet seeing the memory usage of osm2pgsql jump by more than 40 GB before the process was killed still seems a bit too much, even for the admin boundaries.

If I understand the Lua code of the style correctly, only the member ways of the boundary relations need to be stored. Since the ways are even de-duplicated in the process (this was the purpose of the new 'planet-osm-admin' spatial table), shouldn't the additional memory usage be relatively modest?
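
To make that expectation concrete, here is a minimal Python sketch of the bookkeeping assumed above. It is an illustration only, not the Lua code from the style: the function name, the input layout and the choice to keep the lowest admin_level per way are all hypothetical.

```python
def collect_admin_ways(relations):
    """Map each member way id to the lowest admin_level of any
    administrative boundary relation that uses it (hypothetical model)."""
    way_levels = {}
    for rel in relations:
        tags = rel["tags"]
        if tags.get("boundary") != "administrative":
            continue  # only boundary relations contribute stored data
        try:
            level = int(tags.get("admin_level", ""))
        except ValueError:
            continue  # skip relations without a numeric admin_level
        for member_type, ref in rel["members"]:
            if member_type != "w":
                continue  # only member ways are stored
            previous = way_levels.get(ref)
            if previous is None or level < previous:
                way_levels[ref] = level  # one entry per distinct way
    return way_levels


# Made-up sample data: way 11 is shared by two boundary relations.
relations = [
    {"tags": {"boundary": "administrative", "admin_level": "2"},
     "members": [("w", 10), ("w", 11)]},
    {"tags": {"boundary": "administrative", "admin_level": "4"},
     "members": [("w", 11), ("w", 12), ("n", 5)]},
    {"tags": {"type": "route"}, "members": [("w", 99)]},
]
way_levels = collect_admin_ways(relations)
print(way_levels)
```

In this model the stored data grows with the number of distinct member ways, not with the number of relation memberships, which is why a modest increase would be the expectation.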

What am I missing? Or is this level of memory usage simply expected and normal for phase 2 processing of OpenStreetMap "administrative boundaries"?
