Working Group Minutes/EWG 2013-12-16

From OpenStreetMap Foundation

Attendees

IRC nick Real name
apmon Kai Krueger
gravitystorm Andy Allan
pnorman Paul Norman
shaunmcdonald Shaun McDonald
zere Matt Amos

Summary

  • osm2pgsql threading
    • pnorman gave an update on apmon's fix to the threading branch for the tag corruption issue; the fix has not yet been run through tests. It is not expected to affect benchmark results.
    • pnorman advised anyone running a rendering server with updates to reindex planet_osm_ways: one of its indexes comes out of the import at 3.5GB and shrinks to 8kB after reindexing. osm2pgsql has since been patched to build that index at the end of the import, but databases imported earlier still need the reindex.
    • pnorman plans to run an imposm3 import on the same hardware to benchmark it against osm2pgsql and understand the performance and feature trade-offs; some gaps in imposm3's diff support were noted.
  • xmas holidays
    • As many will be busy over the holiday period, the next meeting will be on the 6th Jan 2014.

IRC Log

17:31:39 <zere> hello everyone, last meeting minutes at http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2013-12-09 and please let me know if there's anything in them which needs to be changed.
17:32:22 <zere> actions are probably the same as previous meetings. once again, i have to apologise for not getting the blog post done - it's still on my TODO. gravitystorm?
17:32:35 <gravitystorm> nothing :-(
17:33:09 <zere> i will note that, if anyone wants to help, then i'm sure both gravitystorm and i would be very happy to receive it.
17:34:35 <zere> pnorman: worth talking about that osm2pgsql-thread-tagging issue? or is it all under control?
17:34:51 <pnorman> worth updating
17:35:04 <zere> #topic osm2pgsql threading
17:35:15 <pnorman> apmon has made changes that should fix it, but I haven't run it through any tests.
17:35:42 <pnorman> I'm running a test to see how long CLUSTER stays effective, so that takes some time to run
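One way to put a number on how long CLUSTER "stays effective" is to track PostgreSQL's `pg_stats.correlation` for the clustered column: the correlation between physical row order and the column's sort order, which is close to +1 right after CLUSTER. The pure-Python sketch below computes the same kind of figure as a rank correlation between row position and key. It is not PostgreSQL's implementation (ANALYZE estimates the value from a row sample), and `simulate_updates`, in which every updated row is rewritten at the end of the table, is a hypothetical simplification of how updates scatter rows.

```python
from statistics import mean


def rank(values):
    """Return the 0-based rank of each value; assumes the values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def physical_order_correlation(keys_in_physical_order):
    """Pearson correlation between physical position and key rank.

    +1.0: rows stored in key order (freshly CLUSTERed);
    -1.0: reverse order; near 0: physical order tells you little.
    """
    n = len(keys_in_physical_order)
    if n < 2:
        return 1.0
    positions = list(range(n))
    ranks = rank(keys_in_physical_order)
    mp, mr = mean(positions), mean(ranks)
    cov = sum((p - mp) * (r - mr) for p, r in zip(positions, ranks))
    var_p = sum((p - mp) ** 2 for p in positions)
    var_r = sum((r - mr) ** 2 for r in ranks)
    return cov / (var_p * var_r) ** 0.5


def simulate_updates(keys, updated):
    """Hypothetical update model: each updated row's new version is appended
    to the end of the heap and the old version disappears."""
    updated = set(updated)
    kept = [k for k in keys if k not in updated]
    moved = [k for k in keys if k in updated]
    return kept + moved


if __name__ == "__main__":
    table = list(range(10_000))  # freshly CLUSTERed: keys stored in order
    print(physical_order_correlation(table))
    # hypothetical workload: every 10th row updated once
    print(physical_order_correlation(simulate_updates(table, range(0, 10_000, 10))))
```

Re-measuring the statistic at intervals after CLUSTER would show how quickly the ordering decays under a real update stream.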
17:35:43 <zere> do you know if the changes are likely to alter the benchmark results?
17:36:56 <pnorman> apmon would know better than I. my guess is that much of the time is spent getting data to/from postgres and it won't change that
17:40:33 <pnorman> I'm not sure what numbers best indicate what is happening as performance degrades over time. I'm collecting pgstattuple info (http://www.postgresql.org/docs/9.1/static/pgstattuple.html#PGSTATTUPLE-COLUMNS) as well as correlation data and performance data, but it's really a case of information overload, with about 150 numbers collected
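One way to cut the roughly 150 collected numbers down is to reduce each pgstattuple result to a few ratios. The sketch below takes a subset of the columns documented for `pgstattuple()` (`table_len`, `tuple_count`, `tuple_len`, `dead_tuple_len`, `free_space`, lengths in bytes) and derives three headline figures. Which ratios are worth tracking is an assumption made for illustration, and the input values in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class TupleStats:
    """Subset of the columns returned by pgstattuple(); lengths in bytes."""
    table_len: int
    tuple_count: int
    tuple_len: int
    dead_tuple_len: int
    free_space: int


def bloat_summary(s: TupleStats) -> dict:
    """Collapse one pgstattuple row into a few ratios (illustrative choice)."""
    if s.table_len == 0:
        return {"live_fraction": 0.0, "wasted_fraction": 0.0,
                "bytes_per_live_tuple": 0.0}
    return {
        # share of the table's bytes holding live tuples
        "live_fraction": s.tuple_len / s.table_len,
        # dead tuples plus free space, as a share of the table
        "wasted_fraction": (s.dead_tuple_len + s.free_space) / s.table_len,
        # total bytes per live row, including page overhead
        "bytes_per_live_tuple": (s.table_len / s.tuple_count
                                 if s.tuple_count else float("inf")),
    }


if __name__ == "__main__":
    # made-up numbers, not measurements
    print(bloat_summary(TupleStats(table_len=8192 * 1000, tuple_count=50_000,
                                   tuple_len=5_000_000, dead_tuple_len=1_000_000,
                                   free_space=1_500_000)))
```

Plotting these three values over time, per table, may be easier to read than the full column set.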
17:44:27 <pnorman> 09:35 < zere> do you know if the changes are likely to alter the benchmark results?
17:47:26 <pnorman> apmon: ^
17:47:42 <apmon> which changes?
17:47:54 <pnorman> tag processing threading changes
17:48:20 <apmon> probably not. Although if some of the times you were using de-duplication, it might
17:49:05 <apmon> My guess would be due to it being uninitialised, de-duplication was by default off, which is the correct thing
17:49:42 <pnorman> well, something has changed - or else it's still got the thread unsafe part :)
17:54:17 <pnorman> as an FYI to anyone running a rendering server with updates, I suggest reindexing planet_osm_ways, we found out that there's an index that comes out of the import at 3.5GB, and after reindexing goes to 8kb
17:56:07 <zere> intuitively, i'm thinking that cluster should mean that when postgres lifts a page off disk, more of it should be relevant. i don't see anything about the pages loaded to tuples loaded ratio on that pgstattuple page, though
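zere's intuition can be made concrete with a toy heap: rows are stored in fixed-size pages, and a range query must read every page holding at least one matching row. Clustering on the queried key packs the matches into few pages, while a shuffled layout spreads them out. The `rows_per_page` value and the uniformly shuffled layout are hypothetical, and the model ignores caching, index pages and visibility checks.

```python
import random


def pages_touched(keys_in_physical_order, lo, hi, rows_per_page):
    """Count distinct heap pages that hold at least one key in [lo, hi]."""
    return len({
        pos // rows_per_page
        for pos, key in enumerate(keys_in_physical_order)
        if lo <= key <= hi
    })


def matches_per_page_read(keys_in_physical_order, lo, hi, rows_per_page):
    """Matching rows per page read; higher means each fetched page is more relevant."""
    matches = sum(1 for key in keys_in_physical_order if lo <= key <= hi)
    pages = pages_touched(keys_in_physical_order, lo, hi, rows_per_page)
    return matches / pages if pages else 0.0


if __name__ == "__main__":
    ROWS_PER_PAGE = 50  # hypothetical; real tuples per page depend on row width
    clustered = list(range(10_000))
    shuffled = clustered[:]
    random.Random(0).shuffle(shuffled)
    for name, layout in (("clustered", clustered), ("shuffled", shuffled)):
        print(name,
              pages_touched(layout, 0, 499, ROWS_PER_PAGE),
              matches_per_page_read(layout, 0, 499, ROWS_PER_PAGE))
```

The "matches per page read" figure is the pages-loaded to tuples-loaded ratio zere was looking for; pgstattuple does not report it, since it depends on the query as well as the table.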
17:56:22 <shaunmcdonald> pnorman: could the osm2pgsql import auto redo that index as part of the import? That would save anyone setting up a new DB from having to remember that step.
17:57:06 <pnorman> shaunmcdonald: it's been patched to not build that index until the end, but that doesn't change it for anyone who's already imported
17:57:20 <apmon> shaunmcdonald: On my todo list
17:57:27 <shaunmcdonald> :-)
17:57:33 <pnorman> zere: there's three things going on: degradation of cluster, table bloat, and index bloat
17:57:42 <apmon> I intend to move the index creation to after the going over pending ways stage
17:58:03 <apmon> at which point there will be 0 pending ways instead of tens of millions of ways
17:58:15 <apmon> i.e. an index size of 8kb instead of 4GB...
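apmon's figures can be reproduced with a deliberately crude model. Assume (hypothetically) that the index file never shrinks once pages have been allocated, that it always has one 8 kB metapage, and that each further 8 kB page holds a fixed number of entries. An index maintained throughout the import is then sized by its peak number of pending ways, while an index built after the pending ways have been processed is sized by the remaining count, which is zero. Real B-tree behaviour (page splits, fill factor, page reuse after VACUUM) is more complicated; `entries_per_page` and the peak count below are illustrative values, not measurements.

```python
import math

PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes (8 kB)


def index_bytes(entries, entries_per_page):
    """Toy B-tree size: one metapage plus enough pages for `entries`."""
    return PAGE_SIZE * (1 + math.ceil(entries / entries_per_page))


def size_if_maintained(peak_entries, entries_per_page):
    # Assumption: pages allocated at the peak are never returned, so the
    # file stays at its high-water mark even after entries are removed.
    return index_bytes(peak_entries, entries_per_page)


def size_if_built_at_end(remaining_entries, entries_per_page):
    # Built in a single pass once the pending ways have been processed.
    return index_bytes(remaining_entries, entries_per_page)


if __name__ == "__main__":
    ENTRIES_PER_PAGE = 100  # hypothetical average fill
    peak = 40_000_000       # hypothetical "tens of millions" of pending ways
    print(size_if_maintained(peak, ENTRIES_PER_PAGE))  # bytes
    print(size_if_built_at_end(0, ENTRIES_PER_PAGE))   # 8192, i.e. one 8 kB page
```

Under these assumptions REINDEX and "build at the end" give the same result, since both size the index from the current contents rather than from the peak.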
17:59:37 <pnorman> I'm also going to run imposm3 through an import to see how it performs on the same hardware
18:00:09 <apmon> should be interesting
18:00:19 <zere> how easy is it to get an apples-to-apples comparison, though?
18:01:12 <pnorman> you need a .style and an imposm3 mapping file that are equivalent.
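A first step toward equivalent files is listing which tags become columns on the osm2pgsql side. The sketch below assumes the whitespace-separated .style layout (`OsmType Tag DataType [Flags]`, with `#` starting a comment) and skips tags flagged `delete` or `nocolumn`. The imposm3 side is represented only as a plain set of keys; reading those keys out of an actual mapping file is not shown, and `keys_not_covered` is a hypothetical helper.

```python
def parse_style(text):
    """Return {tag: set_of_osm_types} for tags that become database columns."""
    columns = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line; a real tool should report it
        osm_types, tag = fields[0], fields[1]
        flags = set(fields[3].split(",")) if len(fields) > 3 else set()
        if flags & {"delete", "nocolumn"}:
            continue  # these tags do not produce a column
        columns.setdefault(tag, set()).update(osm_types.split(","))
    return columns


def keys_not_covered(style_columns, mapping_keys):
    """Tags with a .style column that are absent from the mapping's key set."""
    return sorted(set(style_columns) - set(mapping_keys))


if __name__ == "__main__":
    sample = """# OsmType  Tag       DataType  Flags
node,way   highway   text      linear
way        building  text      polygon
node,way   note      text      delete
"""
    cols = parse_style(sample)
    print(cols)
    print(keys_not_covered(cols, {"highway"}))
```

Columns that osm2pgsql computes during import rather than reading from tags would need separate handling in such a comparison.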
18:01:14 <apmon> Even an apples to oranges comparison will give some indication of how well either works
18:02:03 <pnorman> yes - we have no direct comparison of any sort right now
18:02:08 <apmon> I.e. it gives some bounds on what is possible and if one is vastly more efficient than the other
18:06:48 <zere> right, so imposm3 is a strict feature superset of osm2pgsql, then?
18:07:29 <zere> if one can write an imposm3 mapping file equivalent to a .style file
18:08:17 <apmon> it doesn't have diff imports? Or did they implement that by now?
18:08:27 <zere> because i know that imposm supports a bunch of generalisation features. just not sure if it has features which map 1:1 with what osm2pgsql has.
18:08:48 <pnorman> imposm3 has diff imports
18:10:40 <zere> well, it says "Imposm 3 is much faster than Imposm 2 and osm2pgsql" -- so it would be interesting to see under what conditions that's true.
18:12:37 <zere> "Other missing features: ... Updating generalized tables in diff-mode ... Diff import into custom PG schemas". so, looks like there's some (imho, pretty major) short-comings with imposm3's diff support.
18:13:31 <pnorman> don't think osm2pgsql supports PG schemas either
18:14:31 <zere> you think that means PG schemas? i thought it meant table schemas.
18:14:52 <zere> well, i should say s/PG schemas/namespaces/.
18:15:02 <pnorman> well it does say pg schemas
18:16:28 <pnorman> when the imposm3 docs talk about schemas, they seem to mean pg schemas consistently
18:17:01 <zere> apart from "Custom database schemas: Creates tables for different data types. This allows easier styling and better performance for rendering in WMS or tile services." where it clearly means table schemas.
18:17:51 <zere> i guess you'll find out if you try ;-)
18:21:54 <zere> was there anything else anyone wanted to discuss?
18:24:56 <gravitystorm> not from me
18:29:57 <zere> ok.
18:30:01 <zere> #topic xmas
18:30:26 <zere> the next meeting would be the 23rd... and i'm guessing a lot of us will be busy
18:30:45 <zere> the one after would be the 30th - probably another busy date.
18:31:12 <zere> would anyone like to have meetings on these days, or shall we just say the next one is the 6th January 2014?
18:31:29 <pnorman> maybe we can let meetbot run by itself, it seems to generate plenty :)
18:34:31 <gravitystorm> 6th works for me
18:35:10 <zere> well, meetbot will be here... so even if i'm not and you want to have a meeting then please use it. it shouldn't need any special setup.
18:38:12 <zere> thanks everyone! and happy holidays :-)