Working Group Minutes/EWG 2013-04-29

From OpenStreetMap Foundation

Attendees

IRC nick       Real name
apmon          Kai Krueger
Firefishy      Grant Slater
gravitystorm   Andy Allan
pnorman        Paul Norman
TomH           Tom Hughes
zere           Matt Amos

Summary

  • Carto style
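    • pnorman has been benchmarking renderd on the OSM dev server (errol), but the results vary too much to give a meaningful baseline: the standard deviation on a 1 hour test was reported as 1 hour.
    • apmon cited an earlier result showing a 30% increase in rendering time with the carto style. It dates from 2013-02-03, before v2.2.0 (March 13), so it may miss improvements made to the style since then.
    • Attempts were made to find a suitable benchmarking machine, but OSMF has nothing spare with enough disk for a rendering database. Firefishy will try to get disks for idris, and pnorman is looking into alternatives such as EC2 or a rented server.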
  • READMEs
    • ACTION gravitystorm to have a go at making the rails_port README better.
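
The benchmarking problem noted under Carto style can be illustrated with a small sketch: when the run-to-run spread is comparable to the mean, a difference of around 30% cannot be told apart from noise. The run times below are made-up numbers, not measurements from errol, and the function names and the 10% threshold are assumptions chosen for illustration only.

```python
import statistics


def coefficient_of_variation(run_times_s):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(run_times_s)
    return statistics.stdev(run_times_s) / mean


def baseline_is_usable(run_times_s, max_cv=0.10):
    """Treat a baseline as usable only if run-to-run noise is small.

    max_cv=0.10 is an arbitrary illustrative threshold, not a project rule.
    """
    return coefficient_of_variation(run_times_s) <= max_cv


# Hypothetical wall-clock times (seconds) for the same render job.
noisy_runs = [1800, 3600, 7200, 900, 4500]    # shared, variably loaded machine
steady_runs = [3550, 3600, 3640, 3580, 3610]  # dedicated machine

print(baseline_is_usable(noisy_runs))
print(baseline_is_usable(steady_runs))
```

With numbers like these, only the second set of runs would give a baseline against which a stylesheet change could be judged.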

IRC Log

17:01:07 <zere> welcome. minutes of the last meeting: http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2013-04-22
17:01:29 <zere> let me know if there's anything which needs correcting.
17:01:43 <zere> on the agenda today: carto style & documentation.
17:01:51 <zere> #topic carto style
17:02:42 <zere> pnorman__ sent me an email earlier about some experiments he's been doing with renderd on errol.
17:03:01 <zere> apparently the standard deviation on a 1 hour test is 1 hour.
17:03:45 <zere> so it does appear that errol's variation in disk use makes benchmarking on there particularly difficult.
17:03:50 <apmon> the load on errol seems very variable
17:03:53 * gravitystorm digs through my email accounts
17:04:53 <zere> pnorman__ says he's going to try on EC2, which should give more stable results - even if the disk latency on EC2 tends to be rather terrible
17:06:26 <apmon> Are there no other test systems available?
17:06:51 <gravitystorm> zere: ah, found the email, on the tileserving list. Can I backtrack and ask what the context is - is it just a comparison of xml vs carto styles, or something more precise?
17:07:19 <apmon> afaik just a simple comparison for now
17:07:34 <apmon> but on a typical work load derived from yevaud's logs
17:08:00 <gravitystorm> If the former, then I'd suggest just doing benchmarks on a laptop or regular PC - it has a more typical CPU/disk/memory balance than EC2, albeit much less than on a tileserver
17:08:44 * pnorman checks in
17:08:52 <apmon> If you run it on a typical workload, you do need a full planet loaded though
17:08:54 <gravitystorm> EC2 is somewhat skewed towards memory/cpu
17:09:56 <zere> indeed. but i think that would tend to exaggerate any difference rather than hide it?
17:10:18 <pnorman> I was attempting to do a comparison, but haven't run carto yet as I doubt I'll get anything statistically significant on errol
17:10:19 <zere> not that i really know - kind of the point of benchmarking is to figure this stuff out.
17:10:22 <gravitystorm> apmon: true, but I'm sure one could either filter the metatiles to only a particular region, or else take a less rigorous approach (e.g. ignore <z6 since there are so few of them, and only really worry about the differences based on the proportion of tiles served)
17:10:46 <pnorman> zere: not if the difference is in queries (disk seeks)
17:11:13 <pnorman> gravitystorm: the list of metas is from a log excerpt from yevaud, so the balance of low-zoom is the same as it is for yevaud
17:11:23 <apmon> zere: If I am not mistaken, the carto style also needs quite a bit more CPU
17:11:45 <zere> i'm expecting carto to have more queries, more disk seeks and therefore comparatively suck more on EC2... is that not right?
17:11:56 <apmon> I got the 30% difference figure, when rendering just Karlsruhe on a germany extract, which probably fit into memory cache
17:12:10 <gravitystorm> pnorman: I was aiming to ignore lowzoom mainly because they a) take forever b) aren't very many of them and c) screw up testing if you're only using a regional extract :-)
17:12:24 <pnorman> i mean, really, the only way to be sure what resources it needs is to stick on a machine equivalent to yevaud and load it up equivalent to yevaud
17:13:05 <gravitystorm> zere: I'm expecting the same number of queries to be honest, the main difference would be around the filter-first performance versus match-all approach in the xml symbolizers
17:13:19 <gravitystorm> pnorman: sure.
17:13:57 <pnorman> so, does anyone have alternative suggestions to EC2? my benchmarking requires a full planet osm2pgsql db, renderd+mapnik
17:14:00 <apmon> As it still seems that yevaud is more CPU bound than disk bound, even with its single outdated SSD, that might be rather relevant
17:14:16 <zere> i think we're only looking to figure out some rough number. basically whether putting it on yevaud, in its current state, is going to kill it.
17:15:14 <apmon> On the other hand, if it is CPU bound, there probably are some spare CPUs in some of the not used servers one could use.
17:15:43 <TomH> is that even the plan though, or is the plan to put it on orm (with SSDs)
17:15:48 <TomH> I've kind of lost track....
17:16:04 <zere> well, Firefishy has his plan, but i think his plan is crazy ;-)
17:16:12 <pnorman> is there a spare machine which could be used to benchmark which has a better cpu/memory/disk balance for testing?
17:16:36 * Firefishy catches up.
17:16:58 <gravitystorm> and if we upgrade mapnik at the same time, the performance increase (or decrease) might swamp any stylesheet-related ones :-)
17:17:53 <apmon> What is Firefishy's crazy plan?... ;_0
17:18:40 <pnorman> gravitystorm: true - doing a DB reload, OS upgrade, postgresql upgrade, postgis upgrade, mapnik upgrade and stylesheet change all at the same time could drastically change the requirements
17:18:58 <apmon> gravitystorm: I haven't redone the benchmarks recently, but a 30% increase in rendering time (if it is still the case) would likely be significant
17:19:05 <apmon> even when moving over to orm
17:20:48 <gravitystorm> apmon: Do you have a version number or git commit from that 30% figure?
17:22:20 <zere> pnorman: i just had a quick look, and i don't see anything spare with enough disk to handle a rendering database.
17:22:23 <apmon> let me check what date I posted that comment on github
17:22:34 <apmon> as it was a fresh checkout from that day
17:22:37 <pnorman> zere: even with slim tables dropped?
17:23:14 <gravitystorm> if it was prior to v2.2.0 (March 13) then things might have changed - that was moving away from attachments for the road layers
17:23:14 <zere> pnorman: good point - how big is it with slim tables dropped?
17:23:18 <pnorman> zere: although I guess you need the space for slim tables before you drop them...
17:23:20 <apmon> https://github.com/gravitystorm/openstreetmap-carto/issues/20#issuecomment-13058442 so it was three months ago
17:23:50 <apmon> 2013-02-03, so yes it was
17:25:18 <apmon> I should probably try another equivalent small scale benchmark on my laptop
17:25:48 <apmon> but I am having trouble getting carto installed, as node.js dependencies are preventing me from getting carto running
17:25:50 <Firefishy> "my plan" is to add SSD to orm, make it primary renderer... then reinstall yevaud to make it additional renderer. Both "full master". Idris was used by TomH to test/create chef scripts for setting up openstreetmap-carto
17:26:15 <gravitystorm> apmon: awesome. Another benchmark would be really useful - if you do so, then per-zoom breakdowns are really useful (it's always z12-16 that are the ones to focus on)
17:26:43 <apmon> render_list now does spit out per zoom level info, so that should be easy
17:26:54 <pnorman> zere: 71GB for a full planet from a couple weeks ago
17:26:54 <gravitystorm> apmon: also, side-by-side comparisons from the output of the mapnik debug logs gives an idea of which layers/queries are worth focussing on.
17:27:05 <zere> yeah, i was looking at idris, but it only has 64GB disk.
17:27:07 <apmon> although for the simple benchmarks, one can simply render each zoom separately.
17:27:14 <zere> Firefishy: what state is azure in?
17:27:50 <apmon> It simply reflects the state of that everything is likely in cache if you render a bbox systematically
17:28:04 <Firefishy> zere: If I could lift it... it would be out the window. It is x86 and sh*t.
17:28:12 <zere> and, in the third conversation thread, why i think Firefishy's plan is crazy is the "full master" bit with no shared cache.
17:28:41 <pnorman> if azure is racked and working it should easily handle it, osm2pgsql is less demanding than pgsnapshot
17:28:55 * apmon agrees and would like to see a more distributed approach
17:29:02 <gravitystorm> apmon: I wouldn't worry too much about caching, since we're looking more at a CPU-bound setup on the tileservers anyway
17:29:06 <zere> i know it's shit, but it has 24GB RAM and decent sized disks. if the 32-bitness isn't going to massively impact the benchmarks, it might be appropriate.
17:29:32 <zere> and then i'll help you "lose" it in a nearby rubbish bin ;-)
17:29:33 <pnorman> oh, 32 bit? ugh... does that even work anymore with nodes over 2^31?
17:29:56 <apmon> It might make it difficult to import the full planet, as osm2pgsql doesn't like 32 bit very much
17:30:06 <Firefishy> zere: not going to happen. azure is missing disks (I've cannibalised it already). I can get disks for idris rather.
17:30:07 <zere> slim mode should? x86 does support uint64_t, just rather slowly
17:30:08 <apmon> pnorman: It should still work, as the node ids will still be in 64 bit
17:30:29 <apmon> but you will pretty much have no node cache during initial import
17:30:35 <zere> Firefishy: cool. maybe that's the way forward then.
17:31:53 <pnorman> looking at http://www.ec2instances.info/ (ec2 list with sane formatting) might a high-memory double extra large with provisioned EBS storage work?
17:32:12 <Firefishy> I'll try get the disks sorted by later this week.
17:34:03 <pnorman> might it be more effective to devote sysadmin time to getting a two-server render setup going? we know we'll need to move to it eventually, what we're not certain on is if switching to carto will kill yevaud if we switch before then
17:34:10 <apmon> pnorman: As you will probably need it for more than 48 hours, you are nearly better off just getting a server at hetzner or ovh for a month
17:34:32 <zere> pnorman: it might do. i'd still be worried that amazon's idea of "high" i/o performance corresponds to http://upload.wikimedia.org/wikipedia/commons/b/b4/IBM_350_RAMAC.jpg
17:35:40 <pnorman> apmon: hetzner has a setup fee, total is 100 euros for 1 month
17:35:53 <apmon> ovh, I think doesn't
17:36:03 <pnorman> zere: I hear with provisioned EBS you can actually get decent iops performance
17:36:54 <zere> that would be very interesting
17:37:15 <pnorman> apmon: how little disk space do you think you can get away with for a system? ovh has a SSD setup with 2x120GB
17:38:13 <pnorman> zere: a big part of me wants to learn how to use EC2, so if the costs are equivalent I wouldn't mind using EC2
17:38:44 <apmon> My guess would be it is slightly above that. Although with --flatnodes and --drop, you might get it close
17:39:51 <pnorman> I know you're below 120GB *after* import, but during import is my concern
17:39:52 <zere> slightly flippant suggestion, but you could try an older planet?
17:40:25 <zere> i know it won't be the same, but it'll be smaller and we're only interested in comparative results
17:41:26 <pnorman> should work. we'll only get comparative anyways. slightly different data density distributions might impact it, but that shouldn't be too bad. I was looking at http://www.ovh.ie/dedicated_servers/sp_32g_ssd.xml
17:42:10 <zere> e.g: the 05-Jan-2012 planet is 30% smaller than the most recent. but your results would be cc-by-sa :-P
17:44:49 <zere> ok, i reckon there's a few ways forward here, and i look forward to seeing what the results are. i'd also like to talk about documentation / READMEs / etc...
17:44:53 <zere> #topc READMEs
17:44:59 <zere> #topic READMEs
17:45:31 <zere> open season on ideas about what needs to be improved with the current rails_port README
17:45:46 <pnorman> carto install instructions suck. also, I couldn't get carto running on errol
17:45:46 <pnorman> oh, different readmes
17:46:04 <TomH> pnorman: oh I have that all working in my tile server cookbook
17:46:07 <zere> worth making a note of anyway. we'll want to get around to that eventually
17:46:42 <gravitystorm> zere: so I read over the rails port readme earlier. Seems like the second half could be split into a CONTRIBUTING.md or similar
17:47:30 <zere> all the bits about coding style, testing, etc... yup. if that's a reasonably standard place for them
17:47:30 <gravitystorm> Also the rails port readme needs more focus on installation. Unfortunately the installation instructions (on the wiki) are way, way too verbose
17:47:33 <apmon> pnorman: If you have a out-to-date compiled carto style, could you send it to me?
17:47:41 <apmon> up-to-date
17:48:15 <zere> gravitystorm: yeah, we touched on that last time. my opinion is that it needs to have something, but no more than a few lines and a link to "here's (way) more information (than you wanted)"
17:48:40 <apmon> Then I can quickly run the benchmark comparison on the germany extract as an initial indication of if things have improved
17:48:44 <pnorman> apmon: mine is a couple weeks old and the shapefile paths are hard-baked in, so you'd have to edit them unless your username is pnorman
17:48:56 <gravitystorm> for example, it lists bundler as dependencies, then someone else has added a (platform specific) 'how to install bundler' further down the page. There's even stuff there about what error messages appear when rubygems.org is having a hiccup
17:49:52 <zere> so... anyone fancy having a crack at improving the README?
17:50:10 <apmon> pnorman: I had to change db names and everything anyway. find and replace in a text editor is your friend...
17:50:20 <pnorman> apmon: http://pnorman.dev.openstreetmap.org/osm-carto.xml
17:50:34 <apmon> thanks
17:50:38 <gravitystorm> zere: well, another approach would be to copy-paste the wiki into INSTALL, then start viciously removing stuff until it's a bit more sane. But that's only worth doing if there's a scorched-earth approach to the wiki pages
17:51:04 * pnorman always welcomes scorched-earth and the wiki
17:53:03 <zere> gravitystorm: i was thinking more that the README install docs could be "you've got ruby installed, right? if not, see http://www.ruby-lang.org/en/downloads/. now `gem install bundler; bundler install; rake db:migrate; rake test`
17:53:29 <zere> ... if you have any problems, see TROUBLESHOOTING.md"
17:53:40 <zere> which then points to the wiki for the gory details
17:54:05 <zere> if people are like me, then they tend to google the error message anyway, in which case they'll land on the wiki.
17:54:14 <gravitystorm> zere: well, that's where I would like to head with the docs. Unfortunately the rails port is full of stuff (like db functions) that need a bit more explanation.
17:54:16 <apmon> you still need a couple of more commands to set up the postgresql db and applications.yml
17:54:36 <apmon> but otherwise, yes, there really isn't much to setting up rails_port
17:54:51 <pnorman> well, loading the damn database takes a month
17:55:06 <gravitystorm> zere: well, I'd like to move away from using the wiki for technical documentation, since we've got the well-observed emergent behaviour of "shit documentation" when we rely on it.
17:55:45 <pnorman> I don't use the wiki for my projects documentation, and I insist that any pull requests adding features add docs at the same time
17:56:22 <gravitystorm> Do we have a target platform for the installation notes? Is it reasonable to assume Ubuntu is the normal base, and other platforms are exceptions?
17:56:50 <zere> gravitystorm: i think we only rely on bizzaro db functions for the changes stuff, which should be removed anyway. with PG 9.2 extensions, the rest (btree_gist) is pretty easy
17:57:46 <pnorman> ubuntu is the normal base. i found out that freebsd ports actually has far more recent versions of many GIS components than ubuntu's repos
18:00:21 <zere> cool. anyone (gravitystorm?) feel like volunteering to have a go at it?
18:00:33 <gravitystorm> sure, I'm happy to have a go
18:00:56 <zere> #action gravitystorm have a go at making the rails_port README better
18:00:58 <zere> thanks!
18:01:25 <zere> we're at the top of our hour, but does anyone else have anything they'd like to discuss?
18:01:35 <gravitystorm> Is there any consensus about moving all of http://wiki.openstreetmap.org/wiki/Rails_port into the git repo? I'm in favour, but I won't do it if it's a bad idea
18:01:42 <pnorman> +1 to that
18:02:26 <pnorman> could we get carto on errol?
18:02:54 <zere> gravitystorm: only if it's heavily edited?
18:03:31 <gravitystorm> zere: certainly
18:04:23 <apmon> Might be something we could put on the agenda for next week, but having another go at figuring out how to translate help.osm.org would be good
18:05:26 <zere> yup. i'll stick that up for next week.
18:07:22 <pnorman> I'm going to try importing with osm2pgsql and a M3 2xlarge EC2, I'll see how that goes
18:07:32 <zere> awesome :-)
18:07:59 <pnorman> my benchmarks are likely to be carto with latest software vs osm.xml with latest software. I wonder what impact having the latest software is
18:08:43 <zere> yeah, one would hope that it's as fast or faster than the older software, but it isn't always the case.
18:09:06 <zere> i guess that's a wrap. thanks to everyone for coming, and i hope to see you next week! :-)
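
The per-zoom breakdown that gravitystorm asks for in the log could be compared along the lines of the sketch below. The timings are invented placeholders, the dictionary layout is an assumption rather than the output format of render_list, and `per_zoom_slowdown` is a hypothetical helper name.

```python
def per_zoom_slowdown(xml_times_s, carto_times_s):
    """Return {zoom: percent change of carto relative to xml}.

    Only zoom levels present in both inputs are compared.
    """
    shared_zooms = sorted(xml_times_s.keys() & carto_times_s.keys())
    return {
        zoom: 100.0 * (carto_times_s[zoom] - xml_times_s[zoom]) / xml_times_s[zoom]
        for zoom in shared_zooms
    }


# Placeholder render times in seconds per zoom level (not real measurements).
xml_times = {12: 100.0, 13: 200.0, 14: 400.0}
carto_times = {12: 130.0, 13: 250.0, 14: 440.0}

for zoom, pct in per_zoom_slowdown(xml_times, carto_times).items():
    print(f"z{zoom}: {pct:+.1f}%")
```

Reporting the change per zoom rather than as one overall figure makes it easier to see whether z12-16, the levels singled out in the discussion, are the ones that slow down.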