Working Group Minutes/EWG 2014-10-06

From OpenStreetMap Foundation

Attendees

IRC nick    Real name
apmon       Kai Krueger
pnorman     Paul Norman
zere        Matt Amos

Summary

  • osm2pgsql
    • Discussed two concerns about the C++ branch: the new Boost dependency and indexing performance.
      • Boost is already required by Mapnik, so depending on it is not much of an issue, especially as the branch stays compatible with the Boost version in Ubuntu 12.04.
      • Indexing is slower, but "going over pending" is much faster. On most machines, expect a net performance increase.
    • alex85k has a working patchset for Windows: https://github.com/openstreetmap/osm2pgsql/issues/17#issuecomment-58012362
    • Tentative plan: tag the current release, then merge the C++ branch next week. Will revisit at next week's meeting.

IRC Log

17:34:47 <zere> minutes of the last two meetings: http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2014-09-22 and http://www.osmfoundation.org/wiki/Working_Group_Minutes/EWG_2014-09-15
17:35:31 <zere> on the agenda tonight: moving to fortnightly meetings (i.e: every other week), and https://github.com/openstreetmap/osm2pgsql/pull/187
17:35:41 <zere> #topic fortnightly meetings
17:36:33 <zere> basically, of the last 14 meetings, we've had 8 of them...
17:37:04 <zere> which kind of suggests that the remainder, which were either extremely short or didn't happen at all, weren't necessary
17:39:46 <zere> and, from the deafening chorus of "no"s from Firefishy, pnorman and TomH, i'm going to go out on a limb and say that no one really cares?
17:41:18 <pnorman> hrm
17:42:00 <pnorman> the last person other than me or you to speak here was tomh, in aug
17:43:19 <zere> so there's a wider question of whether it's worth even having these, if only two of us had anything to say in the past month
17:44:09 <pnorman> I had put osm2pgsql C++ on the tentative agenda but no one has shown up
17:46:41 <zere> indeed. but i don't know what to do about that...
17:47:38 <pnorman> we could wrap up the WG? You put out a wider call at one point, right?
17:48:49 <zere> what do you mean, a wider call?
17:49:12 <pnorman> dev@
17:49:36 <zere> yeah, but to be fair, i think it was quite a long time ago
17:51:06 <zere> perhaps enough discussion is happening nowadays on github that EWG meetings are superfluous? i always thought having a real-time part was useful, but perhaps it's too onerous given the range of timezones we're all scattered over
17:52:02 <zere> and, to be fair, the effectiveness of EWG is something we've discussed in the past. although, usually there were more people to discuss it with ;-)
17:54:12 <pnorman> we should discuss c++, even if it's just us
17:56:20 <apmon> yes
17:58:52 <pnorman> So, both me and Matt have worked on various parts of it.
17:59:17 <pnorman> The areas worth discussing are probably a) boost, b) indexing performance
18:01:55 <apmon> What needs discussing about them?
18:02:17 <pnorman> a) boost. you need boost for a modern tile server *anyways*. it is a bit of a pain, but I think it's unquestionably worth it, and we kept compatibility with the 12.04 version of boost
18:02:48 <apmon> boost in general I don't think is an issue. As you said, it is necessary for mapnik anyway
18:03:08 <zere> yup. boost is almost a standard library for c++
18:03:08 <apmon> The problem I had with mapnik was that it always needed a more modern boost than was available.
18:03:21 <zere> indeed, bits of it are in the C++11 standard library ;-)
18:03:26 <apmon> So if it runs with boost that comes with 12.04, I think having a dependency on boost is much less problematic
18:03:51 <pnorman> there are some odd things going on with the last phase performance, but it's not a huge performance issue, and that section needs rewriting to not do the stupid ORDER BY stuff anyways
18:04:27 <pnorman> by odd, I mean, I'm not sure what's going on.
18:04:36 <zere> what's the latest performance result? is the new branch faster in some, most or all cases?
18:05:29 <pnorman> first two phases unquestionably.
18:06:37 <pnorman> On extremely high IO concurrency machines I'm getting odd results. see https://github.com/openstreetmap/osm2pgsql/pull/187#issuecomment-57613985
18:06:53 <apmon> where does the performance increase come from? Aren't the first two phases nearly identical to prior to the port?
18:07:03 <pnorman> we're doing pending in memory
18:07:18 <apmon> Did that matter even in the first two phases?
18:07:29 <apmon> Btw. what do you mean by first two phases?
18:07:36 <pnorman> processing and pending
18:07:39 <apmon> Initial import and "going over pending ways"?
18:07:55 <apmon> ah, yes, then the memory pending would make a difference.
18:08:07 <apmon> Do you see any changes in the first phase? Or just the second?
18:08:48 <pnorman> both. processing is slower on cpu-bound nodes, faster on ways/relations. pending is much faster
18:09:58 <zere> worth a TODO to profile and find where the nodes is spending its time - we should (at some point in the future, i'm not suggesting this for pre-merge) be able to get it down to the same performance as before
18:10:25 <pnorman> probably worth a rewrite of the PBF code to use the C++ PBF libraries
18:11:42 <apmon> I think most of the nodes stuff is indeed in the front end parsing code. But then on a fast machine, isn't the nodes import time dwarfed by everything else. So doesn't matter too much
18:11:54 <zere> yeah, could drop the dependency on protobuf-c-compiler.
18:12:41 <zere> yup, it doesn't matter much, so i'd not bother with it before the merge. just worth making a note of it because it'll probably turn out to be something fixable.
18:12:48 <pnorman> on a fast or slow machine, nodes time is dwarfed. maybe on a machine with lots of *very* slow CPU cores and absurdly fast IO it could be up to maybe 10% of overall
18:13:05 <zere> okay, so in general performance is good except for the indexing?
18:13:09 <pnorman> yes
18:13:46 <pnorman> and even then, I'd expect pending ways boosts to make up for it on most systems.
18:14:19 <pnorman> the c++ branch does have a backwards compatible schema change, as it gets rid of the pending columns in the slim tables
18:14:45 <pnorman> well, it no longer creates them, actual removal is up to the user
18:17:38 <pnorman> given how widely osm2pgsql is used, I was kind of hoping for more interest than we got. I mean, we got some, but osm2pgsql is a core component to 98% of rendering setups out there
18:19:33 <zere> oh, don't worry - there'll be loads of interest when it gets merged and we find a bug that we missed earlier ;-)
18:20:23 <pnorman> So, as a plan, tag the current release, then merge next week?
18:20:49 <apmon> What is the state of review of the code?
18:20:56 <apmon> I presume it passes all the tests?
18:21:02 <apmon> Did you get to expand the test suite?
18:21:22 <pnorman> We have more unit tests now. It passes *my* regression tests too
18:21:39 <apmon> Both pnorman and zere, you are happy with the state of code and feel it is ready for merging?
18:21:43 <pnorman> I am
18:22:20 <apmon> Sounds like a plan then!
18:22:38 <zere> yup. at this point we're not finding any more bugs. i'm never really happy with the state of any code, but i think it should be merged and we'll clean up any bugs which fall out when other people start using it.
18:23:11 <apmon> I'll try my best to get around to looking at it in detail this week. But if the two of you think it is thoroughly tested and are happy with it, then I think that should be good enough
18:24:20 <zere> awesome. please do have a look at it. anything that you think needs making clearer, commenting better, just say so and we'll try to get it in as good shape as possible
18:25:31 <pnorman> I'm sure there are regressions somewhere - osm2pgsql is too complex to not have them, but at this point, we can't find them, and the overall level of bugs is down with all the stuff we did
18:27:57 <apmon> If people start complaining about broken stuff, at least we then know where our test coverage is lacking...
18:28:06 <zere> indeed :-)
18:28:17 <zere> #topic AoB
18:28:19 <apmon> I suspect, a big issue might be the build system?
18:28:33 <pnorman> We've got it building on CI, but not `make check`
18:28:50 <zere> well, we've stayed with autotools. and `make check` should work
18:29:00 <apmon> so as long as it compiles, hopefully it gives correct output, but it might be that a bunch of systems where it used to compile no longer does without fixes?
18:29:03 <zere> (with a little setup beforehand)
18:29:08 <pnorman> oh, it'll need someone with more access rights than me to enable travis on openstreetmap/osm2pgsql
18:29:37 <pnorman> apmon: freebsd works, as does ubuntu 12.04, 14.04, centos 6 (eww). windows still needs a patchset, but it takes a smaller one
18:29:39 <zere> alex85k has a parallel build system using Cmake, which might be worth considering
18:29:40 <apmon> Is that a feature github offers?
18:30:03 <pnorman> apmon: either I need admin or you need to register on travis-ci.org
18:30:44 <zere> it's not offered by github (iirc), but a 3rd party. but there is some sort of github integration
18:31:34 <pnorman> between freebsd, ubuntu and centos I think we've got reasonable coverage. I'm sure there's some obscure system out there that no longer works thanks to boost, but centos 6 is pretty old itself.
18:31:45 <apmon> I suspect that kind of things need to be done by TomH, as he is afaic the one with the full privileges to the github project
18:32:26 <apmon> I could test it on centos 5 and Ubuntu 9.04, given I have still access to those. But I am not sure we want or need to support that old systems
18:32:34 <zere> and alex85k has merged a bunch of stuff in - i think apart from the Cmake itself, it all builds on windows: https://github.com/openstreetmap/osm2pgsql/issues/17#issuecomment-58012362
18:32:50 <apmon> that sounds like great progress!
18:33:14 <pnorman> I don't want to be held back by centos 5 and 9.04. I mean, if there's fixes that can be made without causing problems on modern systems, that's fine, but that'd be post-merge
18:34:15 <apmon> yes, that sounds very reasonable!
18:34:43 <pnorman> after all, it's not as if they can't use an old version on their old system anyways
18:35:41 <apmon> Yes, it would be more for curiosity than to actually support it.
18:36:28 <pnorman> I care more about OS X and Windows than those old ones - and we should figure out what to do with the cmake stuff which makes it easier for Windows
18:37:49 <apmon> Ah, yes, OS X. What is the status there?
18:38:37 <pnorman> Doesn't work on C, doubt it works out of the box on C++
18:39:31 <pnorman> C++ will probably make it easier as more OS-specific details will be abstracted
18:41:51 <zere> we did merge a bunch of that stuff
18:42:31 <zere> so i don't think there's a massive gap between cpp_conversion and whatever will build on windows. it's pretty close.
18:45:01 <zere> cool, was there anything anyone wanted to discuss?
18:45:24 <pnorman> I think that's it. So, timeline?
18:50:29 <zere> well, apmon said "I'll try my best to get around to looking at it in detail this week."
18:50:40 <pnorman> k
18:50:50 <zere> so i guess we'll either talk about it again, or celebrate next monday :-)
18:51:06 <zere> which is reason enough to have another meeting next week, i guess
18:51:16 <zere> so... hope to see you then :-)
18:51:18 <zere> thanks