scy,
@scy@chaos.social avatar

If you're building a CLI tool that can churn on large amounts of data for hours and you don't implement any kind of progress output, we won't become friends.

(And no, it refuses to work with stdin, else I would've just used pv and been done.)

tokudan,
@tokudan@chaos.social avatar

@scy can you use e.g. lsof to see the offset of the input file that the program is reading?
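For reference, a minimal sketch of the same idea without lsof, assuming a Linux system where `/proc/<pid>/fd` and `/proc/<pid>/fdinfo` are readable for the process in question. The function names `parse_pos` and `read_progress` are made up for this illustration and are not part of any tool mentioned in this thread.

```python
import os


def parse_pos(fdinfo_text):
    """Return the 'pos:' value from the text of a /proc/<pid>/fdinfo/<fd> file."""
    for line in fdinfo_text.splitlines():
        if line.startswith("pos:"):
            return int(line.split()[1])
    raise ValueError("no pos: field found")


def read_progress(pid, path):
    """Yield (fd, offset, size) for each descriptor of pid that points at path.

    Assumption: Linux procfs layout. Descriptors that vanish or cannot be
    read while we look at them are skipped silently.
    """
    target = os.path.realpath(path)
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to whatever the descriptor refers to.
            if os.readlink(os.path.join(fd_dir, fd)) != target:
                continue
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                offset = parse_pos(f.read())
            size = os.path.getsize(target)
        except OSError:
            continue
        yield int(fd), offset, size
```

Calling `read_progress(pid, "planet.osm.pbf")` a few times, some minutes apart, would also show whether the offset only ever moves forward. The offset says nothing about how much work the remaining bytes will cause, so treat any percentage derived from it as a rough guide.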

scy,
@scy@chaos.social avatar

@tokudan Oh! That's really helpful, thanks!

And also scary, because it told me that it's only at 15 % of the input file, after 2 hours, and has produced 115 GB of output already. 😬

tokudan,
@tokudan@chaos.social avatar

@scy Well... I hope your disk has enough space free and you don't need the data urgently? ^^

tokudan,
@tokudan@chaos.social avatar

@scy oh... and check the offset a couple times, maybe it's not reading the input from start to finish...

scy,
@scy@chaos.social avatar

@tokudan I think it will, because it's an OpenStreetMap planet.osm.pbf which pretty much has to be read from start to finish due to the file format.

And yeah, I got 2 TB free on that disk. Good thing I recently upgraded 😅

djh,
@djh@chaos.social avatar

@scy @tokudan ow, are you running this on a planet.pbf file? Any chance this could be a city or country instead? 🙈

scy,
@scy@chaos.social avatar

@djh @tokudan I'm not exactly in a hurry (but it'd be nice to get some progress information nevertheless), and yeah, while I could be running this on an extract, I'd also like to know how it behaves when being fed the whole planet. 🤷‍♂️

So far, my preliminary verdict is "I need to find another way", but I want to test it first anyway.

djh,
@djh@chaos.social avatar

@scy @tokudan got it. Is this still the OpenStreetMap geocoding use case you are trying out? What's the tool in question?

dunkelstern,
@dunkelstern@kampftoast.de avatar

@scy if you want to test another geocoder (assuming that is what you’re doing), I have a more space-efficient one on my GitHub, with progress output while it’s doing its thing 🙂 https://github.com/dunkelstern/osmgeocoder (the implementation is mainly Postgres stored procedures, with a small Python shim to access it and throw stuff against). If you want to experiment and have questions, feel free to ask. (No guarantees that it works correctly in countries that have no street names, like Japan.)

scy,
@scy@chaos.social avatar

@dunkelstern Interesting, thanks for letting me know!

The thing is, I'm explicitly trying to avoid Postgres (not that I don't like it, I just try to keep the number of daemons low) and would want to use SQLite (or Spatialite) instead.

spatialite_osm_raw creates absolutely humongous files though (7 GB PBF → 166 GB SQLite).

I'm too tired to really think about this right now though, will have a fresh look at it "tomorrow".

Xjs,
@Xjs@chaos.social avatar

@scy Not sure about what top etc. report on Linux, but on macOS there are “total bytes read/written” statistics right in Activity Monitor. Surely there must be a similar thing?
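On Linux, the closest equivalent I know of is `/proc/<pid>/io`, which exposes per-process counters such as `rchar`, `wchar`, `read_bytes` and `write_bytes`. A small sketch of reading it, assuming a Linux kernel that provides this file and that you are allowed to read it (normally the same user or root); the helper names are hypothetical.

```python
def parse_proc_io(text):
    """Turn the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def io_counters(pid):
    """Read the current I/O counters of a process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())
```

`io_counters(pid)["read_bytes"]` counts bytes actually fetched from storage, while `rchar` counts bytes requested through read-style system calls, so the two can differ when data comes from the page cache.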

jych,
@jych@chaos.social avatar

@scy or even better, implement a Windows 95-style progress bar :D

manawyrm,
@manawyrm@chaos.social avatar

@scy more people need to use SIGHUP1 :)

scy,
@scy@chaos.social avatar

@manawyrm You mean SIGUSR1?

manawyrm,
@manawyrm@chaos.social avatar

@scy LOL, I should go to bed. Yes, of course.

scy,
@scy@chaos.social avatar

@manawyrm Yeah the thing is, the default action for USR1 is to terminate, so I'm not risking sending this to that effing process now, 2 hours into however long it's gonna take…

goes to read the source instead
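For tool authors, a minimal sketch of what opting in to SIGUSR1 could look like, assuming a Unix-like system and a program that knows the total input size up front. The `Progress` class and the `process` function are invented for this example and say nothing about how the tool discussed here works.

```python
import signal
import sys


class Progress:
    """Tracks how much input has been consumed so far."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.done_bytes = 0

    def summary(self):
        pct = 100 * self.done_bytes / self.total_bytes if self.total_bytes else 0.0
        return f"{self.done_bytes}/{self.total_bytes} bytes ({pct:.1f} %)"

    def install(self):
        """Replace SIGUSR1's default action (terminate) with a status report."""
        def handler(_signum, _frame):
            print(self.summary(), file=sys.stderr)

        signal.signal(signal.SIGUSR1, handler)


def process(chunks, progress):
    """Stand-in for the real work loop: it only counts the bytes it sees."""
    for chunk in chunks:
        progress.done_bytes += len(chunk)
```

With `install()` called before the work loop starts, `kill -USR1 <pid>` prints one status line to stderr instead of ending the process. Sending the signal to a program that never installed a handler still terminates it, which is exactly the risk described above.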
