Hacker News | dajobe's comments

There is also the (impossible to google) https://github.com/harelba/q that lets you SQL over CSV and is unix pipeline friendly.
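For anyone wondering what "SQL over CSV" amounts to, here is a minimal sketch in Python using only the standard library. It is not a claim about how q works internally; the function name `query_csv`, the table name `t`, and the sample data are all made up for illustration.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text (first row is the header) into an in-memory
    SQLite table and return the rows produced by `sql`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # Quote column names so headers with spaces or keywords still work.
    cols = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Hypothetical sample data; every CSV value is loaded as text,
# so numeric comparisons need an explicit CAST.
SAMPLE = "name,size\na.txt,10\nb.txt,25\nc.txt,7\n"
big = query_csv(
    SAMPLE,
    "SELECT name FROM t WHERE CAST(size AS INTEGER) > 8 ORDER BY name",
)
print(big)
```

To make this pipeline friendly you would read the CSV from `sys.stdin` and write the result rows to `sys.stdout`.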


There's also https://github.com/BurntSushi/xsv which works fairly similarly to Miller, though it can also index a CSV file so that later operations on that file are sped up by using the index.


xsv is awesome, just used it for a mini-project. Thanks for the heads up.


Hi. q's developer here. Thanks for the mention and kind words everyone.

I considered the searchability issue when deciding on a name for it, but eventually favored a short name that minimizes day-to-day typing over better searchability.

Anyway, you can search for "harelba q" in order to find it if needed.

Harel @harelba


Miller looks very useful, and so does q. Thanks to the op and to you for the introductions.


indeed, I hadn't found q. thanks for the intro!


This is great. Wonder why they didn't go with "cq" or something, though.


does it bring any advantages over https://metacpan.org/pod/DBD::CSV? (not being perl is not an advantage).


Well, DBD::CSV is just a driver for DBI, not a command that is "unix pipeline friendly"... So there's that.


Perl is pipeline friendly though ...


DBD::CSV is a library, which you could use to build a command, but doesn't seem to come with one out of the box. q is just a command, but not a library.

One could use DBI + DBD::CSV + Perl to build something similar to q, but that's a batteries not included solution.


Things it doesn't support: symlinks, POSIX ACLs (xattrs). The first one makes it a certain failure for archival use. The hardcoded link to an external crypto service (keybase) makes it a failure for long-term use.


"The final archive format" is a very big promise that 4q doesn't keep right now. It falls short of 7z, RAR and tar.xz, and certainly isn't ready to replace them at the moment.

I'm not too familiar with Coffeescript, but it doesn't seem like a good choice of language to write an archiver. There's no actual draft file format spec I can see, either? But from a first pass, I have the following comments:

Crypto: Encrypted blocks: AES-256-CBC, random IV, with no MAC (!!!). You need to look at that again: that could be a Problem. Hashed blocks: SHA-2-512. Maybe OK (how's length encoded? Look out for extension attacks). That crypto is 14 years old and missing a vital bit: not "modern". Modern choices would include CHACHA20_POLY1305 (faster, more secure, seekable if you do it right); hashes like BLAKE2b (as the new RAR already does); signing things with Ed25519. Look into that kind of thing. You need a crypto overhaul. The keybase.io integration is a nice thought for a UX - but is an online service in invite beta really ready for being baked into an archive format?
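To make the "no MAC" point concrete, here is a rough sketch of encrypt-then-MAC using only the Python standard library. It is not 4q's format or code: the names `seal` and `open_sealed` are hypothetical, the "ciphertext" is random bytes standing in for AES-256-CBC output (the stdlib has no AES), and a real design would use a MAC key separate from the encryption key.

```python
import hashlib
import hmac
import os

TAG_LEN = 64  # HMAC-SHA-512 output size in bytes


def seal(mac_key, iv, ciphertext):
    """Append an HMAC tag computed over IV + ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha512).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key, blob, iv_len=16):
    """Verify the tag before anything is decrypted; return (iv, ciphertext)."""
    if len(blob) < iv_len + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha512).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]


mac_key = os.urandom(32)     # assumption: independent of the encryption key
iv = os.urandom(16)
ciphertext = os.urandom(48)  # stand-in for real AES-256-CBC output

blob = seal(mac_key, iv, ciphertext)

# Flip one bit in the ciphertext region to simulate tampering.
tampered = bytearray(blob)
tampered[20] ^= 0x01
tampered = bytes(tampered)
```

Without the tag, a flipped ciphertext bit just decrypts to different plaintext and nothing notices. HMAC is also not affected by the length-extension issue that a bare SHA-2 hash of key + data would have.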

Packing: LZMA2 is pretty good: 7z and xz already use that. For a fast algorithm, Snappy is not as good as LZ4, I understand? Neither is the last word in compression. Text/HTML/source code packs much better with a PPM-type model, like PPMd (7z has that, too, as had RAR, but removed it recently), but you need to weigh up the decompression memory usage. ZPAQ's context model mixing can pack tighter, but that's much more intensive and while I like extensibility, I don't like the ZPAQ archive format having essentially executable bytecode.

Other missing features that other archivers have: Volume splitting? Erasure coding or some other FEC? Can you do deltas? (e.g. binary software updates)

You've got some pleasant UX ideas for a command-line archiver (compared to some other command-line archivers!), but sorry, I don't think you're ready for 1.0.


Even better, it's from the future: Posted 25 Mar 2015


I liked the comment: "I’m a big advocate of remote work culture even if nobody works remote."

That really works: respect people's time even if you are in the shared, no-cube, open office world.


You have good taste, since I invented Turtle. To keep this on topic, I've been using JSON for data web APIs since that's what it's best at. It sucks at markup and graphs, of course.


We use JSON for graphs. What sucks about it? In what sense is XML better at representing graphs?


There's no way to point from one part of a JSON doc to another without inventing a terminology or convention for marking the start (anchor) and end (href) of the arc. People use 'id' for one end but there's no way to say a JSON value is actually a reference (href) and not just a string. XML has that built in (ID IDREF) and so does HTML, but I didn't say XML was better, I said JSON sucks at markup and graphs. JSON's handy for serializing trees of data with no loops.
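A small Python sketch of what "inventing a convention" looks like in practice. The `"$ref"` key and the `link` helper are assumptions picked for illustration; nothing in JSON itself gives them meaning, which is the point.

```python
import json

# Two nodes that point at each other: a loop, so not a tree.
DOC = """
{
  "nodes": [
    {"id": "a", "next": {"$ref": "b"}},
    {"id": "b", "next": {"$ref": "a"}}
  ]
}
"""


def link(doc):
    """Resolve {"$ref": id} placeholders into the node objects they name.
    This only works because reader and writer agreed on the convention."""
    nodes = {n["id"]: n for n in doc["nodes"]}
    for n in nodes.values():
        target = n.get("next")
        if isinstance(target, dict) and "$ref" in target:
            n["next"] = nodes[target["$ref"]]
    return nodes


# The parser alone sees an ordinary object, not a reference.
unlinked = json.loads(DOC)["nodes"][0]["next"]

# After applying the convention, the arcs are real object references.
graph = link(json.loads(DOC))
```

After `link`, `graph["a"]["next"]` is the `b` node and following `next` twice gets you back to `a`. Note that `json.dumps(graph)` now fails with a circular reference error, so the same convention has to be applied in reverse when serializing.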


What do you mean by "built in"? I don't see how or why an ID couldn't be used in the same manner; the implementation is just a little different because they store data differently.

XML is only a series of nodes and attributes. There isn't really anything else special about it and it's trivial to represent it in JSON so I'm not sure I follow your issue. Could you provide an example?


Holy %!#@, I never knew the inventor of Turtle posted on HN. That's awesome. Turtle does, indeed, rock. But I'm also a Semantic Web Koolaid drinker, so my viewpoints may be a bit out of line with the "mainstream."


How do I vote this polling crap down? It's pointless and subjective, and if this were Stack Overflow, it would be deleted.


I partially agree, with a slight modification. I write a lot of free software / open source software. This relies on copyright, of course, to enforce the freedoms. I don't want an onerous registration environment, but I would be fine with "copyright lasts until 4 years after last publication without registration", with registration for a fee required after that. So if I continue to make the software, it keeps its copyright, and if I get bored, it turns into PD - and that's fine.


This article / talk is a collection of relatively random python things of which only a few are unarguably good advice such as not using deprecated terms or checking for exact types. The rest is not a good basis for pythonic best practice that I would recommend.


It is not open sourced, at least not yet. Putting something on github generally means it can be openly distributed. The source code says "all rights reserved" and there is no license file.


What a pathetic rant. I was totally not surprised that day #1 of a total codebase rewrite would not have all the features of the old version. AVOS hinted enough at this too. At least they are innovating and responding to users. Having said that, I do hope 'networks' returns since that's the feature I miss, so far.

