Hacker News | derriz's comments

This sounds great. parquet-java is extremely unpleasant to use: a massive fan-out of dependencies, and an awkward API that exposes those dependencies, causing them to bleed into a user's code base. The Hadoop stuff is particularly annoying given the relatively poor quality (IMO) of the Hadoop code base and the amount of class-name sharing with built-in Java types (File, FileSystem, etc.). And the performance of parquet-java is very poor compared to the libraries available in other languages.

Thanks! The heavy dependency footprint of parquet-java was the main driver for kicking off this project. Hardwood doesn't have any mandatory dependencies; libraries for any compression algorithms used can be added by the user as needed (most are single JARs with no further transitive dependencies). The same goes for log bindings (Hardwood uses the System.Logger abstraction).

My experience of public transport modes in various cities is at odds with this.

Trams and trains generally offer far more reliable schedules, frequencies and journey times than buses, because they either have completely dedicated alignments or have priority wherever they interface with normal traffic.

Most buses inevitably bunch (see https://setosa.io/bus/ for a nice simulation) and/or get stuck in traffic as a matter of routine. The inconvenience may be less per delay, but buses are delayed far more frequently than trams and trains on most of the public transport systems I've used. So for regular users, the cumulative inconvenience is much worse on buses than on trains/trams. Which is why people flock to trains and trams when they're available as an alternative to buses.

Specifically with regard to the parent, the frequency at which unplanned outages happen with tram services in Zurich is extremely low in my experience - certainly planned changes to schedules or routes (for maintenance, upgrades, etc.) are far more frequent. And when "something happens" (e.g. a traffic accident), the path for trams is cleared as quickly as possible - often in 30 minutes or less - so you'd really have to be unlucky to be inconvenienced by such an occurrence.


lol, I was recently in such a situation - a tram collided with a car. I got off and decided to walk to the central station... only to find the tram near me again: they had probably recorded everything they needed, and it was on its way to the depot...

At a certain point a bias in the prng just has to become significant? This will be a function of the experiment. So I don’t think it’s possible to talk about a general lack of “practical impact” without specifying a particular experiment. Thinking abstractly - where an “experiment” is a deterministic function that takes the output of a prng and returns a result - an experiment that can be represented by a constant function will be immune to bias, while one which returns the nth bit of the random number will be susceptible to bias.

> At a certain point a bias in the prng just has to become significant?

Sure, at a point. I'm not disputing that. I'm asking for a concrete bound. When the state space is >= 2^64 (you're extremely unlikely to inadvertently stumble into a modern PRNG with a seed smaller than that) how large does the sample set need to be and how many experiment replications are required to reach that point?

Essentially what I'm asking is, how many independent sets of N numbers must I draw from a biased deck, where the bias takes the form of a uniformly random subset of the whole, before the bias is detectable to some threshold? I think that when N is "human" sized and the deck is 2^64 or larger that the number of required replications will be unrealistically large.
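A back-of-the-envelope sketch of why I expect the number to be unrealistically large (my own illustration, not anything measured): when the "deck" is an unknown uniformly random subset, membership can't be tested directly, so the natural distinguisher is counting collisions among draws - a restricted deck produces collisions sooner. The expected number of pairwise collisions among n draws from a space of k values is roughly n(n-1)/2k, so even halving a 2^64 space needs on the order of 2^32 draws before either case yields a single expected collision:

```java
// Illustration only: expected collision counts for an unbiased 2^64 deck
// versus one restricted to a random half of it (2^63 values).
public class BirthdayBound {
    // Expected number of colliding pairs among n uniform draws from a
    // space of k values: approximately n*(n-1) / (2*k).
    static double expectedCollisions(double n, double k) {
        return n * (n - 1) / (2.0 * k);
    }

    public static void main(String[] args) {
        double full = Math.pow(2, 64);   // unbiased state space
        double half = Math.pow(2, 63);   // "deck" missing half its cards
        for (int exp = 20; exp <= 34; exp += 2) {
            double n = Math.pow(2, exp);
            System.out.printf("n=2^%d  full: %.3e  biased: %.3e%n",
                    exp, expectedCollisions(n, full), expectedCollisions(n, half));
        }
        // Even with the deck cut in half, "human sized" sample sets
        // (thousands or millions of draws) expect essentially zero
        // collisions in both cases - no signal to detect.
    }
}
```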


HM handles sub-typing just fine? Numerous approaches have been known since the 1980s - Michael Wand’s row polymorphism is one such approach.

https://en.wikipedia.org/wiki/Row_polymorphism


Structural subtyping yes, nominal subtyping is a bit pricklier.

As a developer I personally prefer structural subtyping, but structural subtyping is harder for a compiler to optimize runtime performance for.

Nominal sub-type hierarchies allow members to be laid out linearly, so a member access becomes just an offset, whereas a structural system always has the "diamond problem" to solve (it's hidden from users, so not a "problem" for them, but it will still haunt compiler/runtime developers).

The kicker, though: in practice nominal subtype polymorphism has other performance issues on _modern_ computers, since it creates variable-sized objects that cannot be packed linearly like monomorphic structures.

In the '90s, when languages settled on nominal typing, memory speeds weren't really a huge issue, but today we know that we should rather compose data to achieve data-polymorphic effects, while singular types should be geared toward packing.

Thus, most performance benefits of a nominal type system over a structural one don't help much in real-life code, and maintenance-wise we would probably have been better off using structural types (IIRC Go went there, and interfaces in Java/C# achieve mostly the same effect in practice).


I've been implementing row polymorphism in my fork of Elm in order to support proper sum and subtraction operations on unions, and it's far from trivial.

Example use case: an Effect may fail with two error types, but you have actually handled/caught one of them, so you want to remove it.

Elm-like HM systems handle row polymorphism fine, as you say, but mostly over records.

I'm not an expert in all of this, started studying this recently, so take my words with a grain of salt.


*Mitchell Wand

Not sure about your opening thesis. The vast majority of employees work for privately held businesses, and from personal experience of working in such companies, "management" by OKRs and the like is common in companies that are not listed on stock markets as well.

There'll inevitably be cargo-culting driven by MBA curricula and "they're making a lot of money, let's do what they did" thinking, without examining the specifics of the situation to distinguish luck from judgement.

Should be a priority. Without it, your language relies on recursion for Turing completeness.


That’s the seductive power of mocking - you get a test up and running quickly. The benefit to the initial test writer is significant.

The cost is the pain - sometimes nightmarish - for other contributors to the code base since tests depending on mocking are far more brittle.

Someone changes the code to check whether the ResultSet is empty before further processing, and a large number of your mock-based tests break, because the original test author will only have mocked enough of the class to support the then-current implementation.

Working on a 10+ year old code base, making a small, simple, safe change and then seeing a bunch of unit tests fail, my reaction is always "please let the failing tests not rely on mocks".


> Someone changes code to check if the ResultSet is empty before further processing and a large number of your mock based tests break as the original test author will only have mocked enough of the class to support the current implementation.

So this change doesn't allow an empty result set, something that is no longer allowed by the new implementation but was allowed previously. Isn't that the sort of breaking change you want your regression tests to catch?


I used ResultSet because the comment above mentioned it. A clearer example of what I'm talking about: say you replace "x.size() > 0" with "!x.isEmpty()", where x is a mocked instance of class X.

If tests (authored by someone else) break, I now have to figure out whether the breakage is due to the fact that not enough behavior was mocked or whether I have inadvertently broken something. Maybe it’s actually important that code avoid using “isEmpty”? Or do I just mock the isEmpty call and hope for the best? What if the existing mocked behavior for size() is non-trivial?

Typically you’re not dealing with something as obvious.


What is the alternative? If you write a complete implementation of an interface for test purposes, can you actually be certain that your version of x.isEmpty() behaves as the actual method? If it has not been used before, can you trust that a green test is valid without manually checking it?

When I use mocking, I try to always use real objects as return values. So if I mock a repository method, like userRepository.search(...), I would return an actual list and not a mocked object. This has worked well for me. If I actually need to test the DB query itself, I use a real DB.


The alternative to what? Using mocks?

For example, one alternative is to let my IDE implement the interface (I don’t have to “write” a complete implementation), where the default implementations throw “not yet implemented” type exceptions - which clearly indicate that the omitted behavior is not a deliberate part of the test.

Any "mocked" behavior involves writing normal, debuggable, idiomatic Java code - no need to learn or use a weird DSL to express the behavior of a method body. And it's far easier to diagnose what's going on, or what's expected, while running the test - instead of the backwards mock approach where failures are typically reported non-locally (the test completes and you get an "unexpected invocation" or "missing invocation" error - where, or what, should have made the invocation?).

My test implementation can evolve naturally - it’s all normal debuggable idiomatic Java.
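A minimal sketch of the approach described above, using an invented repository interface (the `UserRepository` and `FakeUserRepository` names are illustrative only, not from any real code base): the IDE-generated skeleton throws from every method, so behavior a test did not deliberately provide is loudly absent rather than silently defaulted, and the stubs a test does provide are plain, debuggable Java:

```java
import java.util.List;

// Hypothetical dependency interface.
interface UserRepository {
    List<String> search(String query);
    boolean isEmpty();
}

// IDE-generated skeleton: every method throws until a test overrides it,
// making any omitted behavior an explicit, visible decision.
class FakeUserRepository implements UserRepository {
    @Override public List<String> search(String query) {
        throw new UnsupportedOperationException("not stubbed: search");
    }
    @Override public boolean isEmpty() {
        throw new UnsupportedOperationException("not stubbed: isEmpty");
    }
}

public class FakeDemo {
    public static void main(String[] args) {
        // A test overrides only what it needs - no mocking DSL involved.
        UserRepository repo = new FakeUserRepository() {
            @Override public List<String> search(String q) {
                return List.of("alice");
            }
        };
        System.out.println(repo.search("a"));  // stubbed behavior
        try {
            repo.isEmpty();                    // unstubbed -> loud failure
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Because the fake is an ordinary class, it can evolve with the interface and be stepped through in a debugger like any other code.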


It doesn't have to be a breaking change -- an empty result set could still be allowed. It could simply be a perf improvement that avoids calling an expensive function with an empty result set, when it is known that the function is a no-op in this case.


If it's not a breaking change, why would a unit test fail as a result, whether or not using mocks/fakes for the code not under test? Unit tests should test the contract of a unit of code. Testing implementation details is better handled with assertions, right?

If the code being mocked changes its invariants the code under test that depends on that needs to be carefully re-examined. A failing unit test will alert one to that situation.

(I'm not being snarky, I don't understand your point and I want to.)


The problem occurs when the mock is incomplete. Suppose:

1. Initially codeUnderTest() calls a dependency's dep.getFoos() method, which returns a list of Foos. This method is expensive, even if there are no Foos to return.

2. Calling the real dep.getFoos() is awkward, so we mock it for tests.

3. Someone changes codeUnderTest() to first call dep.getNumberOfFoos(), which is always quick, and subsequently call dep.getFoos() only if the first method's return value is nonzero. This speeds up the common case in which there are no Foos to process.

4. The test breaks because dep.getNumberOfFoos() has not been mocked.

You could argue that the original test creator should have defensively also mocked dep.getNumberOfFoos() -- but this quickly becomes an argument that the complete functionality of dep should be mocked.
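The four steps above can be sketched as follows, with a hand-rolled stub standing in for a mocking framework (`Dep`, `getFoos`, and `getNumberOfFoos` are the thread's illustrative names, not a real API):

```java
import java.util.List;

// Step 1's dependency: getFoos() is expensive in production;
// getNumberOfFoos() is cheap and only enters the code path later.
interface Dep {
    List<String> getFoos();
    int getNumberOfFoos();
}

public class IncompleteMockDemo {
    // Step 3's "optimized" code under test: ask for the count first and
    // skip the expensive call when there is nothing to process.
    static int codeUnderTest(Dep dep) {
        if (dep.getNumberOfFoos() == 0) return 0;  // new fast path
        return dep.getFoos().size();
    }

    // Step 2's mock: only getFoos() was stubbed, because that is all the
    // original implementation ever called.
    static Dep incompleteMock() {
        return new Dep() {
            public List<String> getFoos() { return List.of(); }
            public int getNumberOfFoos() {
                throw new AssertionError("unmocked call: getNumberOfFoos");
            }
        };
    }

    public static void main(String[] args) {
        try {
            codeUnderTest(incompleteMock());  // Step 4: the test breaks,
        } catch (AssertionError e) {          // though observable behavior
            System.out.println("test broke: " + e.getMessage());  // is unchanged
        }
    }
}
```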


Jumping ahead to the comments below: obviously, I mentioned `java.sql.ResultSet` only as an example of an extremely large interface. But if someone from outside the Java world starts building theories on what the example leaves unsaid, they might, for instance, assume that such brittle tests are simply poorly written, or that they fail to mitigate Mockito's default behavior.

In my view, one of the biggest mistakes when working with Mockito is relying on answers that return default values even when a method call has not been explicitly stubbed, treating this as some kind of "default implementation". Instead, I prefer to explicitly forbid such behavior by throwing an `AssertionError` from the default answer. Then, if we really take "one method" literally, I explicitly state that `next()` must return `false`, clearly declaring my intent that the test is built on exactly this described behavior - which in practice most often boils down to a fluent-style list of explicitly expected interactions. Recording interactions is also critically important.

How many methods does `ResultSet` have today? 150? 200? As a Mockito user, I don't care.
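To make the "forbid by default" pattern concrete without pulling in Mockito (which supports it via `Mockito.mock(Class, Answer)`), here is the same idea hand-rolled with a JDK dynamic proxy - this is my dependency-free approximation, not how Mockito implements it: every call on the `ResultSet` fails with an `AssertionError` unless it is one of the explicitly declared interactions.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;

public class StrictStub {
    // A ResultSet whose only permitted interaction is next() -> false.
    // Any of the interface's other ~190 methods triggers an AssertionError,
    // so no test silently leans on an unstated "default implementation".
    static ResultSet emptyResultSet() {
        return (ResultSet) Proxy.newProxyInstance(
                ResultSet.class.getClassLoader(),
                new Class<?>[] { ResultSet.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("next")) {
                        return false;  // the one explicitly declared interaction
                    }
                    throw new AssertionError(
                            "unexpected interaction: " + method.getName());
                });
    }

    public static void main(String[] args) throws Exception {
        ResultSet rs = emptyResultSet();
        System.out.println(rs.next());  // false - the stubbed behavior
        // rs.isClosed();               // anything else would fail loudly
    }
}
```

The test never needs to enumerate the rest of the interface - which is the point: it doesn't matter whether `ResultSet` has 150 or 200 methods.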


Drives me crazy, as I travel a lot. Even though I'm logged into Google, it's been tracking me for years, and it's still targeting me with obviously personalized ads, it will assume I've suddenly acquired temporary proficiency in French, German, Italian, Spanish, etc. - stubbornly ignoring the language I've used for my query, which is the only language I've ever used in all my interactions with Google over the years. What the fuck would be so hard about just NOT using braindead geolocation (which doesn't even work in countries like Switzerland) when you know for certain who I am? You track all this information about me and use it to generate ad revenue, but refuse to use my language preference?


Patrick Boyle eventually comes to a similar conclusion in his video about global population decline - https://youtu.be/ispyUPqqL1c?si=7jUgVBkOvLHluPAR - but includes lots of graphs and other interesting factlets.

* warning for Americans: not suitable for those offended by sarcasm


I don’t see the relevance to the topic. I could preface your list with something like “The monkey wrench is not the best tool for the following situations:”. It’s vacuously true in a meaningless way, but without expansion it adds nothing to a discussion about the relative merits of monkey wrenches versus other similar tools like pliers or vice grips.

