You could reuse df (as the method presumably changes df); in addition:
- you have more room to comment what the intention of the code/call is
- you have room to handle / check for errors
For example, the `sort_values` and `groupby` in your example are obvious to most readers. But `transform("cumsum")` is probably obvious to you; I don't know the intention of the code.
By reassigning to df, it is clear that the value returned from each of the functions is a `df` (some kind of query builder, I think). Actually, rewriting your code I discovered that your groupby did a groupby and then selected a result column, I think.
So we would get:
# sort result on departure date
df = df.sort_values('dep_date')
# ... check that sort_values worked (i.e. `dep_date` is a valid column)
# group result by name and select the duration column
durations = df.groupby('name')['duration']
# compute the cumulative sum of durations within each group
sum = durations.transform('cumsum')
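For reference, here is a runnable version of that verbose sketch. The sample data is invented purely for illustration (the column names `name`, `dep_date`, and `duration` come from the snippet above; the values are mine):

```python
import pandas as pd

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "dep_date": ["2024-01-02", "2024-01-01", "2024-01-02", "2024-01-01"],
    "duration": [10, 20, 30, 40],
})

# sort result on departure date
df = df.sort_values("dep_date")

# group result by name and select the duration column
durations = df.groupby("name")["duration"]

# compute the cumulative sum of durations within each group
cumulative = durations.transform("cumsum")
print(cumulative.tolist())
```

Each intermediate name (`durations`, `cumulative`) gives you a place to hang a comment or a sanity check, which is exactly the point being argued.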
I agree it's much more verbose, so this won't work if you (just) want conciseness.
If you reuse the variable, then you are not getting any of the alleged benefits of using variables. The same name tells you little except that it is perhaps the same data type, and in Elixir you would also get that information from the module you are invoking:
text
|> String.split(",")
|> Enum.join(" ")
You could also equally add comments between the lines in the dot example:
df.sort_values("dep_date")
# group and select
.groupby(…)[…]
Although most of the comments above are discardable (IMO) because they are restating the code.
My point is: sometimes I will break out into variables to get some of the benefits you mention. But forcing all intermediate steps to assign to variables is as harmful as using `|>` or `.` exclusively and forgetting about variables altogether. If you need to add error handling, code comments, etc., you can break out of the pipeline as needed.
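A small illustration of that middle ground, using invented column names and data: chain the obvious steps, and only break out to a named variable where a check or comment earns its keep.

```python
import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame({"name": ["a", "b"], "dep_date": ["2024-01-02", "2024-01-01"]})

# Chain the steps that are obvious...
sorted_df = df.sort_values("dep_date")

# ...but break out to a named variable where a check is worth it.
if sorted_df["dep_date"].isna().any():
    raise ValueError("dep_date contains missing values")

names = sorted_df["name"].tolist()
print(names)
```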
I think Jose's example provides the best of both worlds: conciseness and readability.
> you have more room to comment what the intention of the code/call is
There's nothing stopping each pipe expression from having a comment of its own if you lean towards literate programming, or you feel that the pipe function + arguments begs further explanation to unfamiliar developers.
> you have room to handle / check for errors
You can pipe your results into a validation function. Ecto, the go-to database mapper, has exactly this pattern. Failing fast is idiomatic Erlang/Elixir (rather than letting the process live on with corrupt state), which means that if the validation fails, it ought to raise an exception so the external caller can fix their call/request, or, if it's a bug, so the developer can be alerted to fix the code.
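For readers coming from pandas rather than Ecto, the same shape is available there too: `DataFrame.pipe` lets a plain function, including a validation step, sit inside a method chain and fail fast. A sketch, with a hypothetical `validate_columns` helper of my own invention:

```python
import pandas as pd

def validate_columns(df, required):
    """Fail fast if required columns are missing, else pass df through."""
    missing = set(required) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df

df = pd.DataFrame({"dep_date": ["2024-01-01"], "duration": [5]})

# The validation step sits inside the chain and raises on bad input.
result = (
    df.pipe(validate_columns, ["dep_date", "duration"])
      .sort_values("dep_date")
)
print(len(result))
```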
> You could reuse df (as the method presumably change df),
Then what's the point in assigning to intermediate variables (as your criticism of the pipe was)? You don't get any added clarity about the intermediate intentions that way...