The article's conclusion -- that Lambda is cheaper than EC2 instances for this use case -- is completely wrong. The author only counted the per-request overhead, and neglected to add the actual cost of the GB-hours consumed. If each container uses 512MB of memory, then keeping one request running at a time for an entire month costs about $22. For comparison, a t2.nano instance with the same amount of memory costs $4/month.
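The arithmetic is easy to check. A minimal sketch, assuming the published us-east-1 on-demand rates ($0.0000166667 per GB-second for Lambda compute, $0.0058/hour for a t2.nano):

```python
# Back-of-the-envelope check of the numbers above (rates are assumptions
# based on published us-east-1 on-demand pricing, and exclude Lambda's
# per-request charge and the free tier).

GB_SECOND_RATE = 0.0000166667   # Lambda compute price, USD per GB-second
T2_NANO_HOURLY = 0.0058         # t2.nano on-demand price, USD per hour

seconds_per_month = 30 * 24 * 3600  # 2,592,000

# One 512MB container busy around the clock for a month
lambda_monthly = 0.5 * seconds_per_month * GB_SECOND_RATE
t2_nano_monthly = T2_NANO_HOURLY * 24 * 30

print(f"Lambda, 512MB busy all month: ${lambda_monthly:.2f}")   # ~$21.60
print(f"t2.nano (512MB), all month:   ${t2_nano_monthly:.2f}")  # ~$4.18
```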
Lambda is a value-added service on top of EC2. It only makes financial sense to use it when you don't want something running constantly, or otherwise have a way to take advantage of the extremely fine granularity in billing. (Or if you're willing to pay a premium to have Amazon manage your process lifecycles for you.)
$4/mo doesn't include the cost of the monitoring and maintenance that comes with running your own instance. When you can't amortize those costs across a large fleet of instances, Lambda is often much cheaper in terms of total cost. Also, to get anywhere near the same reliability expectations, you'd need at least 2 instances and an ELB. Granted, that setup will be able to handle a lot more traffic, but it's still not fair to compare Lambda to a single instance.
You would also need a message queue to match Lambda's built-in dead-letter queue, which means a few more EC2 boxes running RabbitMQ in cluster mode.
Great article, and an interesting use of Lambda, thanks for sharing!
To answer your final question: I wrote a spot instance automation tool (you can check it out at autospotting.org), so I would give spot instances a try. The latest developments from AWS on the spot market are real game changers; I think most workloads can now safely run on spot. My AutoSpotting tool makes it a breeze to migrate from on-demand AutoScaling groups while keeping them a bit more reliable than the native AutoScaling integration for spot.
As of a few months ago the pricing is much more stable than before: I've rarely seen terminations even across the maximum three months of price history, for instance types that used to go bust multiple times a day. You also now pay on a per-second basis, and you can hibernate the last instance to keep the state of the group while everything else is down.
So my approach for this would be to have an AutoScaling group of the smallest spot instances that can run your app, scale it to N nodes right before your experiment, then when you're done scale down to a single node that serves as the data seed next time, which you detach and hibernate with API calls.
Next time you re-attach the seed to the empty group, and scale out to N once again and run your test. So you only pay for the length of your test on a per-second basis.
You can also keep the seed as an on demand node outside of the spot group and have it run from the free tier if you still have some time left, or just hibernate it as well.
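The cycle above can be sketched with the standard AWS CLI. This is a hypothetical outline, not AutoSpotting itself; the group name `my-spot-asg`, the instance ID, and the capacity of 10 are placeholder values:

```shell
# Before the experiment: scale the spot group out to N nodes
aws autoscaling set-desired-capacity \
    --auto-scaling-group-name my-spot-asg --desired-capacity 10

# After the experiment: shrink back down to the single seed node...
aws autoscaling set-desired-capacity \
    --auto-scaling-group-name my-spot-asg --desired-capacity 1

# ...then detach it (decrementing desired capacity to 0 so the group
# doesn't launch a replacement) and hibernate it to preserve its state
aws autoscaling detach-instances \
    --auto-scaling-group-name my-spot-asg \
    --instance-ids i-0123456789abcdef0 \
    --should-decrement-desired-capacity
aws ec2 stop-instances --hibernate --instance-ids i-0123456789abcdef0

# Next run: wake the seed, re-attach it, and scale out to N again
aws ec2 start-instances --instance-ids i-0123456789abcdef0
aws autoscaling attach-instances \
    --auto-scaling-group-name my-spot-asg \
    --instance-ids i-0123456789abcdef0
aws autoscaling set-desired-capacity \
    --auto-scaling-group-name my-spot-asg --desired-capacity 10
```

Note that hibernation requires the instance type, AMI, and root volume to be configured for it ahead of time.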
Keith Winstein (Stanford) et al's gg [1] is also fun. Sort of `make -j1000` for 10 cents. Create a deterministic-compilation model of a C build task, upload the source files, briefly run a lot of lambdas, download the resulting executable. (Though it's more general than that.)
For folks long despairing that our programming environments have been stuck in a rut for decades, we're about to be hit by both the opportunity to reimagine our compilation tooling, and the need to rewrite the world again (as for phones) for VR/AR. If only programming language and type systems research hadn't been underfunded for decades, we'd be golden.
I've found out-of-the-box distributed Erlang difficult to run in environments with a lot of instance churn (e.g. containerized deployments on Kubernetes), so much so that I generally opt not to connect my nodes for Erlang message passing. Does anyone here have experience running Lasp in Kubernetes? Is Lasp effective at monitoring and adjusting to new or dead nodes?