I still don't get how you can store something without putting it anywhere. It's just "transient", like electricity in a superconducting closed coil? hahaha
Not doubting it, I just don't understand. Unless you're storing it client-side in a cache, or directly in RAM, I don't know.
Serverless just means not hosting your own infrastructure and boxes. AWS Lambda is a good example: you run arbitrary code on their infrastructure on demand, without keeping always-available servers.
I find the term "serverless" quite misleading. Essentially, all the code in the world has to run on a machine so there is nothing really "server-less" about it.
It's probably more accurate to say something like "devops-free" since management of servers is hived off to a third party. Sounds less buzz-worthy, so maybe someone can come up with something better.
But there is still a database? (I realize this wasn't mentioned in the thread title.) Yeah, "serverless" seems kind of misleading. I'm wondering, though, what script/language you'd write this in. How to store... I should probably learn how to read first.
In case you are wondering what fauna is in general, it's an object-relational, temporal, geographically distributed, strongly consistent, multi-tenant, QoS-managed operational database. It's implemented on the JVM and queried via type-safe embedded DSLs like LINQ.
Somebody else asked this, but the answer got distracted onto (reasonably so) the CAP Theorem.
What is the difference between Fauna and DynamoDB? Especially since the article swaps them out (and explains the API differences).
DynamoDB is going to have a replica you can read from within milliseconds of your Lambda function ("serverless" is such a bad name, it makes me think of P2P, anyway...), while it seems like Fauna is going to have to make network calls out to your service...
Which, when you pay per unit of time with Lambda, and you want Lambda functions to be fast anyway, I don't see the point of Fauna. Note: I'm not saying Fauna is bad; it seems like a cool idea, but I'm not understanding how it is a superior alternative.
So let's say you don't want to pay for DynamoDB; it still seems like you'd be better off running a pure NodeJS database like Parse's open source server or https://github.com/amark/gun , either inside the Lambda function directly or connecting to it (since it'll be on a nearby machine in AWS)?
FaunaDB is hosted in several AWS regions around the world, so the latency is similar to DynamoDB if you're colocated. We will expand to more AWS regions and other cloud providers soon.
The biggest operational difference between FaunaDB and DynamoDB, aside from being globally distributed, is that you don't have to pre-provision capacity in FaunaDB. DynamoDB requires you to provision capacity per table and you have to pay for whatever you don't use. If you go over the provisioned capacity your app stops working. FaunaDB is delivered like a utility; you just pay as you go.
Also FaunaDB supports joins, transactions, unique indexes, views, etc., and you can install it on-premises if you want.
Cloud basically just means running software on other peoples' computers in a data center somewhere. Serverless is a subset of cloud. It's a way of building and deploying cloud apps.
Serverless implies a very high level of abstraction when interacting with cloud infrastructure. Traditionally, app developers have consumed cloud infrastructure at the level of individual boxes running operating systems. Obviously these don't usually correspond to actual physical machines, but the OS box is the unit of abstraction presented to people building and deploying cloud apps. Serverless apps are typically based on the "function" (one invocation of some small bit of logic) as the unit of consumption for cloud infrastructure, e.g. AWS Lambda. They also typically embrace high-level abstractions for managing data, e.g. building against DynamoDB as opposed to your own Cassandra cluster running on EC2.
The term "serverless" is annoying as heck. I mean, sure, you're not fiddling with nginx configs, but you're even more tightly dependent on running in a cloud environment. It really should be "servermore".
It's analogous to "functional" and "object-oriented". They're both popular buzzwords. One has been beaten to death and sometimes implemented poorly enough to gain a bad rap. The other is still new enough to serve as a vessel for software people to pour in all their vague hopes and unrealistic dreams.
At some point, the new hotness will be the old tiredness, and the cycle will repeat.
Serverless apps run in reaction to an event such as a web request or a file upload. In the serverless model your app has functions that run in response to these events. The classic cloud model is you pay for a server or container that is constantly running.
It's kind of equivalent to a timeshare condo vs. renting a condo: your functions only run when they are called.
>When I say serverless, I’m referring to the function-as-a-service pattern. A serverless system must scale dynamically per request, and not require any capacity planning or provisioning. For instance, you can connect to FaunaDB serverless cloud in moments, and scale seamlessly from idea to runaway hit.
> serverless system must [...] not require any capacity planning or provisioning
So, not AWS lambda then. With their concurrency limits, one-lambda-function-per-kinesis shard architecture, gateway request per second limitations, API count limitations, payload limits...
What's the difference between this and AWS's DynamoDB, or Aurora if you prefer an RDBMS? An HTTP endpoint as the DB API seems like a lot of extra overhead, unless you're hosting a "client-only" webapp (which would previously have used Firebase, IIRC).
Also, this confuses me:
> FaunaDB can tolerate the loss of a minority of physical datacenters in a cluster without interruption. According to the CAP theorem, FaunaDB is a CP system.
CP means that consistency is favored over availability, yet "without interruption" tells me they favor availability over consistency during a partition.
If a partition leaves a quorum in contact with each other, why would it cause an interruption? CP means nodes not within a quorum become unavailable, and if a quorum no longer exists the whole system is unavailable. AP means minorities stay available at the risk of inconsistency.
> If a partition leaves a quorum in contact with each other, why would it cause an interruption?
You can have a split that still has a group of machines with quorum: a 3/2 split would leave three nodes with quorum, and two without.
Clients which attach to the non-quorum machines would lose the ability to read or write if it's CP, yet the clients connected to the quorum machines would retain the ability to read and write. So it would be a partial outage, until some way is found to identify quorum members and route clients back to that quorum (making the assumption that clients could talk to any node in a partition).
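The 3/2 split above is easy to check with a little arithmetic. A generic majority-quorum sketch (not FaunaDB's actual implementation):

```javascript
// Majority quorum: in a CP system, a partition side can keep serving
// reads and writes only if it holds strictly more than half the nodes.
function hasQuorum(sideSize, clusterSize) {
  return sideSize > Math.floor(clusterSize / 2);
}

// A 5-node cluster split 3/2: the 3-node side keeps quorum,
// the 2-node side loses it and its clients see an outage.
console.log(hasQuorum(3, 5)); // true
console.log(hasQuorum(2, 5)); // false
```

Note that an even split (2/2 in a 4-node cluster) leaves *neither* side with quorum, which is one reason odd cluster sizes are preferred.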
Looks like the fact it automatically scales horizontally: with Aurora, you can only scale vertically (in terms of memory and compute), and DynamoDB requires you to manage throughput yourself (not a big deal; there are services to automatically raise and lower it based on usage).
Without interruption -for a minority of nodes-. Meaning it's using a quorum to achieve consensus. In the event of a partition (which looks the same to the remaining nodes as a 'loss'), the majority side will still allow reads/writes. Meaning if you can't talk to the majority, you can't read/write; hence, not AP. If it maintains distributed consistency provided there's a quorum (Raft, Paxos, etc.), it's CP.
> Looks like the fact it automatically scales horizontally
I'd be curious then what the cost for doing a lookup of data not on your current node would look like. Do you re-connect to a different node which does have the data, or is it transparently piped back to your current node on request? Is that request broadcast, or is there some form of index maintained on each node of who has what data, how is the cross-talk structured... I'm a bit of a DB nerd, so the answers to these interest me.
> Meaning it's using a quorum to achieve consensus.
I missed the sly usage of "minority" there. I was expecting a quorum based architecture based on the rest of the documentation.
It seems dishonest to imply that there is "no interruption" at all on partition, since that's obviously not the case.
The operations model is similar to most cloud databases, with metered usage. You just configure your app to use the database, and the scaling is handled for you.
FaunaDB has strong consistency, a relational data model, and rich queries. This makes it more like a traditional SQL operational database, except that it scales.
We are focused on winning customers, but we've been happy to see we can turn heads at large-scale shops running real installations of the usual suspects. We'll publish something like what you are asking for soon.
In these evaluations we are running on production data so we can't share them directly.
What do you think about something like this for generating a reasonable data set?
1. They don't trust people to read everything. A lot of readers drop off before the end of an article just because their attention flits away. Pull quotes are a way of saying, "Here's something coming up that I think is interesting. If you are interested, you should keep reading."
2. A lot of people have trouble with long runs of samey text. Some see it as boring, others as imposing, others as hard to navigate, but for whatever reason, long runs of text are simply hard to read for a lot of people. So pull quotes are a way to break up the text without resorting to vaguely relevant cat pictures.
> "Here's something coming up that I think is interesting. If you are interested, you should keep reading."
But nearly no one ever uses it to refer to what is coming up; it's almost always what happened one sentence ago.
> A lot of people have trouble with long runs of samey text.
Nothing is more samey than repeating the same sentence!
Pull quotes like the ones in this article punish the reader for reading the article word for word. There are other methods that put the pull quotes outside of the flow!
And if it's a really important sentence, then throw a slight yellow background on the text or something, like a highlighter!
> But nearly no one ever uses it to refer to what is coming up; it's almost always what happened one sentence ago
That's true if you read them inline, but pull quotes are generally presented in a large font so that you can see them without having actually read the accompanying text yet.
> Pull quotes like the ones in this article punish the reader for reading word for word. There are other methods that put the pull quotes outside of the flow, which I personally would prefer!
I agree with that. The pull-quotes on this site are poorly designed and really hurt the flow of the article.
Well, on that side, the pull quotes don't arrive earlier than a 'proper' reader would meet them, and they provide a quick summary for the scrollers among us.
3. It helps when speedreading/skimming, for me at least. Pullquotes done right can let me work out the gist of an article, such that I can subsequently decide if it's worth reading further.
The way I prefer to see this solved is good section headings. They fulfill exactly the same purpose, except they don't have the annoying disadvantages GP mentioned.
I personally prefer that over websites putting the pullquotes somewhere far away from the original quote. That's annoying as all hell.
I mean, if my attention sagged and I'm being brought back into engaging with the article via a pull quote, I at the very least expect the quote to pull me toward the relevant context.
Instead, way too many sites stick the pullquotes way after or way before the actual paragraph from which the quote was extracted, making it incredibly difficult to establish exactly what the context might be for that quote.
> A serverless system must scale dynamically per request. FaunaDB is a globally distributed database that requires no provisioning — you only pay for what you use.
Well... I guess it does show that the entire article is just an advert? So maybe useful in that way!
I understood that Lambda just freezes the code between calls, so any connection to a database just continues when the next call comes in. The only problem might be the server timing out the connection if the gap between calls is too long.
I never tried it (yet), but if I need something like Fauna, then was my assumption wrong?
Is this DB ACID compliant? This is a major omission from the article, as I was considering it as a replacement for Postgres as-a-service offerings.
I was initially overly excited by AWS Lambda + API Gateway, but now, looking at the costs, it's cheaper and less overhead to just run highly available boxes.
For large organizations, I can see the benefit of moving to serverless, particularly doing away with server ops for slower, less frequent tasks.
But for fast response and cost effectiveness, unless AWS Lambda dramatically reduces costs to match a $5/month DigitalOcean instance that will respond instantly and can take quite a beating for lighter requests, I'd be more wary; AWS bills can rack up very fast.
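A back-of-the-envelope version of that tradeoff (all prices here are illustrative round numbers, not current AWS or DigitalOcean rates):

```javascript
// Rough monthly cost sketch: always-on box vs. pay-per-invocation functions.
const boxPerMonth = 5.0;                  // e.g. a small VPS
const perMillionInvocations = 0.20;       // assumed request charge
const perGbSecond = 0.0000167;            // assumed compute charge

function lambdaMonthlyCost(invocations, avgSeconds, memoryGb) {
  const requests = (invocations / 1e6) * perMillionInvocations;
  const compute = invocations * avgSeconds * memoryGb * perGbSecond;
  return requests + compute;
}

// At low traffic, pay-per-invocation is far below the always-on box...
console.log(lambdaMonthlyCost(100000, 0.2, 0.128).toFixed(2));
// ...but at high sustained traffic, the always-on box wins.
console.log(lambdaMonthlyCost(50e6, 0.2, 0.128).toFixed(2));
```

The crossover point depends heavily on invocation duration and memory, which is why blanket "serverless is cheaper/more expensive" claims both miss.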
Server ops applies regardless of the size of the organization; as an individual or small group you still have to do that work, and that takes time that could instead go into a product.
It's beneficial because it's easy to spin up code and extremely cheap until you get load (so great for prototypes or MVPs), and it scales predictably. Yes, it can get more costly at a certain point than just running your own solution, but that point is less obvious than you think, and likely later (once you include the sysops tasks you need to take care of) than you think, and at that point you hopefully have enough of a revenue stream to be able to determine whether it makes more sense to move to servers, or to spend that time/money building new features.
lostcolony, I think you are overstating how hard it is to run ops on your own server. The separation between engineering and ops has gotten a little out of control; engineers should know how to do basic ops. The cost savings you claim by letting your engineers focus only on the product, and not worry about those pesky little details called RAM and CPU, really shoot you in the foot. It's like putting blindfolds on your engineers and saying "code away" as if the world will always be this dark. The skills your engineers learn deploying their own code to one DigitalOcean box are priceless. Well, it's $5 a month to DigitalOcean. But it's not just the savings on Lambda bills; it's the engineering craft they learn that is priceless. Because the world runs on servers. The world does not run on magical things that are not servers.
I think lostcolony didn't explicitly mention it, but the concern isn't the ops lifecycle of one $5 box. I agree with you, that's easy. But what if it suddenly needs to scale beyond a $5 box? Now your engineers need to rush to put together an HA solution, load balancing, etc. Is it worth planning and building that for each microservice, or is it better to just deploy code to a scalable platform?
I meant more than that. Even if your application is just going to remain on one box, how many applications are you going to have? Are you going to have a dev, stage, and prod box? When an underlying library is updated, how do you manage that? Are you certain it will not break anything, and so make it automatic, or do you do it manually? When a box needs to be replaced, you have to handle that. Etc. It's more than just "buy a box and done".
Yes, none of these is hard to do, and they should be things any engineer is familiar with. But -why are you spending your time on it-?
If your project is small enough, serverless allows you to spend all of your time on your actual code, not ops tasks, and -know- it will be trivial to scale, for the same amount of money.
If the project is large enough, the same thing applies; the ops tasks required for multiple boxes get more complicated, and serverless keeps you just focused on the code, with it scaling trivially.
In short, bear in mind the opportunity cost. If someone feels the ops work + cost of hardware < going serverless, fine. That's their decision. But to be dismissive of those who find it's a better value to go serverless, because they can iterate faster, because it's no more money at first, and only gets more expensive at scale (when they hopefully are making money, -and- have saved themselves the work of building an HA solution, as well as handling any unexpected shared state), seems misplaced.
"and -know- it will be trivial to scale": that's my point. You won't know that. If you spend all your time in fantasy land where RAM and CPU are infinite, you start to lose touch with reality in how you code. You have to code defensively against RAM and CPU use.
When going serverless, not really. That's... kind of the point. Or are you saying that because you stop having to worry about it, if you ever go back to servers you'll make scaling mistakes?
Yes. Guaranteed. Like the movie Rush, about race car driver Niki Lauda: he tuned that race car engine down to every last detail to get performance. With serverless there will always be big, big low-hanging-fruit optimizations.
That totally depends on what you are doing. We use Lambda to generate thumbnails of documents that are uploaded to one of our applications. I can throw 600-800 pages at it, and because it calls a Lambda function per page, the whole thing finishes in less than 10 seconds. No need to scale instances and no need to worry about running out of RAM on our servers. The cold start also gets shorter after the initial run.
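The per-page fan-out described above can be sketched with plain promises; `renderThumbnail` here is a stand-in for invoking the real per-page Lambda, not our actual code:

```javascript
// Stand-in for invoking a thumbnailing Lambda on one page.
async function renderThumbnail(page) {
  return `thumb-${page}.png`;
}

// Fan-out: fire one unit of work per page and wait for all of them,
// the way one Lambda invocation per page parallelizes a big document.
async function thumbnailDocument(pageCount) {
  const pages = Array.from({ length: pageCount }, (_, i) => i + 1);
  return Promise.all(pages.map(renderThumbnail));
}
```

With 600-800 pages, the wall-clock time is roughly that of the slowest single page, since no page waits on another.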
No set up except for the API gateway interface and the lambda function itself, as well as any IAM roles or policies you want to put in place.
No maintenance except watching for throttling by AWS, watching your billing to ensure it doesn't go out of control, and watching for API Gateway errors.
Theoretically, you could also do the entire setup via CloudFormation; but you can also do the same with EC2 instances and ECS.
What would I do instead? Set up a 1-n member autoscaling group, with rules to scale on load. Set up an ECS service which autoscales on load. Set up an ELB attached to the ECS service. Not as fancy as Lambda and API Gateway, but it will probably scale better, at a lower peak cost (at the expense of an up-front cost of one server).
Are you talking about AWS Lambda, or some other Lambda that is convenient? I have never seen "AWS Lambda" and "convenient" in the same sentence. Even the people who like AWS Lambda say it's complicated.
You can read a little bit about our Twitter experience here: https://fauna.com/blog/welcome-to-the-jungle Twitter still uses the social graph and timeline databases we built years and years ago.