Build a serverless app with a serverless database (fauna.com)
88 points by evanweaver on Feb 9, 2017 | hide | past | favorite | 75 comments


2018 will be so awesome when apps start to become clientless as well...


And in 2020 apps will become userless and feature entirely AI to AI interaction


SMS chat bot! Actually, a growing category... :)


Y(SMS chat bot)


I still don't get how you can store something without putting it anywhere. It's just "in transit", like electricity in a superconducting closed coil? hahaha

Not doubting it, I just don't understand. Unless you're storing through a client-side "cache" or directly in RAM, I don't know.


Serverless just means not hosting your own infrastructure + boxes. Amazon Lambda is a good example, where you run arbitrary scripts on their infrastructure on demand, without having always-available servers.


I find the term "serverless" quite misleading. Essentially, all the code in the world has to run on a machine so there is nothing really "server-less" about it.

It's probably more accurate to say something like "devops-free" since management of servers is hived off to a third party. Sounds less buzz-worthy, so maybe someone can come up with something better.


But there is still a database? (I realize this wasn't mentioned in the thread title.) Yeah, "serverless" seems kind of misleading. I'm wondering, though, what script/language you'd write this in. How to store... I should probably learn how to read first.

Thanks for the clarification.


In case you are wondering what fauna is in general, it's an object-relational, temporal, geographically distributed, strongly consistent, multi-tenant, QoS-managed operational database. It's implemented on the JVM and queried via type-safe embedded DSLs like LINQ.


Is there any way for me to contribute money or other donations to your project? Good on you for making your dreams a reality.


You can use the database, and eventually we will send you a bill for your usage. :-)


It worked for IBM. It then worked for Amazon. It worked for AT&T GoPhones. It can work for you, too. :)


"Whooptydoo, but what does it all mean, Basil?"


It's fully BuzzWord compliant then! (Sounds like this is a naked marketing post.)


He's the founder, so I guess it is a marketing post.


Somebody else asked this, but the answer got distracted onto (reasonably so) the CAP Theorem.

What is the difference between Fauna and DynamoDB? Especially since the article swaps them out (and explains the API differences).

DynamoDB is going to have a replica you can read from within milliseconds of your lambda function (serverless is such a bad name, makes me think of P2P, anyways...), while it seems like Fauna is going to have to make network calls out to your service...

And when you pay per unit of time with Lambda, and you want Lambda functions to be fast anyway, I don't see the point of Fauna. Note: I'm not saying Fauna is bad; it seems like a cool idea, but I'm not understanding how it is a superior alternative.

So let's say you don't want to pay for DynamoDB; it still seems like you'd be better off running a pure NodeJS database like Parse's open source server or https://github.com/amark/gun , either inside the lambda function directly or connecting to it (since it'll be on a nearby machine in AWS)?


FaunaDB is hosted in several AWS regions around the world, so the latency is similar to DynamoDB if you're colocated. We will expand to more AWS regions and other cloud providers soon.

The biggest operational difference between FaunaDB and DynamoDB, aside from being globally distributed, is that you don't have to pre-provision capacity in FaunaDB. DynamoDB requires you to provision capacity per table and you have to pay for whatever you don't use. If you go over the provisioned capacity your app stops working. FaunaDB is delivered like a utility; you just pay as you go.
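The contrast between the two billing models can be sketched with toy numbers (the prices below are made up for illustration; they are not real DynamoDB or FaunaDB rates):

```python
# Provisioned billing: you pay for the peak capacity you reserve,
# whether or not it is used. Metered billing: you pay per request served.
# All prices here are hypothetical, purely to show the shape of each model.

def provisioned_cost(reserved_units, hours, price_per_unit_hour):
    # Pay for reserved capacity for the whole period, used or not.
    return reserved_units * hours * price_per_unit_hour

def metered_cost(requests, price_per_request):
    # Pay only for requests actually served.
    return requests * price_per_request

month_hours = 30 * 24  # 720

# Reserve 100 units for a month when traffic only needed ~10:
reserved = provisioned_cost(100, month_hours, 0.0001)
actual = metered_cost(10 * month_hours * 3600, 0.0000001)
```

With bursty or unpredictable traffic the gap widens: the provisioned model charges for the peak around the clock, while the metered model tracks actual usage.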

Also FaunaDB supports joins, transactions, unique indexes, views, etc., and you can install it on-premises if you want.


Thanks for the answer!


What's the difference between "serverless" and "cloud"?


Cloud basically just means running software on other peoples' computers in a data center somewhere. Serverless is a subset of cloud. It's a way of building and deploying cloud apps.

Serverless implies a very high level of abstraction when interacting with cloud infrastructure. Traditionally app developers have consumed cloud infrastructure on the level of individual boxes running operating systems. Obviously these don't usually correspond to actual physical machines, but the OS box is the unit of abstraction presented to people running building and deploying cloud apps. Serverless apps are typically based on the "function" (one invocation of some small bit of logic) as the unit of consumption for cloud infrastructure - e.g AWS Lambda. They also typically embrace high-level abstractions for managing data - e.g. building against DynamoDB as opposed to your own Cassandra cluster running on EC2.
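The "function as the unit of consumption" idea above can be sketched as a minimal Lambda-style handler. The event shape here is a simplified stand-in for what API Gateway actually sends, not the full payload:

```python
# A minimal AWS-Lambda-style handler: the function is the unit of
# deployment, invoked once per incoming event.

import json

def handler(event, context):
    # One invocation of a small bit of logic per event.
    name = event.get("queryStringParameters", {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Invoked locally the same way the platform would invoke it:
response = handler({"queryStringParameters": {"name": "HN"}}, None)
```

The developer never sees the box this runs on; the platform maps events to invocations and scales the number of concurrent invocations itself.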

The term "serverless" is annoying as heck. I mean, sure, you're not fiddling with nginx configs, but you're even more tightly dependent on running in a cloud environment. It really should be "servermore".


It's analogous to "functional" and "object-oriented". They're both popular buzzwords. One has been beaten to death and sometimes implemented poorly enough to gain a bad rap. The other is still new enough to serve as a vessel for software people to pour in all their vague hopes and unrealistic dreams.

At some point, the new hotness will be the old tiredness, and the cycle will repeat.


I'm honestly not sure which description refers to functional and which refers to OOP. They fit equally well.


Serverless apps run in reaction to an event such as a web request or a file upload. In the serverless model your app has functions that run in response to these events. The classic cloud model is you pay for a server or container that is constantly running.

It's kinda equivalent to a timeshare condo vs. renting a condo. Your functions only run when they are called.


Here:

>When I say serverless, I’m referring to the function-as-a-service pattern. A serverless system must scale dynamically per request, and not require any capacity planning or provisioning. For instance, you can connect to FaunaDB serverless cloud in moments, and scale seamlessly from idea to runaway hit.


> serverless system must [...] not require any capacity planning or provisioning

So, not AWS lambda then. With their concurrency limits, one-lambda-function-per-kinesis shard architecture, gateway request per second limitations, API count limitations, payload limits...


What's the difference between this and AWS's DynamoDB, or Aurora if you prefer an RDBMS? An HTTP endpoint as the DB API? Seems like a lot of extra overhead, unless you're hosting a "client-only" webapp (which would previously have used Firebase, IIRC).

Also, this confuses me:

> FaunaDB can tolerate the loss of a minority of physical datacenters in a cluster without interruption. According to the CAP theorem, FaunaDB is a CP system.

CP means that consistency is favored over availability, yet "without interruption" tells me they favor availability over consistency during a partition.


If a partition leaves a quorum in contact with each other, why would it cause an interruption? CP means nodes not within a quorum become unavailable, and if a quorum no longer exists the whole system is unavailable. AP means minorities stay available at the risk of inconsistency.


> If a partition leaves a quorum in contact with each other, why would it cause an interruption?

You can have a split that still has a group of machines with quorum: a 3/2 split would leave three nodes with quorum, and two without.

Clients which attach to the non-quorum machines would lose the ability to read or write if it's CP, yet the clients connected to the quorum machines would retain the ability to read and write. So it would be a partial outage, until some way is found to identify quorum members and route clients back to that quorum (making the assumption that clients could talk to any node in a partition).
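The quorum arithmetic behind the 3/2 split above can be sketched as follows (a minimal illustration of majority quorums in general, not Fauna's actual implementation):

```python
# In an N-node cluster, a partition side can keep serving strongly
# consistent reads and writes only if it holds a strict majority.

def has_quorum(side_size, cluster_size):
    # Strict majority: more than half the nodes.
    return side_size > cluster_size // 2

# 5-node cluster split 3/2: the 3-node side keeps working,
# the 2-node side becomes unavailable for consistent operations.
cluster = 5
majority_side_available = has_quorum(3, cluster)  # True
minority_side_available = has_quorum(2, cluster)  # False
```

Note that with an even cluster size a perfect half split leaves neither side with a quorum, which is why odd replica counts are the usual recommendation.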


Looks like the difference is that it automatically scales horizontally: with Aurora, you can only scale vertically (in terms of memory and compute), and DynamoDB requires you to manage throughput yourself (not a big deal; there are services to automatically raise and lower it based on usage).

Without interruption -for a minority of nodes-. Meaning it's using a quorum to achieve consensus. In the event of partition (which looks the same to the remaining nodes as a 'loss'), the majority side will still allow reads/writes. Meaning if you can't talk to the majority, you can't read/write; hence, not AP. If it maintains distributed consistency provided there's a quorum (Raft, Paxos, etc), it's CP.


> Looks like the fact it automatically scales horizontally

I'd be curious then what the cost for doing a lookup of data not on your current node would look like. Do you re-connect to a different node which does have the data, or is it transparently piped back to your current node on request? Is that request broadcast, or is there some form of index maintained on each node of who has what data, how is the cross-talk structured... I'm a bit of a DB nerd, so the answers to these interest me.

> Meaning it's using a quorum to achieve consensus.

I missed the sly usage of "minority" there. I was expecting a quorum based architecture based on the rest of the documentation.

It seems dishonest to imply that there is "no interruption" at all on partition, since that's obviously not the case.


There is no interruption to the majority of the datacenters that remain connected. We can clarify on the site.

You can actually still do temporally consistent reads from the disconnected minority, but obviously they won't have the latest updates.


The operations model is similar to most cloud databases, with metered usage. You just configure your app to use the database, and the scaling is handled for you.

FaunaDB has strong consistency, a relational data model, and rich queries. This makes it more like a traditional SQL operational database, except it scales.


>> This makes it more like a traditional SQL operational database, except it scales.

Seriously. Startups that promise that are a dime a dozen.

Can we see TPC-C or whatever numbers?


We are focused on winning customers, but we've been happy to see we can turn heads at large-scale shops running real installations of the usual suspects. We'll publish something like what you're asking for soon.

In these evaluations we are running on production data so we can't share them directly.

What do you think about something like this for generating a reasonable data set?

http://ldbcouncil.org/blog/datagen-realistic-social-network-...


That is great, but people like to see how your solution compares to other solutions!

Hence, benchmarks.

So, if you do TPC-C or TPC-D or whatever, and compare well to others, you are doing great!

But if you avoid benchmarks, people think that you are not doing well. So they will not buy.

Just my 5c.


If you want to skip directly to running code, these instructions (linked from the article) should get you to hello world. https://github.com/fauna/serverless-crud#installation


I really wonder what possesses people to use pullquotes

    WHAT POSSESSES PEOPLE
    TO   USE   PULLQUOTES
It's like they don't trust you to read a couple of sentences or something


A couple of reasons:

1. They don't trust people to read everything. A lot of readers drop off before the end of an article just because their attention flits away. Pull quotes are a way of saying, "Here's something coming up that I think is interesting. If you are interested, you should keep reading."

2. A lot of people have trouble with long runs of samey text. Some see it as boring, others as imposing, others as hard to navigate, but for whatever reason, long runs of text are simply hard to read for a lot of people. So pull quotes are a way to break up the text without resorting to vaguely relevant cat pictures.


> "Here's something coming up that I think is interesting. If you are interested, you should keep reading."

But nearly no one ever uses it to refer to what is coming up; it's almost always what happened just one sentence ago.

> A lot of people have trouble with long runs of samey text.

Nothing is more samey than repeating the same sentence!

Pull quotes like the ones this article has punish the user for reading the article word for word. There are other methods that keep the pull quotes outside of the flow!

And if it is a really important sentence, then throw some slight yellow background on the text or something, like a highlighter!


> But nearly no one ever uses it to refer to what is coming up, its almost always what has just happened 1 sentence ago

That's true if you read them inline, but pull quotes are generally presented in a large font so that you can see them without having actually read the accompanying text yet.

> Pull quotes like the ones this article has punish the user for reading the article word for word. There are other methods that keep the pull quotes outside of the flow, which I personally would prefer!

I agree with that. The pull-quotes on this site are poorly designed and really hurt the flow of the article.


Well, on that side, the pull quotes don't arrive earlier than a 'proper' reader should meet them, and provide a quick summary for the scrollers amongst us.


3. It helps when speedreading/skimming, for me at least. Pullquotes done right can let me work out the gist of an article, such that I can subsequently decide if it's worth reading further.


The way I prefer to see this solved is good section headings. They fulfill exactly the same purpose, except they don't have the annoying disadvantages GP mentioned.


Well, I have to agree with OP. If anything, they turn me AWAY from an article that I otherwise may have been interested in.


I personally prefer that over websites putting the pullquotes somewhere far away from the original quote. That's annoying as all hell.

I mean, if my attention sagged and I'm being brought back into engaging with the article via a pullquote, I at the very least expect the quote to at least pull me toward the relevant context.

Instead, way too many sites stick the pullquotes way after or way before the actual paragraph from which the quote was extracted, making it incredibly difficult to establish exactly what the context might be for that quote.

    THAT'S ANNOYING AS
    ALL HELL
See what I mean?


Not everyone wants to read everything. We're not all as interested or as free as you may be.


That's fine, just read:

    A serverless system must scale dynamically per request.
    FaunaDB is a globally distributed database that requires
    no provisioning — you only pay for what you use.
Well... I guess it does show that the entire article is just an advert? So maybe useful in that way!


I understood that Lambda just freezes the code between calls, so any connection to a database just continues when the next call comes in. The only problem might be a server-side timeout on the connection if the time between calls is longer.

I never tried it (yet), but if I need something like Fauna, then was my assumption wrong?
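The freeze-and-reuse behavior described above can be sketched; `Connection` here is a hypothetical stand-in for a real database driver, and as noted, a real connection can still hit a server-side idle timeout between calls:

```python
# Anything at module scope survives between "warm" invocations of the
# same container, so a DB connection opened outside the handler is
# reused instead of re-established on every call.

class Connection:
    instances = 0  # count how many connections were actually opened

    def __init__(self):
        Connection.instances += 1

connection = None  # initialized once per container, not once per call

def handler(event, context):
    global connection
    if connection is None:
        connection = Connection()  # only on a cold start
    return Connection.instances

first = handler({}, None)   # cold start: opens the connection
second = handler({}, None)  # warm invocation: reuses it
```

In practice handlers also need to detect and reopen a connection the server has dropped, since the platform gives no guarantee about how long a frozen container survives.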


This is a prime example of where "serverless" means "someone else's server".


When has it ever meant anything else? P2P?


The first time I read the word "serverless" I immediately related it to P2P


Is this DB ACID compliant? This is a major omission from the article, as I was considering this as a replacement for Postgres-as-a-service offerings.


Yes. We offer global ACID transactions using the Calvin protocol. http://cs.yale.edu/homes/thomson/publications/calvin-sigmod1...


Sounds awfully like a server to me.


I was initially overly excited by AWS Lambda + API Gateway, but now, looking at the costs, it's cheaper and less overhead to just run highly available boxes.

For large organizations, I can see the benefit of moving to serverless, particularly doing away with server ops for slower and less frequent tasks.

But for fast response and cost effectiveness, unless AWS Lambda dramatically reduces costs to match a $5/month DigitalOcean instance that will respond instantly and can take quite a beating for lighter requests, I'd be wary; AWS bills can rack up very fast.
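Whether Lambda beats a flat box depends mostly on request volume. A rough back-of-envelope sketch, using Lambda's published per-request and per-GB-second prices around this time and ignoring the free tier and API Gateway charges for simplicity:

```python
# Lambda pricing circa 2017: $0.20 per million requests, plus
# $0.00001667 per GB-second of compute (memory x duration).

REQUEST_PRICE = 0.20 / 1_000_000   # dollars per request
GB_SECOND_PRICE = 0.00001667       # dollars per GB-second

def lambda_monthly_cost(requests, memory_gb, duration_s):
    compute = requests * memory_gb * duration_s * GB_SECOND_PRICE
    return requests * REQUEST_PRICE + compute

# A light service (1M requests/month at 128 MB, 100 ms each)
# comes in well under a $5/month box...
light = lambda_monthly_cost(1_000_000, 0.125, 0.1)

# ...but a busier one (20M requests/month) already costs more.
busy = lambda_monthly_cost(20_000_000, 0.125, 0.1)
flat_box = 5.00
```

So at low, spiky volume Lambda is nearly free, and the break-even against a flat-rate box arrives somewhere in the tens of millions of requests per month for this workload shape; longer or more memory-hungry functions move it much earlier.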


Server ops applies regardless of the size of the organization; as an individual or small group you still have to perform those operations, and that takes time that could instead go into a product.

It's beneficial because it's easy to spin up code and extremely cheap until you get load (so great for prototypes or MVPs), and it scales predictably. Yes, it can get more costly at a certain point than just running your own solution, but that point is less obvious than you think, and likely later (once you include the sysops tasks you need to take care of) than you think, and at that point you hopefully have enough of a revenue stream to be able to determine whether it makes more sense to move to servers, or to spend that time/money building new features.


lostcolony, I think you are overstating how hard it is to run ops on your own server. The separation between engineering and ops has gotten a little out of control. Engineers should know how to do basic ops. So for the cost savings you say you are getting by letting your engineers focus ONLY on the product and not worry about those pesky little details called RAM and CPU, you really shoot yourself in the foot. It's like putting blindfolds on your engineers and saying "code away" as if the world will always be this dark. The skills your engineers learn in deploying their own code to one DigitalOcean box are priceless. Well, it's $5 a month to DigitalOcean. But it's not just the savings in bills to Lambda; it's the knowledge of engineering craft they learn that is priceless. Because the world runs on servers. The world does not run on magical things that are not servers.


I think lostcolony didn't explicitly mention it, but the concern isn't the ops lifecycle of one $5 box. I agree with you, that's easy. But what if it suddenly NEEDS to scale beyond a $5 box? Now your engineers need to rush to put together an HA solution, load balancing, etc. Is it worth planning and building that for each microservice, or is it better to just deploy code to a scalable platform?


I meant more than that. Even if your application is just going to remain on one box, how many applications are you going to have? Are you going to have a dev, stage, and prod box? When an underlying library is updated, how do you manage that? Are you certain it will not break anything, and so make it automatic, or do you do it manually? When a box needs to be replaced, you have to handle that. Etc. It's more than just "buy a box and done".

Yes, no one of these is hard to do, and they should be things any engineer is familiar with. But -why are you spending your time on it-?

If your project is small enough, serverless allows you to spend all of your time on your actual code, not ops tasks, and -know- it will be trivial to scale, for the same amount of money.

If the project is large enough, the same thing applies; the ops tasks required for multiple boxes get more complicated, and serverless keeps you just focused on the code, with it scaling trivially.

In short, bear in mind the opportunity cost. If someone feels the ops work + cost of hardware < going serverless, fine. That's their decision. But to be dismissive of those who find it's a better value to go serverless, because they can iterate faster, because it's no more money at first, and only gets more expensive at scale (when they hopefully are making money, -and- have saved themselves the work of building an HA solution, as well as handling any unexpected shared state), seems misplaced.


"and -know- it will be trivial to scale" that's my point. You won't know that. If you spend all your time in fantasy land where ram and cpu are infinite you start to loose touch with reality in how you code. You have to code defensively against ram and cpu use.


When going serverless, not really. That's...kind of the point. Or are you saying that because you stop having to worry about it, if ever you go back to servers you're going to make scaling mistakes?


Yes. Guaranteed. Like the movie Rush, about race car driver Niki Lauda. He tuned that race car engine down to every last detail to get performance. With serverless there will always be big, big low-hanging fruit optimizations.


Lambda cold start time for a .NET app that made a SQL connection and then did a simple count command was multiple seconds. Not useful for production.


That totally depends on what you are doing. We use Lambda to generate thumbnails of documents that are uploaded to one of our applications. I can throw 600-800 pages at it, and because it calls a Lambda function per page, the whole thing finishes in less than 10 seconds. No need to scale instances and no need to worry about running out of RAM on our servers. The cold start also gets shorter after the initial run.
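The per-page fan-out the parent describes can be sketched locally; `thumbnail` here is a hypothetical stand-in for the real rendering work, and a thread pool stands in for parallel Lambda invocations:

```python
# Fan-out pattern: each page triggers its own independent function
# invocation, so total wall-clock time is bounded by the slowest page
# rather than the sum of all pages.

from concurrent.futures import ThreadPoolExecutor

def thumbnail(page):
    # Placeholder for rendering one page to a thumbnail image.
    return f"thumb-{page}.png"

def process_document(pages):
    # One worker per page, like one Lambda call per page.
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(thumbnail, pages))

thumbs = process_document(range(3))
```

The same shape is why batch jobs fit Lambda well: the work is embarrassingly parallel and nobody is waiting on a single synchronous response.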


Yeah. I could see it for batch processing. It's just not suitable for real time / API work that I was hoping to use it for.


Yes, it's not necessarily going to work for things like that - the start up time alone would make it useless.


I think this is the jargon-y-est comment I've ever read.


Doesn't Lambda have hundreds of thousands of requests for free? Also, the convenience of Lambda counts for a lot: no set up or maintenance.


No set up except for the API gateway interface and the lambda function itself, as well as any IAM roles or policies you want to put in place.

No maintenance except for watching for throttling by AWS, watching your billing to ensure it doesn't go out of control, watching for API Gateway errors.


The latter can be setup as part of Cloudwatch, though.


Please forgive me the following:

Hey, more setup!

Theoretically, you could also do the entire setup via Cloud Formation; but you can also do the same with EC2 instances and ECS.

What would I do instead? Set up a 1-n member autoscaling group, with rules to scale on load. Set up an ECS service which auto scales on load. Set up an ELB attached to the ECS service. Not as fancy as Lambda and AWS Gateway, but it will probably scale better, at a lower peak cost (at the expense of an up front cost of one server).


Are you talking about AWS Lambda, or some other Lambda that is convenient? I have never seen "AWS Lambda" and "convenient" in the same sentence. Even the people who like AWS Lambda say it's complicated.


Meh - led by a couple of ex-Twitter infrastructure guys. No thanks.


Twitter is possibly one of the hardest sites to scale out there. Sure they had their share of the fail whale but can you blame them?


You can read a little bit about our Twitter experience here: https://fauna.com/blog/welcome-to-the-jungle Twitter still uses the social graph and timeline databases we built years and years ago.



