CipherStash founder here: FHE isn't the only option here. Specialized searchable encryption schemes exist and are much faster than FHE. Different flavours can be combined to create a comprehensive search system which is very close to the performance of plaintext information retrieval. FHE remains an option for generalized computation but can be reserved for small datasets that have been narrowed down using fast searchable encryption.
I don't think FHE is the solution to PIR but it might well form a part of it when combined with more practical approaches.
Protect.js makes encrypting sensitive fields in Node apps (like Next.js) simple for devs, without requiring any expertise in cryptography.
It supports searchable encryption, integrates with Auth0, Okta and Clerk and works with the performance-focused ZeroKMS key management service.
As with any secure system, it depends on your threat model, but the early versions of OPE are woefully insecure. The attack you're describing sounds like the one on the Boldyreva scheme from 2009. While all OPE schemes leak some information, the more recent schemes are much more secure and often appropriate for many use cases (arguably, at least).
ORE has different security properties, particularly Lewi-Wu 2016. There are also now hybrid schemes like the EncodeORE scheme of 2020 which is both more efficient and more secure than previous OPE and practical ORE schemes.
Note that this paper is about the CLWW scheme of 2015. The Lewi-Wu Scheme of 2016 (Block Order Revealing Encryption) has entirely different security properties. Storage of the right ciphertexts (as they describe in the paper) is semantically secure.
The security properties become much more interesting when the data remains encrypted after it leaves the database. For example, an ETL process doesn't necessarily need to know what the data is, just its structure. Data transferred between systems can stay encrypted the whole time.
Homomorphic encryption is literally hundreds of thousands of times slower than operations on plaintext. Comparing two 64-bit integers can take upwards of 50ms. So even with a B-tree, where maybe ~100 comparisons occur, the query would take 5s. A linear scan over 1M records would take nearly 14 hours!
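As a sanity check, the arithmetic works out like this (assuming the ~50ms-per-comparison figure above):

```python
# Back-of-envelope FHE query cost, assuming ~50ms per encrypted
# 64-bit comparison (the figure cited above).
FHE_COMPARE_SECONDS = 0.050

# B-tree lookup: ~100 comparisons to locate a row.
btree_lookup_seconds = 100 * FHE_COMPARE_SECONDS          # 5.0 seconds

# Linear scan over 1 million records.
linear_scan_hours = 1_000_000 * FHE_COMPARE_SECONDS / 3600  # ~13.9 hours
```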
The only problem with KMS in this model is that to get that auditability you need a data key per value/record. That means every decryption requires a request to KMS, which does not (and likely never will) support batched requests. We tried this for ages and the performance was terrible: sub-100ms queries blew out to 3 or 4 seconds.
Only if you need the auditability at that granular a level.
For us, each user has (for the most part) their own data key, and most of the time a user is accessing their own data. So we can decrypt the key once and then cache it for the rest of the user's session. This tells us "the user accessed their private data", so we don't get the per value auditability, but for us that was sufficient. If you want, you could even have different data keys based on sensitivity, e.g. a user's name, phone, address is encrypted with one data key but their SSN or credit info is encrypted with another.
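The session-scoped caching described above can be sketched roughly like this (all names here are hypothetical; `kms_decrypt_data_key` stands in for the real KMS network call):

```python
import time

# Hypothetical stand-in for a network round trip to KMS that returns
# the user's decrypted data key. The counter just tracks call volume.
calls = {"kms": 0}

def kms_decrypt_data_key(user_id: str) -> bytes:
    calls["kms"] += 1
    return b"data-key-for-" + user_id.encode()

_cache: dict = {}           # user_id -> (data_key, cached_at)
SESSION_TTL = 15 * 60       # seconds; pick to match your session length

def get_data_key(user_id: str) -> bytes:
    now = time.monotonic()
    hit = _cache.get(user_id)
    if hit and now - hit[1] < SESSION_TTL:
        return hit[0]                        # cache hit: no KMS round trip
    key = kms_decrypt_data_key(user_id)      # one KMS call per session
    _cache[user_id] = (key, now)
    return key
```

The audit trail then records one decrypt event per user session rather than one per value, which is the granularity tradeoff described above.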
That's true except that if that session key is lost or exfiltrated, the scope of the breach is everything that key was used to encrypt (all of the user's data in your example).
The other consideration is how to safely cache the data key? What if the cache is popped?
If you're using AES in GCM mode? Bad...like catastrophic. An attacker can reveal the key.
If you want to use a constant IV for deterministic (exact-match) lookups, make sure you use AES in SIV mode, which is resistant to IV reuse, or CBC mode with an HMAC tag. It's slower than GCM, unfortunately, but it's one of the only secure options when you want deterministic encryption.
This stuff is hard to get right and can bite you in subtle and unexpected ways.
An attacker can reveal the keystream, but not the AES key. Still catastrophic.
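A toy illustration of why that's still catastrophic (the keystream here is just random bytes standing in for AES-CTR output under a reused nonce; real GCM behaves the same way structurally):

```python
import os

# Under nonce reuse, two messages are XORed with the SAME keystream.
# An attacker who knows one plaintext recovers the keystream and
# decrypts the other message -- without ever learning the AES key.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

keystream = os.urandom(16)   # stand-in for AES-CTR keystream, nonce reused
p1, p2 = b"known plaintext!", b"secret plaintext"
c1, c2 = xor(p1, keystream), xor(p2, keystream)

recovered_keystream = xor(c1, p1)      # attacker knows p1
recovered_p2 = xor(c2, recovered_keystream)
```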
And AES-SIV is a lot stronger than CBC with deterministic IV, since CBC reveals if two messages start with the same sequence of 16-byte blocks, while SIV only reveals if the messages are identical.
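The CBC prefix leak is easy to demonstrate. A sketch with a keyed PRF standing in for the AES block cipher (the leak comes from CBC chaining with a fixed IV, not from the cipher itself):

```python
import hmac, hashlib

BLOCK = 16
KEY = b"k" * 32
FIXED_IV = b"\x00" * BLOCK   # deterministic CBC: same IV every time

def prf_block(block: bytes) -> bytes:
    # Stand-in for the AES block cipher: a keyed PRF truncated to one block.
    return hmac.new(KEY, block, hashlib.sha256).digest()[:BLOCK]

def toy_cbc(plaintext: bytes, iv: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        prev = prf_block(bytes(x ^ y for x, y in zip(plaintext[i:i+BLOCK], prev)))
        out += prev
    return out

m1 = b"same first block" + b"different tail 1"
m2 = b"same first block" + b"different tail 2"
c1, c2 = toy_cbc(m1, FIXED_IV), toy_cbc(m2, FIXED_IV)
# First ciphertext blocks match, revealing the shared plaintext prefix.
```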
---
There is another interesting option: Create two columns, one using randomized authenticated encryption and one using an HMAC. Then you can use the HMAC column for equality lookups.
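A minimal sketch of the HMAC column (the ciphertext column isn't shown; the HMAC key must be separate from the encryption key):

```python
import hmac, hashlib, os

# Keyed HMAC over the plaintext gives a deterministic, indexable tag:
# equal plaintexts -> equal tags, but the tag reveals nothing else
# to anyone without the key.
HMAC_KEY = os.urandom(32)   # distinct from the AEAD encryption key

def equality_tag(plaintext: str) -> str:
    return hmac.new(HMAC_KEY, plaintext.encode(), hashlib.sha256).hexdigest()

# Store equality_tag(value) alongside the randomized ciphertext, then:
#   SELECT ... WHERE email_hmac = equality_tag('alice@example.com')
```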
It's a good idea. CBC without verification is vulnerable as well: an attacker can modify the IV and the value will still decrypt. It's quite easy to change what plaintext will pop out the other side, and the client will be none the wiser.
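A toy demo of that malleability. The "block cipher" here is just XOR with a fixed key (a stand-in; real AES-CBC has exactly the same structural weakness), and note the attacker needs to know or guess the original first block to steer it to a chosen value:

```python
BLOCK = 16
K = bytes(range(BLOCK))   # stand-in block-cipher key

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Single-block CBC: c = E(p XOR iv), p = D(c) XOR iv.
def cbc_encrypt(p: bytes, iv: bytes) -> bytes:
    return xor(xor(p, iv), K)

def cbc_decrypt(c: bytes, iv: bytes) -> bytes:
    return xor(xor(c, K), iv)

iv = b"\x00" * BLOCK
c = cbc_encrypt(b"PAY   10 DOLLARS", iv)

# Attacker XORs the IV with (known_plaintext XOR desired_plaintext);
# the ciphertext itself is untouched and still "decrypts" fine.
tampered_iv = xor(iv, xor(b"PAY   10 DOLLARS", b"PAY 9999 DOLLARS"))
forged = cbc_decrypt(c, tampered_iv)
```

An authentication tag (HMAC, or an AEAD mode) makes the tampered IV fail verification instead of silently decrypting.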
Depends on what kind of data you're encrypting, but if it's anything to do with money or health data, authenticity is a must.
So, when using CBC without verification, an attacker with access to the DB won't be able to see the original plaintext, but will be able to change the data?
But how can the attacker control what the plaintext will become if he doesn't have the key? Wouldn't he be limited to either a random value or a value from another field?
Since the IV is constant, it doesn't need to be stored in the DB and can be treated like a key. So an attacker (with access to the DB) can't change the IV for a server app reading from the DB.
An attacker who has write access to the database and gets feedback if a decryption was successful can still mount the standard padding oracle against CBC, because the first block acts as IV for the second block.
Key management and ability to support SQL/searching are the 2 biggest considerations.
`pg_enquo` uses Block ORE which is reasonably secure but results in very large (like 100x) ciphertext sizes. For an alternative (also written in Rust) check out https://ore.rs. It will soon support variable block sizes for smaller encrypted values.
If you want to do partial text queries or LIKE, you'll need a Searchable Symmetric Encryption (SSE) or Structured Encryption (STE) scheme. There are literally dozens of these schemes out there, each with their own tradeoffs so it can be hard to choose (Seny Kamara alone has published several: https://cs.brown.edu/people/seny/papers/).
Amazon KMS (and the Google/Azure equivalents) all require a network request per encryption or decryption unless you cache/reuse keys. To put that into perspective, one query returning 100 rows with 3 encrypted fields each would result in 300 separate network requests.
You can use data-key caching to reuse a data key across many records and improve encryption performance. However, decryption performance tends not to improve much, because data keys likely won't be used uniformly across your data set. Not to mention that you lose the ability to apply controls to records based on the data key.
At CipherStash, we created Tandem (https://cipherstash.com/products/tandem) which uses a revised version of ORE, STE and fast key (bulk-ops) management to encrypt columns of your choosing. The core encryption is AES-256-GCM and the whole thing is written in Rust. It runs as a Docker container or standalone binary. We are working on WASM support as well as a separate Rust SDK. Most SQL queries "just work" and performance overhead is tiny (< 10ms per request).
Tandem is in preview and will be generally available at the end of November.
For some other gotchas when doing encryption in Postgres, I did a talk at Linux Conf last year (based on some ideas from Paul Grubbs et al paper of the same name): https://www.youtube.com/watch?v=JD8dtLjhmAM