We (and many others) have a team building fun things into our data analysis tool here.
For what will soon become a 10%-time prompt engineering role aimed at a much easier kind of security investigations experience, we are hiring cleared security folks in Australia (SIEM / Python SE) and a cleared cybersecurity data scientist in the US. See Google docs @ graphistry.com/careers
Likewise, if you use a SIEM/Splunk/Neo4j/SQL today and want a better experience with it, feel free to ping us for the early access program. You can see our Nvidia GTC talk on the GPU SOC for the types of experiences we are building in general. GPT-3 already enabled much easier experiences here, and then GPT-4's quality jump shifted it from feeling like working with a weirdly well-read 10-year-old to working with a serious colleague.
Serious question. How can you reconcile needing CLEARED individuals to perform the work while giving the data to a non-cleared entity that seems to have issues with security?
Perhaps there’s now a self-hosted or enterprise version where they promise not to leak it?
We work with everyone from individual university researchers trying to understand cancer genomes or European economic plans in their graph DBs, to big corporations struggling with supply chains in Databricks, to government cyber & fraud teams using Splunk. For many, an OpenAI/Azure LLM is fine, or fine with the specific guardrails they've been having us put in.
But yes, when talking with banking & government teams, the conversation generally centers on self-hosted models. Privacy and cost are both important there -- there is a LOT of data folks want to push through LLM embeddings, graph neural nets, etc. We generally prefer bigger contracts in the air-gapped-everything world, especially for truly massive data, though thankfully, costs are plummeting for LLMs. Alpaca and Dolly are great examples here. Some folks buy 8-100 GPUs at a time, so this is no different for them. My money is on LLMs continuing to shrink until a regular single GPU is fine for many scenarios. The quality jump of GPT-4 has been amazing, so it's use-case dependent: data cleaning seems fine on smaller models, while we love GPT-4 for deeper analyst enablement. Wait six months and it's clear there'll be a roughly-open-source GPT-4; for now, even GPT-3.5 equivalents via Alpaca-style techniques are interesting, and a lot of money has begun moving around.
The LLM side is new from a use-case perspective but not as much from an AI software/hardware pipeline view -- it's just "a bigger BERT model". A lot of the discussion with folks has been extrapolating from what they're already doing with GPUs, where it's just another big GPU model use case. Internally, for us as a product team doing a lot of data-analyst UX and always-on GPU AI pipeline work... it's a very different story; it's made what was already a crazy quarter that much more nuts.