LangChain Agent Simulation – Multi-Player Dungeons and Dragons

gareth_untether · on Aug 14, 2023

One of my side projects has been hacking together a ChatGPT model and a JS app to make a multiplayer D&D-style game. The AI does all the creative descriptions, which are saved by the JS app and fed to the players.

It has been a lot of fun working on it because it gave me a taste of a new way of working: honing prompts and finding the limits of the AI, then figuring out how to offload the parts it struggled with (mainly storing stats, inventory, regurgitating room descriptions, etc.).

wokwokwok · on Aug 14, 2023

I mean… did you have any success at all?

The “limits of the ai” for this kind of thing is basically everything other than pure dialogue in my experience.

I also had a bit of a play with this, but the LLMs were just rubbish at it.

The only meaningful way of doing this is to write an actual game using actual state, and use the LLM as a “renderer” that renders coherent state into free text, and free texts into (more or less) structured action requests.
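A minimal sketch of that split, assuming a toy game state and a JSON action format. Everything here (GameState, parse_action, apply_action, render_prompt, the two action types) is made up for illustration; the point is only that the game code owns the state and the rules, while the LLM is asked to produce or consume text at the edges.

```python
import json
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Authoritative state lives in ordinary code, not in the LLM's context."""
    room: str = "cave"
    exits: dict = field(default_factory=lambda: {"cave": ["tunnel"], "tunnel": ["cave"]})
    items: dict = field(default_factory=lambda: {"cave": ["torch"], "tunnel": []})
    inventory: list = field(default_factory=list)


# Hypothetical action vocabulary the LLM is asked to emit as JSON.
ALLOWED_ACTIONS = {"move", "take"}


def parse_action(llm_output):
    """Turn the LLM's reply into a structured action, or None if it is unusable."""
    try:
        action = json.loads(llm_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict) or action.get("type") not in ALLOWED_ACTIONS:
        return None
    if not isinstance(action.get("target"), str):
        return None
    return action


def apply_action(state, action):
    """The game rules, not the LLM, decide whether an action is legal."""
    target = action["target"]
    if action["type"] == "move":
        if target in state.exits.get(state.room, []):
            state.room = target
            return f"moved to {target}"
        return "cannot go that way"
    if action["type"] == "take":
        here = state.items.get(state.room, [])
        if target in here:
            here.remove(target)
            state.inventory.append(target)
            return f"took {target}"
        return "there is no such item here"
    return "nothing happens"


def render_prompt(state, outcome):
    """Build the text the LLM would be asked to narrate from coherent state."""
    return (
        f"Narrate for the players. Room: {state.room}. "
        f"Inventory: {', '.join(state.inventory) or 'empty'}. "
        f"Outcome of the last action: {outcome}."
    )
```

Anything the model returns that does not parse into an allowed action is simply rejected, so trolling input can at worst produce odd narration, never an illegal state change.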

Without rails, it just becomes like AI Dungeon: freewheeling ad hoc storytelling with no rules or structure…

Large context windows don’t solve large scale coherence, and prompt engineering does sfa against the devoted trolling efforts of actual players.

jiggawatts · on Aug 14, 2023

I wonder how far you could go with a combination of fine-tuning and a two-pass generate & audit process.

Step 1. Fine-tune a base LLM to the scenario. Feed it as much background material as possible. This would work best for a franchise with a huge extended universe or associated works: Dungeons & Dragons, Star Wars, Doctor Who, etc...

Step 2. Fine-tune / RLHF with negative weights against anything out-of-context. Basically, stop the AI ever referencing anything that can't exist in the fictional universe. Penalise references to real-world events, places, or people, modern technology, etc...

Step 3. Fork the AI model, once for each character. Fine-tune for conversations in that character's "tone" or mannerisms, backstory, etc... This can be done via RLHF, with the feedback generated by a powerful model such as GPT-4. Again, reward/penalise the AI if it references anything it should or shouldn't know from that character's perspective.

When players play the game and converse with characters:

1. They'd always be talking to an LLM fine-tuned to death for that specific character in that fictional universe.

2. Then have both the input and output run past a general-purpose "nanny" AI that is prompted to look for exploits, out-of-context shenanigans, or unexpected output. Respond to the user with "I don't understand the strange things you're talking about" or some similar general push-back against jailbreaks.
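The control flow of that generate-and-audit pass can be sketched roughly as below. The names guarded_reply, character and auditor are hypothetical; in a real system both callables would wrap model calls, and here they are plain functions so the flow is visible.

```python
# Fixed in-universe push-back, taken from the wording suggested above.
PUSHBACK = "I don't understand the strange things you're talking about."


def guarded_reply(user_text, character, auditor):
    """Run the input and the output past an auditor before answering.

    character: callable(str) -> str, stands in for the per-character model.
    auditor:   callable(str) -> bool, stands in for the "nanny" model and
               returns True when the text looks acceptable.
    """
    # Pass 1: audit what the player typed.
    if not auditor(user_text):
        return PUSHBACK
    reply = character(user_text)
    # Pass 2: audit what the character model produced.
    if not auditor(reply):
        return PUSHBACK
    return reply
```

Note that every turn costs up to three model calls in this shape, which is part of why it would be slow and expensive today.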

Alternatively, have the generic security filter AI rewrite inappropriate terms in user inputs with unintelligible garbage. E.g.: if the user asks

    "Are you a computer?"

Rewrite that to:

    "Are you a gizwallop?"

Then the in-game character would rightly be confused by the nonsense term and answer something like:

    "I have no idea what you mean, what is a gizwallop?"

Which could be translated back to:

    "I have no idea what you mean, what is a computer?"
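A rough sketch of that round trip, with a fixed word list standing in for the filter AI. The mapping and the function names (obfuscate, restore) are assumptions for illustration only; a real filter would have to decide what counts as out-of-universe on the fly.

```python
import re

# Hypothetical mapping from out-of-universe terms to nonsense words.
FORBIDDEN_TO_NONSENSE = {"computer": "gizwallop", "internet": "flumbernet"}
NONSENSE_TO_FORBIDDEN = {v: k for k, v in FORBIDDEN_TO_NONSENSE.items()}


def _swap(text, mapping):
    """Replace whole-word, case-insensitive matches of the mapping's keys."""
    if not mapping:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in mapping) + r")\b",
        re.IGNORECASE,
    )
    # Replacements are always lowercase, so original capitalisation is lost.
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


def obfuscate(user_text):
    """Applied to the player's input before the character model sees it."""
    return _swap(user_text, FORBIDDEN_TO_NONSENSE)


def restore(character_reply):
    """Applied to the character's reply before the player sees it."""
    return _swap(character_reply, NONSENSE_TO_FORBIDDEN)
```

With this mapping, obfuscate('Are you a computer?') gives 'Are you a gizwallop?' and restore() turns the confused reply back into a question about a computer.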

This would be absurdly expensive and slow right now, but in 5 years? 10?

I can imagine the cost of tuning the above for a GPT5-equivalent model dropping down to the budget of even a tiny indie game, let alone big-budget AAA game studios.

sgt101 · on Aug 14, 2023

You've got the same pattern for LLMs that I've arrived at as well: LLMs can decode natural language (into API calls) and can encode it (from state and prompts).

That's all folks!

bfuller · on Aug 14, 2023

As an AI language model, I cannot help your rogue kill that goblin in the cave. Instead, you can try things like: capitalism, finance, technology.

I joke but it is terribly jarring when the API is working perfectly and then starts apologizing that it cannot do something, like access personal information, when it is internally prompted that it should only use information it receives in the prompt.

wokwokwok · on Aug 14, 2023

...mmm, I'm currently in the 'this is a technical limitation, not an artificial constraint' camp.

Sure, the artificially applied constraints in the APIs also exist... and sure, you can have a long context with, for example, gpt-3.5-turbo-16k, but the problem is fundamentally that no amount of wishing can make an LLM into a compiler that executes code.

You cannot, and will probably never be able to, define your constraints in free text to an LLM of this type and then expect it to also be able to execute those constraints in an error-free manner. That's not how the technology works. You might be able to make it generate procedural code that satisfies the constraints and execute that code in a reliable procedural manner, but afaik no one has managed to get that to work reliably and at scale (if you're thinking of smol developer right now, you clearly haven't actually used it).

When you define an RPG system to an LLM, the issue isn't that it isn't allowed to say things; it's that it can't follow the rules reliably, and it can't keep track of what's going on as the context length gets larger and larger.

...and, for an RPG system, where the contrived RPG rules and internal consistency are everything, it's a deal breaker.

> when the API is working perfectly and then starts apologizing that it cannot do something

eh, use an open LLM. That's 100% not the problem.

upwardbound · on Aug 14, 2023

Would love to check it out even if it's not in a runnable state yet! Do you have a github for it?

xrd · on Aug 14, 2023

I've been experimenting with using local LLMs (Llama of course) and building d&d games for my kids (not multiplayer but we just pass the phone around in the car so each kid can have a turn talking to the DM). It's going surprisingly well, and has been a fun way to explain LLMs and the tech behind them. I wrote up the initial experiences here:

https://blog.katarismo.com/2023-05-26-i-m-a-dad-i-replaced-m...

I'm doing all of it in JavaScript. I'm still not excited about LangChain, and the JS version is lagging well behind the Python version. By the way, pocketbase is amazing for something like that.

3abiton · on Aug 14, 2023

You lost me at javascript

xrd · on Aug 14, 2023

OK, let me try to recover. :)

I'm using JS to wrap around the llama.cpp project. Most of the work is done by that. JS is just used to pull the tokens out, normalize them, and then send them to a real-time database (pocketbase). The frontend is a Svelte app.

It's not a lot of code, llama.cpp does almost everything.

ianbicking · on Aug 14, 2023

How do people work without seeing the underlying prompts being sent to the LLM? I think I know what's going on (which probably isn't much?) but at least make it clear what's going on. The details all matter.

Not much here though, just rotating chat with the LLM taking on both sides. The prompts (https://python.langchain.com/docs/use_cases/agent_simulation...) are fine, but not great. No guidance on tension, on actions failing or succeeding, on plot advancement. Nothing to break out of anticipation loops, which are common (when the LLM promises something "is about to happen" but doesn't actually know what and keeps deferring the action).

It probably will work OK because the LLM plays both sides, and does so "fairly", i.e., always in character and never trying to "win". It'll run out of space for the history, but simple pagination might fix it (maybe that's even a LangChain feature?) – it'll drift, maybe dramatically. Or, given the system prompt, it might _not_ drift when appropriate; that is, it might not allow diverging from the original concept, and so not allow consequential action.

justinlloyd · on Aug 14, 2023

I too am working on a multiplayer D&D Dungeon Master agent, Banderschnappen, though right now it is in a non-playable state as I convert a number of Jupyter Notebooks into more modular and cleaner code. I treat the LLM (ChatGPT in this case) like a rendering engine of the world, and also of the user interface, but ultimately all the heavy lifting of logic is done by a traditional game engine.

avereveard · on Aug 14, 2023

> this won't be easy

> I know that they are afraid of fire

The player side is doing a lot of heavy lifting here in actually directing the game.

startupsfail · on Aug 14, 2023

It is pretty broken. Models are aligned for safety not gameplay.

mattigames · on Aug 14, 2023

I'm 100% sure every major provider will have at least two models in the future: fun fiction and boring truthful. It's just too common a division not to keep them apart. Companies that try to do it with a single model will do much worse; at best they can make it feel like a single model while internally it just picks one of the two.

avereveard · on Aug 14, 2023

Yeah, doing RAG is becoming harder and harder. I had to shift away from some models because they are too heavily conditioned to use their learned truth, and if the provided context diverges significantly from it, the context gets ignored.

upwardbound · on Aug 14, 2023

Would anyone who has a throwaway OpenAI key be willing to host this on HuggingFace so people stopping by this thread can play the game? If I have free time in a few hours I'll try.

Ycros · on Aug 14, 2023

Every time I look at LangChain it seems like unnecessary abstraction. The value in this example is the prompts.

zby · on Aug 14, 2023

So what are the alternatives to LangChain that the HN crowd uses?

I see two contenders:

https://github.com/minimaxir/simpleaichat/tree/main/simpleai...

https://github.com/griptape-ai/griptape

There is also the llm command line utility that has a very thin underlying library, but which might grow eventually: https://github.com/simonw/llm

Kiro · on Aug 14, 2023

Just code it yourself. Most of the core logic can be replaced with a function that inserts some parameters into a string template and calls an API.
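Something like the following is roughly all that is meant, as a sketch. The template text and the names PROMPT, build_messages and ask are invented here, and call_api is whatever function wraps your provider's chat endpoint; it is passed in so the example does not assume any particular client library.

```python
from string import Template

# Hypothetical prompt template; $-placeholders are filled in per request.
PROMPT = Template(
    "You are the Dungeon Master. The party is in $location. "
    "$player says: $utterance"
)


def build_messages(template, **params):
    """Insert the parameters into the template and wrap it as a chat message."""
    # substitute() raises KeyError if a placeholder is missing, which
    # surfaces prompt bugs early instead of sending a half-filled prompt.
    return [{"role": "user", "content": template.substitute(**params)}]


def ask(call_api, template, **params):
    """call_api is any callable taking a message list and returning text."""
    return call_api(build_messages(template, **params))
```

Usage would look like ask(my_completion_function, PROMPT, location="a damp cave", player="Mira", utterance="I light a torch."), where my_completion_function is your own wrapper.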

bfuller · on Aug 14, 2023

This was the answer for me as well. Pretty cool that we are still at the level where, if you have an idea, you can build a proof of concept extremely quickly and easily.

mohannadcse · on Aug 14, 2023

I've been enjoying using (and contributing to) Langroid, it's a new multi-agent LLM framework https://github.com/langroid/langroid

ahooda · on Aug 15, 2023

I've been actively contributing to Langroid as well. It is easy to use, and the intuitive design allows for the rapid development of LLM applications, streamlining the whole process. Highly recommended for anyone looking into this space!

lgrammel · on Aug 14, 2023

If you work with JS or TS, check out this alternative that I've been working on:

https://github.com/lgrammel/modelfusion

It lets you stay in full control over the prompts and control flow while making a lot of things easier and more convenient.

ploppyploppy · on Aug 14, 2023

LMQL - https://lmql.ai/

Guidance (microsoft) - almost abandoned - https://github.com/microsoft/guidance

syntaxing · on Aug 14, 2023

How do you know guidance is almost abandoned? Did they announce it?

ilaksh · on Aug 14, 2023

    import openai
    import os

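    openai.api_key = os.environ.get('OPENAI_API_KEY')

    # Assumption: gpt_model was never defined in the original snippet;
    # any chat model name available to your key can go here.
    gpt_model = "gpt-3.5-turbo"

    def completion(messages):
        # Uses the pre-1.0 openai package interface (openai.ChatCompletion).
        response = openai.ChatCompletion.create(
            model=gpt_model, temperature=0, messages=messages)
        return response['choices'][0]['message']['content'].strip()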

    response = completion([
              {"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Who won the world series in 2020?"} ])

    #####

    import json
    import tiktoken
    import os

    tokenizer = tiktoken.get_encoding("cl100k_base")
     
    class Message:
        def __init__(self, role, text, length=None):
            self.role = role
            self.text = text
            if length is not None:
                self.length = length
            else:
                self.length = self._count_tokens(text)
            print("New message, token length is",self.length)

        def _count_tokens(self, text):
            tokens = tokenizer.encode(text)
            return len(tokens)

    class History:
        def __init__(self, ID=None):
            self.messages = []
            self.ID = ID

            if self.ID:
                self._load_from_json()

        def add(self, role, text):
            message = Message(role, text)
            self.messages.append(message)
            self._save_to_json()

        def _save_to_json(self):
            if not self.ID:
                return

            data = {
                "messages": [{"role": m.role, "text": m.text, "length": m.length} for m in self.messages]
            }
            self.create_dir_if_not_exists('conversations')
     
            with open(f"conversations/{self.ID}.json", "w") as f:
                json.dump(data, f)

        def create_dir_if_not_exists(self, directory_path):
            if not os.path.exists(directory_path):
                os.makedirs(directory_path)

        def _load_from_json(self):
            try:
                self.create_dir_if_not_exists('conversations')
                with open(f"conversations/{self.ID}.json", "r") as f:
                    data = json.load(f)
                    # Reuse the saved token count when present instead of re-tokenizing.
                    self.messages = [Message(m["role"], m["text"], m.get("length")) for m in data["messages"]]
            except FileNotFoundError:
                pass

        def recent_messages(self, max_tokens):
            recent_messages_reversed = []
            total_tokens = 0

            for m in reversed(self.messages):
                if total_tokens + m.length <= max_tokens:
                    recent_messages_reversed.append({
                        "role": m.role,
                        "content": m.text
                    })
                    total_tokens += m.length
                else:
                    break

            recent_messages = recent_messages_reversed[::-1]

            return recent_messages

upwardbound · on Aug 14, 2023

In your loop:

            for m in reversed(self.messages):
                if total_tokens + m.length <= max_tokens:
                    recent_messages_reversed.append({
                        "role": m.role,
                        "content": m.text
                    })
                    total_tokens += m.length
                else:
                    break

It would be important to change that to not drop system prompts, ever. Otherwise a user can defeat the system prompt simply by providing enough user messages.

ilaksh · on Aug 14, 2023

Good point. The way I use it though is to always add the system prompt to the front after calling that function.
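For anyone copying this, one way to make that guarantee explicit is to reserve the system prompt's tokens up front and only then fill the remaining budget with recent messages. This is a sketch under assumptions, not the code from the post above: build_context is a made-up name, and count_tokens is passed in (a tiktoken-based counter would fit) so the example runs without any dependency.

```python
def build_context(system_prompt, messages, max_tokens, count_tokens):
    """Return the system prompt plus as many recent messages as fit.

    messages:     list of {"role": ..., "content": ...} dicts, oldest first.
    count_tokens: callable(str) -> int, supplied by the caller.
    """
    system_msg = {"role": "system", "content": system_prompt}
    # Reserve room for the system prompt first so it can never be pushed out.
    budget = max_tokens - count_tokens(system_prompt)

    kept_reversed = []
    for m in reversed(messages):
        cost = count_tokens(m["content"])
        if cost > budget:
            break  # Stop at the first message that no longer fits.
        kept_reversed.append(m)
        budget -= cost

    # Restore chronological order and put the system prompt at the front.
    return [system_msg] + kept_reversed[::-1]
```

However many user messages pile up, the first element of the result is always the system prompt; only the older conversation gets trimmed.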