I love being a DM, but it's no mystery to me why so few people do it. Being a DM is a lot of work, and occasionally a lot of annoyance. You not only have to facilitate (and sometimes create from scratch) an engaging story that keeps people coming back; you also have to wrangle busy schedules and personalities that are often annoying. Lots of times you'll have a bunch of people willing to play, but no one willing to DM. So to me it's not surprising at all. I experimented with AI for running a game and quickly concluded it could never be good at it, because it's so insanely suggestible and has poor memory.
I would love to have an AI as an assistant Dungeon Master (or game master, or Keeper, or what have you). That is, one person in a group of players maintains the role of a master storyteller, but the AI is ready to fill in details or suggest ways to get the players back on track. This would probably be tedious if you're interacting with the LLM entirely through text, and having to manually keep it up to date with the story. But it could work well if you have a model that understands spoken language listening in on the game and generating cool images and making private suggestions to the game master.
As a player and very occasional DM myself, filling in details and trying to get players back on track is where the fun and challenge are. AI could definitely be useful for handling all the paperwork (fight resolution and so on), though.
However I also agree with you that being a DM is a prohibitive amount of work for someone, say, with kids and a job. It would be awesome to have an LLM as an assistant, maybe feeding in parts of the story and querying it for ideas when you're in a bind. But having it run as a full DM, at least right now, will likely lead to a boring experience.
I will run Toon, Call of Cthulhu, or Paranoia any day, the latter without a scenario, as I can kill off all their clones in my very dangerous Alpha Complex and have them rolling on the floor laughing the whole time with the help of a stack of prerolled character sheets and some random encounter tables. (I'd expect an LLM to be able to do the same.)
Contrast that with the famous Bloodstone campaign, which is the pinnacle of D&D scenarios but can't really be challenging to the players, because the players have to win over and over again if you're going to use most of the material.
There are numerous tactics that work for handling the "not serious" problems you get in a game for "young adults".
For instance, serious players get resentful when somebody not serious comes in late, forgets their character sheet, etc. You could lose either or both of them, but with a little prep you don't have to. In Paranoia it's easy to shove a preroll into their hands, give them a 1-2 minute briefing from "The Computer" and build up tension around this character who mysteriously appears (and if the new player complains that they don't know the rules tell them they're not allowed to know the rules!)
Toon is the epitome of "not serious" and, like the other two games I've mentioned, is a game where you can buy one book and have everything you need, including your first scenario.
1) "LLM DM" doesn't merely mean substituting for a human DM in a D&D session. The same capability can be used as a component in a video game, to breathe life into the game world and have it interactively react and evolve along with the player.
EDIT: Take a game like Rimworld, which relies on a scripted RNG dubbed a "storyteller" to decide what random events to hit you with and how hard. It's fun early on, but if you're into role-playing, you'll quickly realize there's no evolving story behind it, just stateless RNG. An LLM DM is exactly what could add that story, make overcoming challenges feel meaningful, and allow player decisions to actually impact the world deeply.
2) There are people like me, who would love to participate in an RPG session, but for various reasons never got invited to those when at school, and now, due to demands of parenthood, can't exactly make time to coordinate with the few people around who are still playing.
There are more, but those are the two that are apparent to me.
No it's not. I don't think you're going to find an LLM with a context window large enough to sustain a meaningfully involving story spanning multiple sessions.
An LLM isn't going to craft a story element tailored to a character, or more importantly, an individual player. It's not going to understand Sam couldn't make last week's session. An LLM also doesn't really understand the game rules and isn't going to be able to adjudicate house rules based on fun factor.
LLMs can be great tools for gaming but I think their value as a game master is limited. They'll be no better a game master than a MadLibs book.
First, you don't need much of a context window at all, because you can fine-tune the LLM. Don't mistake specific engineering choices, tradeoffs, and deployment conveniences for intrinsic limitations of the technology.
Second, LLMs like Gemini now have context windows of millions of tokens, corresponding to millions of words. Seems like enough for 'multiple sessions'.
> An LLM isn't going to craft a story element tailored to a character, or more importantly, an individual player. It's not going to understand Sam couldn't make last week's session. An LLM also doesn't really understand the game rules and isn't going to be able to adjudicate house rules based on fun factor.
An LLM can do all of that, and you definitely do not know that they can't.
> They'll be no better a game master than a MadLibs book.
They've been better than a MadLibs book since AI Dungeon 1, which was about six years ago.
Sure you will.
> An LLM isn't going to craft a story element tailored to a character, or more importantly, an individual player.
Sure it is.
> An LLM also doesn't really understand the game rules and isn't going to be able to adjudicate house rules based on fun factor.
Sure it will.
You need to use the tools for their purpose, not for the opposite of it. LLMs have finite context, you need to manage it. LLMs don't have a built-in loop, you need to supply it.
Character stats, names, details about players - those are inputs, and structured ones at that. LLMs shouldn't store them - that's what storage media are for, whether in-memory or a database or a piece of paper. Nor should they manipulate them directly - that's what game systems are for, whether implemented in code or in a rulebook run by a human DM. LLMs are there to make decisions - local, intuitive decisions, based on what is in their context. That could be deciding what a character says in a given situation. Or how to continue the story based on a worldbuilding database. Or how to update the worldbuilding database based on what it just added to the story. Etc.
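To make that division of labor concrete, here's a minimal sketch (all names and data are invented for illustration): structured state lives in plain data, rule adjudication lives in code, and the LLM only receives an assembled context for its local decision.

```python
# Division of labor: data store holds state, game system applies rules,
# and the LLM only sees a context string assembled just-in-time.
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    hp: int
    traits: list = field(default_factory=list)

def apply_damage(char: Character, amount: int) -> Character:
    # Rule adjudication belongs to the game system, not the LLM.
    char.hp = max(0, char.hp - amount)
    return char

def build_llm_context(char: Character, scene: str, recent_events: list) -> str:
    # The LLM's only job here is a local, intuitive decision (what the
    # character does next), so we hand it exactly the state it needs.
    lines = [
        f"Character: {char.name} (HP {char.hp})",
        "Traits: " + ", ".join(char.traits),
        f"Scene: {scene}",
        "Recent events:",
        *[f"- {e}" for e in recent_events],
        "Task: describe what the character does next, in one paragraph.",
    ]
    return "\n".join(lines)

grog = Character("Grog", hp=12, traits=["distrusts kobolds"])
apply_damage(grog, 5)
prompt = build_llm_context(grog, "a collapsed mine shaft", ["a kobold fled east"])
```

The prompt is rebuilt from canonical state every turn, so the LLM never has to "remember" stats at all.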
Some details about players are structured and can be easily stored and referenced. Some aren't. Consider a character who, through emergent gameplay, develops a slight bias against kobolds; who's going to pick up on that and store it in a database (and at what point)? What if a player extemporaneously gives a monologue about their grief at losing a parent? Will the entire story be stored? Will it be processed into structured chunks to be referenced later? Will the LLM just shove "lost a father" into a database?
Given current limitations I don't see how you design a system that won't forget important details, particularly across many sessions.
An LLM might, if prompted to look for it, or if there was a defining moment that could have triggered such a change. It won't pick up on a very subtle change, but then most people reading a story wouldn't either - this is more the kind of stuff fans read into a story when trying to patch potential continuity issues.
> What if a player extemporaneously gives a monologue about their grief at losing a parent? Will the entire story be stored? Will it be processed into structured chunks to be referenced later? Will the LLM just shove "lost a father" into a database?
The scale depends on the design, but I'd say yes, shoving "lost a father" into a database so it pops up in context is a good first step. The next step would be to ensure the entry looks more like "mentioned they continue to grieve after the loss of their father <time ago>", followed by a single-sentence summary of their monologue.
Personally, I've had some degree of success configuring an LLM (Claude 3.5 Sonnet) to advise on some personal topics across multiple conversations - the system prompt contains notes in <user_background> and <user_goals> tag-delimited blocks, along with instructions to monitor the conversation for important information relevant to those notes and, if found, to adjust them accordingly (achieved by having it emit updates in another magic tag, which I then manually apply to the system prompt).
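That manual loop can be automated; here's a rough sketch, with a hypothetical `notes_update` tag standing in for the "magic tag" (the actual tag names and prompt wording would be your own):

```python
# Splice a model-emitted update back into the tagged notes blocks of the
# system prompt. The <notes_update> tag is an invented convention.
import re

def apply_note_update(system_prompt: str, model_output: str) -> str:
    # Look for an update block the model was instructed to emit, e.g.
    # <notes_update target="user_goals">...</notes_update>
    m = re.search(
        r'<notes_update target="(\w+)">(.*?)</notes_update>',
        model_output, re.DOTALL,
    )
    if not m:
        return system_prompt  # nothing to apply this turn
    tag, new_body = m.group(1), m.group(2).strip()
    # Replace the body of the matching tag-delimited block only.
    return re.sub(
        rf"<{tag}>.*?</{tag}>",
        f"<{tag}>\n{new_body}\n</{tag}>",
        system_prompt, flags=re.DOTALL,
    )

prompt = "<user_background>\nnone\n</user_background>\n<user_goals>\nnone\n</user_goals>"
output = 'Noted. <notes_update target="user_goals">learn woodworking</notes_update>'
updated = apply_note_update(prompt, output)
```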
> Given current limitations I don't see how you design a system that won't forget important details, particularly across many sessions.
It's not possible. Fortunately, it's not needed. Humans forget important details all the time, too - but this is fine, because in storytelling, the audience is only aware of the paths you took, not of the countless other possibilities you missed or decided not to take. Same with LLMs (and larger systems using LLMs as components) - as long as they keep track of some details, and don't miss the biggest, most important ones, they'll do the job just fine.
(And if they miss some trivia you actually care about, I can imagine a system in which you could ask about it, and it'll do what the writers and fans always do - retcon the story on the fly.)
Rimworld is a great universe where we think about characters' stories, and there's really just like, a couple dozen attributes and straightforward ways they interact with the game.
An LLM context window could easily have 20 times as much interpersonal state, and make it interact in much more unexpected (but plausible) ways. That's going to be a surprising and rewarding gaming experience once someone figures it out.
I have a feeling people imagine LLMs as end-to-end DMs that should somehow remember everything and do everything in one inference round. That's not what they are. They're building blocks. They're to be chained and mixed with classical flow control, algorithms, and data storage (as well as the whole interactive game mechanics, in videogame context).
You don't need to provide every single piece of previous information to the LLM; use the LLM to summarise what came before, and it gets really compact. It works quite well.
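As a sketch of that rolling-summary loop (the `summarize` function here is just a stub standing in for an actual LLM call; in practice you'd prompt the model to fold old events into the running summary):

```python
# Keep the last few turns verbatim; fold everything older into a summary.
def summarize(previous_summary: str, old_turns: list) -> str:
    # Stub for an LLM call like:
    #   "Update this campaign summary with the following events: ..."
    return previous_summary + " " + "; ".join(old_turns)

def compact_history(summary: str, turns: list, keep_recent: int = 4):
    if len(turns) <= keep_recent:
        return summary, turns
    to_fold, recent = turns[:-keep_recent], turns[-keep_recent:]
    return summarize(summary, to_fold), recent

summary, turns = "", []
for i in range(10):
    turns.append(f"turn {i}")
    summary, turns = compact_history(summary, turns)
# At any point the LLM sees: the summary plus the last few verbatim turns.
```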
Sure.
There are two reasons I can think of for someone making a Dungeon Master LLM.
One is that when there is no cake we eat bread.
Don't get me wrong. I'm a DM, and I love playing D&D with my friends. I totally agree with the sentiment you are sharing. But people who are willing and able to DM for others are not evenly distributed. There are a lot more people who would like to play TTRPGs than people who are willing to step into the DM role.
So in that sense, think of this as a substitute for those games which would not happen otherwise, because they don't have a DM. Or the only person who would DM them is an ass. Or their DM has burnt out.
Is some game better than no game? Sometimes. It depends on how good that game is. And we won't know how good the substitute can be without trying. Heck, maybe more people will play, and they will realise how easy it actually is to DM.
The other reason is the sheer challenge of it. D&D has a lot of rules. There are the obvious ones you can read in the book. But there are also unwritten rules, like object permanence. If a goblin steals a diamond ring from us and we slay them within minutes, that diamond ring had damn well better be on their body somewhere. If three displacer beasts ambush us and we slay two, there had damn well better be one more accounted for in some sense.
There are also "story-writing" rules. If we went through hell and high water to obtain an arrow of dragon slaying after the blacksmith told us the legend of it, he'd better not just pull one out of his ass the next time we see him. If the whole lore of goddess X is that they are kind and caring, then they had better at least not be cruel when we meet them. These are all hard for an LLM. They are also easy for a human to evaluate, because we just feel when they are not right. That makes this a good challenge for evaluating how good we are at this "making a bucket of sand smart" task.
Now imagine something in between, where you have a video game, but the NPCs and parts of the story are controlled by an LLM. You can give the players much more freedom, and the creators don't have to write thousands of lines of dialog to account for every possible choice a player can make. The game doesn't have to be "on rails" so much, the LLM can help speed the story along when the player gets confused, and you can have NPCs with much more depth than just several lines of static dialog.
How well this will work in practice remains to be seen. In my experience, ChatGPT itself is a painfully generic DM that relies upon repetitive fantasy tropes. You still need an actual human being to create an interesting story and add depth to the world.
Here's a list of some of the ones available (about 10 years old but it gives you the idea)
https://rpggeek.com/geeklist/181957/list-of-solitaire-soloab...
I grew up with “choose your own adventure” books, which were like a solo adventure (“If you go on the north road, turn to page 34; if you follow the river, page 57”).
Many board games now have a solo mode with an automated player with tables and dice to help randomness.
https://boardgamegeek.com/thread/3238902/what-is-the-best-so...
Obviously there are adventure video games. Some of the new ones have interesting backstories, really amazing worldbuilding, and D&D-like adventures: Horizon Zero Dawn, The Witcher, and Star Wars Outlaws, among many others. “Baldur's Gate” really gave me D&D vibes (I played through that one multiplayer).
Totally this, at least for me, in some circumstances.
TOU's be damned, I've written bots for some online games I've played, not because I want the XP or money or whatever that the bot could do without me working for it, but rather because I found writing the bots fun and engaging.
Before anyone gets in an uproar, I didn't sell them, nor any of the in-game resources gained from them. I was watching them basically all the time, because that's what I was there for - to see my creation work. And I purposely didn't interfere with other "real" players.
Citation needed.
> A purely AI generated or controlled world would have no constraints
That's a shitty AI then. Make a better one. I can play 2000 Vampire: The Masquerade games with 2000 different groups. They will each be different, but they will also be each distinctly Vampire: The Masquerade ttrpg games. If the AI you are thinking about can't do the same, then think of a better AI.
> at least with current technology.
Well. Who is the group who will make the "next technology"? Should we work on that, or just lay down on the ground and wait for it to fall from the sky? Testing what are the limits of the current technology (as done in the paper we are talking about here) is the way to get there. Or at least to systematically answer the question of where and what are we lacking.
Lol, a citation of what? This is my opinion statement and the rest of my post follows it.
> That's a shitty AI then. Make a better one. I can play 2000 Vampire: The Masquerade games with 2000 different groups. They will each be different, but they will also be each distinctly Vampire: The Masquerade ttrpg games. If the AI you are thinking about can't do the same, then think of a better AI.
Sure, I'll get right on that.
> Well. Who is the group who will make the "next technology"? Should we work on that, or just lay down on the ground and wait for it to fall from the sky? Testing what are the limits of the current technology (as done in the paper we are talking about here) is the way to get there. Or at least to systematically answer the question of where and what are we lacking.
I'm really unsure of what or who you are addressing here but it certainly isn't anything I've written in my post.
Citation that nobody wants what you described? The sentence which I was quoting.
> This is my opinion statement
Your opinion can be that "I don't want this." "Nobody wants this" refers to things outside of your head. Do you see the difference between the two?
> it certainly isn't anything I've written in my post.
Your post is suffused with defeatism. The three "no"s it contains are: "nobody wants this", "with current technology this cannot be fun", and an implicit "we can't make the next technology". I believe each of those is wrong, and I'm calling you out on the attitude.
Happily. I know you are wrong on the "nobody wants this" statement because I want it. With a sweeping generic statement like "nobody wants this" a single example is enough to disprove it. There you have it.
> what advances are being made in AI technology in this gaming that lends you such confidence?
There is a ton of experimenting going on. AI Dungeon and Deep Realms are the two obvious ones. I don't think anyone has found the golden solution yet, but that is also not evidence that no such thing exists.
Now it's true that, with the current crop of LLMs, a persistent enough player would always be able to break through them. But if it takes conscious and deliberate effort, I think it's reasonable to say that whatever experience the person gets as a result, they were asking for it.
It's the type of problem which requires a good balance of storytelling, preparation, and tailoring the interactions to the psychology of the people who are playing (in specific beneficial and entertaining ways).
It's a problem computers won't likely solve anytime soon.
Also, I ran a session over Zoom and their AI summary was so useful!
And there's quite a lot of us that like to play but because of life commitments, getting together with a group on a regular schedule is difficult.
Most likely use of this is that a DM runs it, and overrules/augments its output when necessary/as they see fit.
Go outside. There's people for everything.
If you have a good role playing group, with a DM you enjoy playing with, no computer game nor LLM will replace that. But if what you are looking for is the convenience of computer games with the freedom of tabletop RPGs more than the social aspect, then LLMs totally make sense. And even with a social group, it can work as substitute if you don't have a DM (DMing is hard work!).
I DMed a few games of The Expanse, and using LLMs to plan ahead was a godsend. No, I didn't use it to write the story for me - instead I used it to test my planned story and see which way my players might stray off the road, so I could plan for those. Basically I simulated a game using an LLM that acted in place of multiple characters, allowed those to run free (within certain limits; obviously you can't have the LLM players take a dozen actions without the DM having a say), and essentially mapped the potential "off branches", story pathways I hadn't planned for initially. This allowed me to be prepared for the usual dumbass things players might do, such as heading for a strip club in the middle of a total disaster when they (figuratively) have dozens of arrows pointing at the goal of the chapter.
Another interesting aspect of using AI for TTRPGs is to create atmospherics. For Expanse based games, I've bought a number of tile packs and such, to appreciate the artists who put work into it, but I simply don't have the funds to commission a few dozen acrylic matte style scenery images (which I usually put up on my projector, combined with some Hue lights to create the visual atmosphere). With AI, I can even generate them on the fly, should my little gremlins stray off the path. Same for music - AI can incredibly easily generate an atmospheric soundtrack that fits the current scenery, with just a few words, while I can still pay attention to the players.
But fully replacing the DM? That's silly.
A few takeaways:
1. An LLM based DM can give the player essentially infinite richness and description on anything they ask for.
2. There is difficulty in setting the rules for the LLM to follow that match the DnD rulebook. But this is possible to solve for. Also, I found the LLM to be too pliable as a DM. I kept getting my way, or getting my hand held through scenarios. Maybe this is a feature?
3. My conversation quickly began to approach the context window for the LLM and some RAG engineering is very necessary to keep the LLM informed about the key parts of your history.
4. Most importantly, I found that I most enjoy the human connection that I get through DnD and an LLM with a voice doesn't really satisfy that.
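On point 3, the retrieval side of that "RAG engineering" can be sketched in toy form. A real system would use embeddings; plain word overlap is enough to show the shape (all campaign data here is invented):

```python
# Toy retrieval: pick the campaign notes most relevant to the current
# player input and inject only those into the LLM's context.
def score(query: str, note: str) -> int:
    # Crude relevance: count shared words (stand-in for embedding similarity).
    return len(set(query.lower().split()) & set(note.lower().split()))

def retrieve(query: str, notes: list, k: int = 2) -> list:
    return sorted(notes, key=lambda n: score(query, n), reverse=True)[:k]

campaign_notes = [
    "The party owes the blacksmith 50 gold for the dragon-slaying arrow.",
    "Sam's ranger has a grudge against the kobold chieftain.",
    "The innkeeper in Daggerford knows the location of the crypt.",
]
relevant = retrieve("we head back to the blacksmith about the arrow", campaign_notes)
```

Only the top-k notes go into the prompt, which is how the context stays bounded no matter how long the campaign log grows.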
LLMs are fine-tuned to be "helpful assistants", so they're basically sycophantic.
It starts well and then NPCs become inconsistent and the DM basically lets you craft the story by constantly doing a "yes and".
It becomes boring because the stakes feel so low.
Assuming we're talking about GPT-4o, that 128k context window theoretically corresponds to somewhere around 73,000 words. People talk at around 100 words per minute in conversation, so that would be about 730 minutes of context, or about 12 hours. The Gemini models can do up to 2 million tokens of context... which we could extrapolate to 11,400 minutes of context (190 hours), which might be enough?
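Those back-of-envelope figures check out; the implied words-per-token ratio (about 0.57, on the conservative side - tokenizers vary) falls out of the 128k-tokens-to-73,000-words conversion:

```python
# Reproduce the context-window arithmetic from the paragraph above.
WORDS_PER_TOKEN = 73_000 / 128_000   # ~0.57, implied by the 128k -> 73k figure
WPM = 100                            # conversational speaking rate, words/minute

def context_minutes(tokens: int) -> float:
    return tokens * WORDS_PER_TOKEN / WPM

gpt4o_minutes = context_minutes(128_000)     # ~730 minutes, ~12 hours
gemini_minutes = context_minutes(2_000_000)  # ~11,400 minutes, ~190 hours
```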
I would say GPT-4o was only good up to about 64k tokens the last time I really tested large context stuff, so let's call that 6 hours of context. In my experience, Gemini's massive context windows are actually able to retain a lot of information... it's not like there's only 64k usable or something. Google has some kind of secret sauce there.
One could imagine architecting the app to use Gemini's Context Caching[0] to keep response times low, since it wouldn't need to re-process the entire session for every response. The application would just spin up a new context cache in the background every 10 minutes or so and delete the old one, reducing the amount of recent conversation that would have to be re-processed each time to generate a response.
I've just never seen RAG work particularly well... and fitting everything into the context is very nice by comparison.
But, one alternative to RAG would be a form of context compression... you could give the LLM several tools/functions for managing the context. The LLM would be instructed to use these tools to record (and update) the names and information of different characters, places, and items that the players encounter, important events that have occurred during the game, as well as information about who the current players are and what items and abilities those players have, and then the LLM would be provided with this "memory" in the context in place of a complete conversational record. The LLM would then just receive (for example) the most recent 15 or 30 minutes of conversation, in addition to that memory.
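A sketch of that tool-based memory, with the tool names and schema invented for illustration (a real system would expose these through the provider's function-calling API, and the LLM would decide when to call them):

```python
# Structured memory the LLM maintains via tool calls; the context sent to
# the model is this memory plus only the most recent transcript.
memory = {"characters": {}, "events": []}

def record_character(name: str, info: str):
    # Tool the LLM calls when it learns something about a character.
    memory["characters"][name] = info

def record_event(description: str):
    # Tool the LLM calls when something important happens in the game.
    memory["events"].append(description)

def build_context(recent_transcript: list) -> str:
    parts = ["== Memory =="]
    for name, info in memory["characters"].items():
        parts.append(f"{name}: {info}")
    parts += ["== Key events =="] + memory["events"]
    parts += ["== Last 15 minutes =="] + recent_transcript
    return "\n".join(parts)

record_character("Mira", "half-elf rogue, afraid of deep water")
record_event("The party burned the bridge at Harrow Ford")
ctx = build_context(["DM: The river blocks your path.", "Mira: I look for a boat."])
```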
> I found the LLM to be too pliable as a DM.
I haven't tried using an LLM as a DM, but in my experience, GPT-4o is happy to hold its ground on things. This isn't like the GPT-3.5 days where it was a total pushover for anything and everything. I believe the big Gemini models are also stronger than the old models used to be in this regard. Maybe you just need a stricter prompt for the LLM that tells it how to behave?
I also think the new trend of "reasoning" models could be very interesting for use cases like this. The model could try to (privately) develop a more cohesive picture of the situation before responding to new developments. You could already do this to some extent by making multiple calls to the LLM, one for the LLM to "think", and then another for the LLM to provide a response that would actually go to the players.
One could also imagine giving the LLM access to other functions that it could call, such as the ability to play music and sound effects from a pre-defined library of sounds, or to roll the dice using an external random number generator.
> 4. Most importantly, I found that I most enjoy the human connection that I get through DnD and an LLM with a voice doesn't really satisfy that.
Sure, maybe it's not something people actually want... who knows. But, I think it looks pretty fun.[1]
One of the harder things with this would be helping the LLM learn when to speak and when to just let the players talk amongst themselves. A simple solution could just be to have a button that the players can press when they want, which will then trigger the LLM to respond to what's been recently said, but it would be cool to just have a natural flow.
I tried to get it to run space opera about a diplomatic mission to a newly recontacted planet, full of Game of Thrones-style intrigue and plotting and scheming, and it was like playing with a cheerily optimistic new UN intern that completely believes in the power of compromise and diplomacy to solve all problems.
It was always like: "And so the two of you hashed out your differences over a glass of Dubranian Forblik, and peace reigned forever and ever after."
That's true of how it does role playing games and any kind of fiction at all -- it always wants to tie a neat bow on it at the end of every few paragraphs with a moral and everything.
You also have to ask it to use Python to roll dice for anything you want to be able to fail at, or it will make everything you do succeed wonderfully.
I will say this -- the advanced audio mode for D&D is amazing and it really does act out the characters and it will occasionally slip into different voices or even add sound effects, even though it's not "allowed" to. It will also sometimes copy your voice and act out what it thinks you should do instead of letting you talk, though.
> raised by 3 out of 7 players was the perceived lack of danger for their characters. Participants noted that the gameplay did not present sufficient threats or challenges, which diminished the sense of urgency and excitement typically associated with D&D adventures.
https://huggingface.co/jukofyork/Dark-Miqu-70B
https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_7...
But my problem with using it as a DM is that you have to police yourself. I could just declare that I did something and it would automatically succeed. There were no rules or antagonism in the system. It was more of a collaborative storytelling tool than a game.
To be fair I haven't used it in years so I dunno how it works nowadays.
Sure, it's a cool technical demo. But... the point of D&D is a social game played with others. Even if you play a D&D-like video game such as Baldur's Gate 3 (which I'd argue is a fundamentally different experience to playing a tabletop RPG anyway), you're experiencing a world and a story that someone else has crafted.
What's the point of replacing that social interaction, or that connection to another person's creative vision through their art, with an LLM? What value can it ever provide?
The title of the paper is "Exploring the POTENTIAL of LLM-based agents..." Right now, yeah, just play BG3. But the context of the discussion is its potential, which is certainly worth discussing imo.
To play a tabletop RPG without that isn't the same thing, and playing with an LLM is bankrupt of meaning. You'd be better off picking up one of those old-school choose-your-own-adventure books or a video game like BG3. At least those were written by someone with intent; at least there's meaning in the story and setting. Or find an online group.
I grew up playing D&D alone in my room using LEGO. It was great and I still have some of the sets I built as dungeons and monsters that I built from reference pictures in the Monster Manual.
Video games are different from solo role playing. It's like trying to substitute shooting hoops in the driveway with NBA '24.
Dear god, it's crucial that adults be capable of creating and maintaining relationships. If it's impossible then, well, that's hell on earth.
Go out and make a new relationship! Do you live in Antarctica? And, if you live somewhere where new relationships can't be made, my god, get out of there! Go, go now! Go somewhere you're allowed to be human!
That's one of the points, or a benefit maybe, but it's not the WHOLE point. Some people just enjoy RP and fantasy worlds, so a single-player campaign would still be enjoyable (to me, at least, but maybe I'm weird). I def don't see this replacing DMs in real groups any time soon.
Also, if it doesn’t try to keep the story to strict guardrails, the creativity and story creation would come essentially from the human players.
I don’t think those arguments are enough to make me want to use it, but I see the point and possible interesting use.
"Bad DnD is worse than no DnD"
I'll agree. I've had bad groups and, wow, yeah. I'd rather have just doom scrolled my phone for 4 hours.
So, an AIDnD is wonky now, but it's better than bad DnD (at least with some futzing about that I have done myself).
To really reach here: I think you're placing value on the thing due to the effort involved. That's kind of an old school Marxist way of assigning prices to goods. I'm trying to point out that the value isn't that way for everyone. The 'new' market way of assigning prices is 'whatever someone will pay for it'. And in that very tortured analogy, that means that AIDnD is just as valid if someone likes it enough to do it. The background effort isn't part of the determination of value, necessarily.
With slight nudges, GPT performed well. It excelled at NPC dialog, location description, and to some extent improvisation. However, the campaign eventually fell apart in session four or five. There was a disagreement between the players over exactly how many maps we had acquired. As you can imagine, answering this question is a needle-in-a-haystack problem, and asking the LLM proved to elicit conflicting responses. It ultimately led me to conclude that pausing the campaign was the right choice, while accepting that we all still had an amazing time up to that point.
Which leads me back to the questions in the paper:

> We believe that with the above evaluation methods we can try to answer the following questions:
> • How consistent is a LLM Agent in its generative dialogue?
> • How well can a LLM Agent keep a user engaged in a narrative plot?
> • How creative is a LLM Agent when generating complex story driven narratives?
None of these are exactly relevant to my personal experience running a TTRPG game using ChatGPT, or to what makes it succeed or fail. What would make such a system an improvement is being able to store and retrieve facts about players (HP, stats, inventory), NPCs (names, location, history), and places (items, descriptions, inhabitants) in order to solve the needle-in-a-haystack problem. Creating a world ontology and being able to CRUD facts against it would be a game changer (literally).
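As a sketch of what CRUD-ing facts against a world ontology could look like (schema invented for illustration), the "how many maps do we have?" dispute becomes a lookup instead of a transcript search:

```python
# Tiny world ontology with CRUD operations; the LLM (or game system)
# updates facts as play happens, and queries answer by lookup.
world = {}  # entity name -> {"kind": ..., "facts": {...}}

def create(name, kind, **facts):
    world[name] = {"kind": kind, "facts": dict(facts)}

def read(name, key, default=None):
    return world.get(name, {}).get("facts", {}).get(key, default)

def update(name, key, value):
    world[name]["facts"][key] = value

def delete(name):
    world.pop(name, None)

create("party", "group", maps=0, gold=120)
update("party", "maps", read("party", "maps") + 1)  # found a map in the crypt
update("party", "maps", read("party", "maps") + 2)  # bought two in town
map_count = read("party", "maps")
```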
Despite the ultimate demise of the game, the experiment was a success. I'd highly encourage anyone with a laptop, a handful of dice, and a group of interested friends to give it a shot. It's fun, memorable, and a unique way to pass the time and bond.
Curiously, those are the exact things LLMs can't do well, and are not really supposed to. This is all stuff you (or an external system) should keep track of and supply to LLM in context "just in time". This is to say, LLMs are an important component, but alone they won't work. They need their tools :).
In short, LLMs are pure "system 1", the complete solution still needs "system 2", which fortunately is more amenable to classical computing approaches.
There was also the way a PC died from a rat bite that could only be cured by magic (the 5e version of this module is ironically harsher than the 3.5), as he was losing max HP every 24 hours. Since the players had expressed dislike of that character, the owning player was willing to go for a new character, and I had planned a big drama scene where the players make an emergency exit from the dungeon and try to reach the nearest village, where there's a priestess; it happened that I could match the rolls so he would die in the priestess's hands.
But I made the mistake of letting them rest for the night and reach 3rd level, which would have allowed the druid to get his cure-illness spell, botching the setup.
As it happened, I decided to exploit the fact that all the other characters had, over the previous days, made an offering to Beshaba: I cursed that character and forced rolls during the night (and looked for a way to bring the new character into play).
How it unfolded:
- they were sleeping and it was the bard's turn on watch
- the bard was a follower of Tharizdun and was about to sacrifice the dying character as he lay agonizing
- the new character was a Kara-Turan enforcer chasing a yokai (he didn't know it) that used other people's appearances as disguises; I made it so the yokai had chosen the dying character's appearance "by coincidence"
- the yokai, being chased, opened a portal while hiding into the dungeon
- the new character literally popped through a Wild Magic portal that pushed the bard away and tried to bash the old character, just as the runes on the old character's skin glowed one last time as he died and repelled everybody
- I made the new character pop in naked, as I wasn't paying attention when the player was listing his inventory
- I let them handle the big mess
I don't think the results would be as good as they are now - Vedal is bringing his own perspective into it - but once you have a proof of concept, you can improve it.
Was there a structured combat system of some sort, or was it a talking-only kind of thing? Who knows?
Trying to get empirical evidence that an LLM can even remotely plausibly work as a DM, with no social aspect and (as far as I can tell from reading the paper) no programmed system of stats, rolls, combat, etc., seems impressive…
…but was it that? Or just a DnD themed chatGPT conversation? Or a text adventure with an LLM describing each “room”?
Did the LLM actually have agency to make decisions and perform tasks, or was it not an agent at all, just a conversation?
Without a transcript, all you can really say for sure is yeah, I agree; more research is needed.
The methodology also barely explains how they did their baseline experiments with human DMs. Did they do them face-to-face, or via text? Did they have different DMs in the 3 games they used as baseline, or was it the same person? As it stands the research is barely reproducible.
At the same time, I'm personally not super enthusiastic about using LLMs for a pen and paper game. Let's assume for a moment that we have free will; then the most fun aspect of PnP dungeon mastering is the fact that the DM has agency. It adds a (IMO) very fun mini-game where the players try to get away with as much shenanigans as they can, and the DM's job is to try to thwart these shenanigans in the most entertaining ways possible. The key word is "try" - the GM has to put entertaining obstacles in front of the players to keep the game fun, but ultimately the players should have an avenue of success. Even if they don't always succeed, they need to succeed enough, and when they fail they need to feel that it was fair.
I feel that with an AI DM, this goes out the window. At least with the current crop of AI, I would never feel that I'm facing an agent with its own free will, rather just a machine. As such, it cannot replace a human DM for PnP purposes.
I'm a "perpetual DM" in most of my roleplaying groups, having led campaigns spanning years, but very rarely do I get to participate as a player. My kids love me running adventures and one-shots, and while I enjoy it, people don't always recognize how much work it is -- both the prep, as well as the mental effort of being "on" for the entire duration of the adventure. I like afternoon one-shot adventures, but I dislike reading through long sourcebooks. I enjoy crafting puzzles and presenting challenges to the players, but also recognize the mental effort that this takes, and sometimes I just want a copilot to handle the lore and the setting. I already rely on the rules-nerds in my group to act as copilots for the specifics of game mechanics -- I don't have time to read through that many tables and commit them to memory.
Re: datasets, the paper authors used a dataset that they created by scraping Critical Role transcripts -- I doubt they had permission for that. I like contributing to open datasets for training open-source LLMs, and I think it would be really sweet for a playgroup to be able to contribute their play transcripts for such a thing. I would love to see a standard open-source dataset be collected for this, and perhaps even a standardized benchmark for quantifying the performance of particular LLMs in RPG contexts -- rules compliance, needle-in-haystack fact finding, consistency of roleplaying, understanding user intent, challenge creation / resolution, etc.
Of course, I have a "classic" DM who happens to be visiting here on vacation from the Dungeons & Dragons multiverse, where his real job is being a DM for visiting tourists. There's a "Shaggy" and his talking dog one can take LSD with while going to Woodstock in '68. There's a Star Trek strategic tactical simulations instructor. I quite like an Underground Railroad "conductor" who guides escaped slaves to freedom before the US Civil War.
I think these DM / interactive narrative applications have a huge future in education and training.
Lack of Human and Social Interaction: This is a significant drawback if you look at it as a social thing; on the other hand, social games require scheduling. For me, I wanted to venture beyond the typical fantasy/sci-fi settings, having gotten bored of those D&D/Warhammer worlds.
LLM Versatility: LLMs can take you on wild (now non-NSFW) adventures. I wanted to play a D&D-style game set in the Venetian Renaissance and GPT delivered with accurate locations, families, and artists. It can do the same with medieval settings from my country, ancient Rome, surrealist landscapes, and whatever your favorite books, movies, or games involve.
Customization: Want a game master who replies in poetry, in slang, or in the style of your favourite author? No problem.
Context: This is a tough one, but there are two approaches:
- Retrieval-Augmented Generation (RAG)
- Summarization, and then summarization of the summary. I've used this approach in my previous startup and in this game.
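The second approach can be sketched as a rolling compression loop: whenever the accumulated history exceeds the context budget, summarize the summary plus the new turn back under budget. In the sketch below, `summarize()` is a stand-in for an actual LLM call (here it just truncates, which is enough to show the control flow); the function and parameter names are my own.

```python
# Sketch of "summarize, then summarize the summary" context compression.
# summarize() is a placeholder for an LLM call; a real implementation
# would prompt the model, e.g. "Summarize this game log in N words."

def summarize(text: str, max_chars: int = 200) -> str:
    # Placeholder compression: truncate to max_chars.
    return text[:max_chars]

def compress_history(turns: list[str], budget: int = 400) -> str:
    """Fold old turns into a rolling summary so the prompt stays small."""
    summary = ""
    for turn in turns:
        combined = summary + "\n" + turn if summary else turn
        if len(combined) > budget:
            # Re-summarize the summary plus the new turn under budget.
            combined = summarize(combined, budget)
        summary = combined
    return summary

log = [f"Turn {i}: the party explores room {i}." for i in range(20)]
context = compress_history(log)
assert len(context) <= 400  # the prompt never grows past the budget
```

The trade-off versus RAG is that summarization loses detail irreversibly, whereas retrieval keeps the full log and pulls back only the relevant pieces; in practice the two are often combined.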
Characters: I utilised the randomness of different character styles. You might meet smart/beautiful/evil/good characters with predefined styles, and the LLM quite effectively supports roleplay.
Cons: Lack of Continuity: The biggest downside is the lack of continuity. Even if I generate a world and scenario, after a few rooms it doesn't come together the way a game would, where previous events feed into current ones, even when I provide the LLM with the previous details again. What I lack is big-picture story design. I am working on that.
I'm currently running a private beta and looking for co-founders (game/design/coding) in the EU, preferably Poland. Please email me at contact@sentimentscanner.com for more information on beta membership and so on.

We have a solution for playing RPGs when you have no DM. It is called a "video game", and you can even play it with your friends.
What I have used LLMs for is helping me prepare for certain things: giving ideas for organizations, quests, characters, etc. They are pretty good at that, although I would avoid relying on them during actual gameplay.
I was wondering the other day whether there were good services for running tabletop games over AI. I don't know about good, because I haven't tried them, but there are certainly a few products like this today. Given that ChatGPT plus some fine-tuning was comparable to a human GM in this experiment, I'd be interested in another experiment that looked at whether these products (which charge more money) offer any benefit over just using ChatGPT. For them to be better than ChatGPT, it seems like they'd almost have to be better than a human GM to be worth the money.
The main game is a strategic collectible card game, but the “adventure mode” is just for fun AI-powered stories.
I am not really considering mainstream commercialization, but I just found it really fun to interact with. We just need a customized model full of fantasy text and a little DM RLHF. It's essentially just a storytelling LLM, which will obviously need some work before it can deliver consistent narratives to the user.
Not an AI expert or anything, just wanted to share my excitement about this. I urge you to try it out if you're into this kind of thing. Tell ChatGPT it's a DM, explain the rules, and give it some context on how to deliver the narrative and player choices. You can give it some story or character exposition as a baseline. It's not too bad.
I don't have any experience with D&D but wanted to learn it to play with my kids. I used Claude to learn the basic rules and then had it help me create the scenario. Then I played as the DM while it simulated the players, and then vice versa.
Really nice experience
It's a tremendous loss for our species if we lose the impetus to connect. Do your kids a REAL favor, and build them a community—a SOCIETY—that they can join.
I wonder how well it could do for game summaries and recaps?
D&D is a social activity, why would I want to remove the most core human component of a social activity. It'd be like someone who enjoys going to a sports bar with friends deciding they'd rather sit next to a vending machine.