I wonder why! Most (or all) customer support calls are recorded. Have you tried (or proposed) training on that corpus on your customers' premises? You can run multiple evals in that setting - replay user calls into a corpus-trained AI agent vs. a generic AI agent and see the difference. Agents can run on a 24x7 self-test, analysis, adjustment, and reporting loop. Run that loop continuously and compare your AI agent's responses against human operators'.
Edit: Grammar
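The replay-eval idea above could be sketched roughly like this; `corpus_agent`, `generic_agent`, and `score` are stand-ins for whatever models and quality metric you actually have, not anything from an existing product:

```python
# Hypothetical replay eval: feed transcripts of recorded support calls to a
# corpus-tuned agent and a generic agent, then compare average scores.
def replay_eval(calls, corpus_agent, generic_agent, score):
    """Replay each recorded call into both agents and average the scores."""
    totals = {"corpus": 0.0, "generic": 0.0}
    for call in calls:
        for turn in call["customer_turns"]:
            totals["corpus"] += score(corpus_agent(turn), turn)
            totals["generic"] += score(generic_agent(turn), turn)
    n = sum(len(c["customer_turns"]) for c in calls) or 1
    return {k: v / n for k, v in totals.items()}
```

Running this nightly over fresh recordings gives exactly the "see the difference" comparison: two averages over the same real traffic.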
I built a skeleton of an iOS app that managed my calls so that I could choose to answer, decline, or send to my chatbot.
So it gets real data from all my regular calls, and in my state (one-party consent) I don't need anyone's permission to record every call. That data kicks off a fine-tuning run that can happen overnight or locally to improve my personal model.
My plan was to use Whisper and a local model with my voice clone, and it would talk to everyone I didn't want to - eventually to the point where I never have to talk to any person I don't want to.
I would pay you for a local way to do that, however I’d NEVER give you that data - but I’m sure plenty of people would
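The screening pipeline this commenter describes could be sketched as three swappable stages; the callables here are placeholders for whatever local STT (e.g. Whisper), LLM, and cloned-voice TTS you wire in, not a real API:

```python
# Hypothetical call-screening loop: transcribe the caller locally, generate a
# reply with a local model, speak it back with a cloned voice, and keep the
# transcript as future fine-tuning data.
def handle_screened_call(audio_chunks, transcribe, generate_reply, synthesize):
    """Run one screened call turn by turn; return the transcript log."""
    log = []
    for chunk in audio_chunks:
        text = transcribe(chunk)          # e.g. on-device Whisper
        reply = generate_reply(text)      # local LLM, personal fine-tune
        log.append({"caller": text, "bot": reply})
        synthesize(reply)                 # cloned-voice TTS back to caller
    return log                            # doubles as fine-tuning material
```

Keeping every stage as a plain callable is what makes the "local only" requirement easy: nothing in the loop assumes a network service.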
Most people avoid phone calls if possible.
If I get a call and it's an AI, I, like everybody else, am hanging up.
If I’m picking up the phone to call a company, it’s because I can’t achieve what I want to on their website.
These AI phone calls are as limited as the website, or more so.
There is a use case for voice AI - most of these demos really miss the mark with “we’re going to replace your call center”.
If founders had any idea how much performance matters in a call center, and how hard it is to achieve, they’d focus on a use case better served by voice AI.
Kinda funny how many amazing CX companies start in Germany!
I’m the CEO & founder of Rime, so I’ve been following your progress with real interest. Feel free to reach out and I’d love to explore ways we might collaborate. Until then, wishing you tons of success on this big milestone!
Or, if it can actually parse my words, the next issue is that my issue doesn’t fit into a multiple-choice format.
Nothing is more frustrating than using AI to gatekeep a human when the AI is literally hung up on receiving an answer it can't understand.
I've found that pretending not to speak English and making weird sounds gets you through to a human faster than trying to ask the AI to do so.
I always feel these bots are way too "polished" in their responses or how they speak. Maybe that's a good thing, and we're just used to people who speak casually being less well spoken, lol. It makes it feel inauthentic, but perhaps that will change over time.
The "Request a demo" button also does nothing other than change its text on success - not sure if the request even went through...
P.S. Arkadiy is locked out of his HN account due to the anti-procrastination settings. HN team, can you plz help? :)
In general, we currently have really high success rates with relatively constrained use cases, such as lead qualification and well scoped customer service use cases (e.g., appointment booking, travel cancellation).
In general, voice AI is hard because it's WYSIWYG (there is no human in the loop between what the bot says and what the person on the other side hears). Not sure about legal, but for more complex use cases (e.g., product refunds in retail), there are many permutations in how two different customers might frame the same issue, so it can be harder to instruct the AI agent in a way that guarantees high automation rates (given the plenitude of edge cases).
It is therefore our belief that voice AI works best when the bot is leading the conversation and it is always very clear what the next steps are...
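A bot-led, clear-next-steps flow like the one described above could be modeled as a simple stage machine; the stage names, prompts, and validators below are invented for illustration, not the actual product logic:

```python
# Hypothetical stage machine for a bot-led appointment-booking flow: the bot
# asks, validates the caller's answer, and either advances or re-asks.
STAGES = [
    ("ask_name", "What name is the appointment under?", lambda a: bool(a.strip())),
    ("ask_date", "What date works for you?", lambda a: any(ch.isdigit() for ch in a)),
    ("confirm",  "Shall I book it?", lambda a: a.lower() in {"yes", "no"}),
]

def run_stage(stage_index, answer):
    """Validate the caller's answer; advance on success, repeat the prompt on failure."""
    _name, prompt, valid = STAGES[stage_index]
    if valid(answer):
        nxt = stage_index + 1
        return nxt, (STAGES[nxt][1] if nxt < len(STAGES) else "Done, booked!")
    return stage_index, prompt  # stay on this stage and re-ask
```

The point of the structure is predictability: at any moment the bot is in exactly one stage, and a deviating answer just loops the current prompt instead of sending the conversation somewhere unscripted.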
Therefore I think the verticals of customer service and lead pre-qualification make a lot more sense. Since you guys have the numbers, I am curious to learn more about the way you define constraints for the bot and how often calls in these verticals deviate from these constraints.
I'm also curious about your opinions/if you've seen any successful use cases where the bot has to be a bit more "creative" to either string together information given to it or make reasonable extrapolations beyond the information it has.
It thus makes sense why it might not work for legal, since every call there might be high stakes.
Having the bot be "creative" is actually an interesting proposition. We currently do not focus on it, since the majority of our customers want the bot to be predictable and not hallucinate.
Feedback (exaggerated): 1. change the stage prompt, 2. change the function description, 3. add extra instructions to the end of the context.
Metrics are easy to generalize (e.g., call transfer rate), but the baseline is different for each agent, so we interpret only the changes, not the absolute values (in the context of self-improvement).
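That "changes, not absolute values" idea could be sketched like this: each agent is scored against its own baseline window, never against another agent's rate (the data shape and function names are assumptions for illustration):

```python
# Hypothetical per-agent delta on the call-transfer-rate metric: compare an
# agent's current window against its own baseline window.
def transfer_rate(calls):
    """Fraction of calls that ended in a handoff to a human."""
    return sum(c["transferred"] for c in calls) / len(calls)

def improvement(baseline_calls, current_calls):
    """Negative delta = fewer transfers = improvement for this agent."""
    return transfer_rate(current_calls) - transfer_rate(baseline_calls)
```

An agent starting at a 50% transfer rate and an agent starting at 10% can then both be judged fairly: only the sign and size of each one's own delta matters.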
What sets us apart is multi-stage conversation modeling, out-of-the-box evals, and self-improvement!