"One thing I noticed toward the end is that, even though the robot remained expressive, it started feeling less alive. Early on, its motions surprised me: I had to interpret them, infer intent. But as I internalized how it worked, the prediction error faded Expressiveness is about communicating internal state. But perceived aliveness depends on something else: unpredictability, a certain opacity. This makes sense: living systems track a messy, high-dimensional world. Shoggoth Mini doesn’t.
This raises a question: do we actually want to build robots that feel alive? Or is there a threshold, somewhere past expressiveness, where the system becomes too agentic, too unpredictable to stay comfortable around humans?"
They'd impress you initially, but after some experimentation you'd realize they had a basic set of behaviors triggered by a combination of simple external stimuli and internal state. (this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????")
So…
> this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????"
…yes, but also no.
Humans will always seem mysterious to other humans, because we're too complex to be modelled by each other. Basic set of behaviours or not.
As a frequent "your stated reasoning for why llms can't/don't/will-never <X> applies to humans because they do the same thing" annoying commentor, I usually invoke it to point out that
a) the differences are ones of degree/magnitude rather than ones of category (i.e. the behavior is still likely to be improved by scaling, even if there are diminishing returns - so you can't assume LLMs are fundamentally unable to <X> because of their architecture), or
b) the difference is primarily just in the poster's perception, because the poster is unconsciously arguing from a place of human exceptionalism (that all cognitive behaviors must somehow require the circumstances of our wetware).
I wouldn't presume to know how to scale Furbies, but the second point is both irrelevant and extra relevant because the thing in question is human perception. Furbies don't seem alive because they have a simple enough stimuli-behavior map for us to fully model. Shoggoth Mini seems alive since you can't immediately model it, but it is simple enough that you can eventually construct that full stimuli-behavior map. Presumably, with a complex enough internal state, you could actually pass that threshold pretty quickly.
I find the specifics of that exceptionalism interesting: there's typically a failure to recognize that one's own thinking process has an explanation.
Human thought is assumed to be a mystical and fundamentally irreproducible phenomenon, so anything that resembles it must be "just" prediction or "just" pattern matching.
It's quite close to belief in a soul as something other than an emergent phenomenon.
According to you, a video of a human and a human are the same thing. The video is just as intelligent and alive as the human. The differences are merely ones of degree or magnitude rather than ones of category. Maybe one video isn't enough, but surely as we scale the database toward an infinite number of videos, the approximation error will vanish.
But I disagree that my argument doesn't hold here - if I re-watch a Hank Green video, I can perfectly model it because I've already seen it. This reveals the video is not alive. But if I watch Hank Green's whole channel, and watch Hank's videos every week, I can clearly tell that the entity the video is showing, Hank Green the Human, is alive.
I always set voice assistants to a British accent. It gives enough of a "not from around here" change to the voice that it sounds much more believable to me. I'm sure it's not as believable to an actual British person. But it works for me.
As for conlangs: many years ago, I worked on a game where one of the goals was to have the NPCs dynamically generate dialog. I spent quite a bit of time trying to generate realistic English and despaired that it was just never very believable (I was young, I didn't have a good understanding of what was and wasn't possible).
At some point, I don't remember exactly why, I switched to having the NPCs speak a fictional language. It became a puzzle in the game to have to learn this language. But once you did (and it wasn't hard, they couldn't say very many things), it made the characters feel much more believable. Obviously, the whole run-around was just an avoidance of the Uncanny Valley, where the effort of translation distracted you from the fact that it was all constructed. Though now I'm wondering if enough exposure to the game and its language would eventually make you very fluent in it and you would then start noticing it was a construct.
FWIW: as a British person, I find most of the TTS British voices I've tested sound like an American trying to put on something approximating one specific regional accent, only to then accidentally drift between the accents of several other regions.
But every turn becomes a tight little puzzle to solve, with surprisingly many possible outcomes. Often, situations that I thought were hopeless turn out to have a favorable outcome after all; I just had to think further than I usually did.
It is very different, but it also has the feeling of triumph for each puzzle.
In Minecraft, I personally want to progress in a "natural" (within the confines of the game) way, and build fun things I like. I don't want to speedrun to a diamond armor or whatever.
In Crusader Kings, I actually try to take decisions based on what the character's traits tell me, plus a little bit of own characterization I make up in my head.
I think it’s the same reason robot dogs will never take off. No matter how advanced and lifelike they get, they’ll always be missing the essential element of life that makes things interesting and worth existing for their own sake.
The delay for the GPT to process a response is very unnerving. I find it worse than when the news is interviewing a remote site with a delay between responses. Maybe if the eyes had LEDs to indicate activity rather than it just sitting there? (a rough sketch of that idea is below this comment) Waiting for a GPT to do its thing is always going to force a delay, especially when pushing the request to the cloud for a response.
Also, "GPT-4o continuously listens to speech through the audio stream" is going to be problematic.
I also feel like you can train a model on this task by using the zero-shot performance of larger models to create a dataset, making something very zippy.
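If the task is, say, mapping a transcript to one of a handful of behaviours, a minimal sketch of that distillation loop could look like the following. The behaviour vocabulary, prompt, and model name are assumptions for illustration, and the "zippy" local model here is just TF-IDF plus logistic regression.

```python
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

BEHAVIOURS = ["wave", "nod", "recoil", "idle"]  # assumed behaviour vocabulary
client = OpenAI()  # needs OPENAI_API_KEY in the environment

def label_with_big_model(utterance: str) -> str:
    """Use the large model zero-shot to pick one behaviour label for an utterance."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Reply with exactly one of: {', '.join(BEHAVIOURS)}"},
            {"role": "user", "content": utterance},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in BEHAVIOURS else "idle"

def distil(utterances: list[str]):
    """Build the dataset with the big model, then fit a small, fast local model."""
    labels = [label_with_big_model(u) for u in utterances]
    small_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    small_model.fit(utterances, labels)
    return small_model
```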
This seems like a good place to leverage a wake word library, perhaps openWakeWord or porcupine. Then the user could wake the device before sending the prompt off to an endpoint.
It could even have a resting or snoozing animation, then have it perk up when the wake word triggers. Eerie to view, I'm sure...
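A rough sketch of that gating loop, assuming Picovoice Porcupine with its pvrecorder helper (openWakeWord would be structured similarly). The access key, keyword, and the snooze/perk-up callbacks are placeholders, not part of the project.

```python
import pvporcupine
from pvrecorder import PvRecorder

def snooze_animation() -> None:
    print("zzz...")  # placeholder: play the resting/snoozing motion

def perk_up() -> None:
    print("!")       # placeholder: snap upright, start recording the prompt

porcupine = pvporcupine.create(
    access_key="YOUR_PICOVOICE_ACCESS_KEY",  # placeholder
    keywords=["porcupine"],                  # or a custom "hey shoggoth" model
)
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
recorder.start()

try:
    snooze_animation()
    while True:
        pcm = recorder.read()
        if porcupine.process(pcm) >= 0:      # wake word detected
            perk_up()
            # ...capture the utterance and send it off to the endpoint here...
            snooze_animation()
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```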
I'm not sure I agree. The way the tentacle stops moving and shoots upright when you start talking to it gives me the intuitive impression that it's paying attention and thinking. Pretty cute!
The snap to attention is a good example of it showing you feedback. The frozen state makes me wonder if it is doing anything or not.
It was longer. I think almost twice as long. Took about 2 seconds to respond generally, 4 seconds for that one.
That being said, he makes some points that alternate limb types could be interesting as well.
Just basic interactions with a child plus lessons and a voice would be game changing for the toy world.
> "Teddy, I'm going to break a window.” "No, Davy . . . breaking windows is naughty . . . don't break any windows . . .” "Teddy, I'm going to kill a man.” Silence, just silence. Even the eyes and the arms were still.
> The roar of the gun broke the silence and blew a ruin of gears, wires and bent metal from the back of the destroyed teddy bear.
> "Teddy . . . oh, teddy . . . you should have told me," David said and dropped the gun and at last was crying.
5 hours in: YOU CAN DO IT BEAR, YOU CAN SAVE EVERYONE, IT'S WHAT SHE WOULD HAVE WANTED.
I don't doubt someone's gonna invent it, but yikes. Imagine telling kiddo their beloved sentient toy is dead because mum and dad can't afford the ever-rising subscription fees anymore.
edit: when my kid asks for one I'll know it's time to move the family to a cabin deep in the woods.
I know that bad actors will poison the pot, but in general I'd love to see images labelled "AI", "Drawing", "Content Edited", "Colours Adjusted" where appropriate. Cropping is fine.
I'm enthralled by robotics and generative techniques. But let's not be too quick to confuse them with nature. Not yet.
"I initially considered training a single end-to-end VLA model. [...] A cable-driven soft robot is different: the same tip position can correspond to many cable length combinations. This unpredictability makes demonstration-based approaches difficult to scale.[...] Instead, I went with a cascaded design: specialized vision feeding lightweight controllers, leaving room to expand into more advanced learned behaviors later."
I still think circling back to smaller models would be awesome. With some upgrades you might get a locally hosted model on there, but I'd be sure to keep that inside a pentagram so it doesn't summon a Great One.
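To make the cascaded design in that quote concrete, here's a rough sketch under assumed details: a placeholder vision stage produces a target and tracks the tip, and a lightweight feedback step maps the 2D tip error to small cable-length corrections through a fixed, pre-calibrated matrix. None of this is the project's actual controller; the functions and the three-cable map are made up for illustration.

```python
import numpy as np

def detect_target(frame: np.ndarray) -> np.ndarray:
    """Placeholder vision stage: return a target tip position in image space."""
    return np.array([0.5, 0.5])

def observe_tip(frame: np.ndarray) -> np.ndarray:
    """Placeholder: track the tentacle tip in the same image space."""
    return np.array([0.3, 0.6])

# Assumed 3-cable layout: a fixed, pre-calibrated map from a 2D tip error to
# three cable-length corrections, kept deliberately simple.
ERROR_TO_CABLES = np.array([
    [ 1.0,  0.0],
    [-0.5,  0.87],
    [-0.5, -0.87],
])

def control_step(frame: np.ndarray, cable_lengths: np.ndarray,
                 gain: float = 0.1) -> np.ndarray:
    """One cascaded step: vision -> tip error -> small cable-length update."""
    error = detect_target(frame) - observe_tip(frame)
    return cable_lengths + gain * ERROR_TO_CABLES @ error
```

The point of the cascade is that each stage stays small and inspectable, and a learned policy can later replace any single stage without retraining the whole stack.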
The US has a 1-year grace period. In most countries, any public disclosure makes an idea unpatentable.
Have a look. For instance, in the search box enter "categories".
> Search results
> Results for query "(Categories).pn."
> No records found