This is truly a brilliant hack, well worth publishing at Hacker News.
He was working out the proof to a problem in college and got stuck half way through. He had the problem and the answer, so he worked it from both ends and got stuck in the middle. He had to present the process in class, so he went to the professor for help.
His professor (who had done it originally) couldn't remember how anymore, and told him when he got to that point, he should say "and from this, it's obvious that..." and just jump to the next step.
That's exactly what he did, and no one in class (half hour into the class) even noticed.
(The Feynman method—described by someone who observed him, though, not the man himself).
Yes, it is. It was intended to require as little silicon as possible to minimize the cost to the transistor budget. SPI doesn't contemplate power supply, hot-plug, discovery, bit errors, or any other of a host of affordances you get with USB.
I think there is some value for software developers to understanding SPI and the idioms used by hardware designers with SPI. Typically, SPI is used to fill registers of peripherals: the communication is not the sort of high level, asynchronous stuff you typically see with USB or Ethernet and all the layers of abstraction built upon them. Although there is no universal standard for SPI frames, they do follow idiomatic patterns, and this has proven sufficient for an uncountably vast number of applications.
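For the software folks, here is a rough sketch of one such idiom: the first byte carries a read/write flag plus a register address, and the data bytes follow. The bit layout, the READ_FLAG value and both function names are assumptions made up for illustration; real parts differ (some invert the flag, some use wider addresses), so the datasheet is the authority.

```python
READ_FLAG = 0x80  # assumption: top bit set means "read"; some parts do the opposite


def spi_write_frame(reg: int, payload: bytes) -> bytes:
    """Build the bytes to clock out for a register write: address, then data."""
    if not 0 <= reg <= 0x7F:
        raise ValueError("register address must fit in 7 bits")
    return bytes([reg]) + bytes(payload)


def spi_read_frame(reg: int, count: int) -> bytes:
    """Build the bytes to clock out for a register read.

    The zero bytes after the address are dummies: the master has to keep
    clocking so the peripheral can shift its answer back on MISO.
    """
    if not 0 <= reg <= 0x7F:
        raise ValueError("register address must fit in 7 bits")
    return bytes([READ_FLAG | reg]) + bytes(count)
```

The whole "protocol" is a fixed-length exchange while chip select is held low, which is why it costs so little silicon.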
You couldn't make it quite as simple as (a given flavor of) SPI, but something close to I²C should be feasible.
That'd be https://en.wikipedia.org/wiki/System_Management_Bus
Well, okay, I guess SMBus ARP kind of does. Thanks!
If you simply remove this restriction, bit-banging USB would become trivial, even with all the other protocol complexity.
Though, I think USB made the right call here. The requirement to support any clock speed the device requested would add a lot of complexity to both hosts and hubs.
Only supporting a few fixed clock rates makes certification and inter-device compatibility so much easier, which is very important for an external protocol. Supporting bit-banging just isn't that important a feature, especially when the fixed clock rates really aren't that hard to implement on dedicated silicon.
In V-USB, usbdrv/*.[ch] contains 1440 unique lines. The bitbanging stuff is mostly in *.S and *.inc, so correct me if I'm wrong, but I think this is roughly the non-timing-related complexity imposed by the USB protocol stack. (This division is not perfect, because there are things in e.g. usbdrvasm.S which have nothing to do with bitbang timing, but I feel like it's a reasonable approximation.) The remaining complexity in, say, examples/hid-mouse/firmware/main.c is only a few dozen lines of code.
And that's a USB device. Implementing a USB host is at least another order of magnitude more complexity.
You definitely don't need 1000+ lines of code to implement, say, the PS/2 mouse protocol. From either side.
So, while I agree that a lot of the difficulty of bitbanging USB results from its tight timing constraints, I don't agree that what's left over is "trivial".
I would love the protocol you outline, but could you use SPI as the physical layer and put the rest on top?
I believe your confidence in the 16-bit CRC is excessive. There is a 1 in 65536 chance of a 16 bit CRC failing for certain types of corruption in 512 byte bulk USB packets, and there are about 2 billion packets in a 1TB transfer. If the BER is high, corruption of the transfer is not surprising.
A 6ft cable should be fine, assuming it is well designed, manufactured correctly, in good condition, and not in close proximity to high noise sources, such as SMPS. If any of those factors are compromised the BER will increase, and you will then be testing the rather limited capabilities of 16 bit CRC.
USB4 has a 32 bit CRC for data payloads for a reason. In the meantime, the #1 thing you can do is use short, high quality cables.
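The arithmetic can be sketched in a few lines of Python. This is a back-of-envelope model, not a statement about any particular controller: it assumes every corrupted packet slips past the CRC with probability 2^-bits (the pessimistic figure used above; a real CRC catches all short bursts and small numbers of bit errors), it assumes independent bit errors, and it ignores packet overhead. The function names are made up for illustration.

```python
def packets_in_transfer(total_bytes: int, packet_bytes: int = 512) -> int:
    """Number of full data packets needed to move total_bytes."""
    return total_bytes // packet_bytes


def expected_undetected(total_bytes: int, ber: float,
                        packet_bytes: int = 512, crc_bits: int = 16) -> float:
    """Expected count of corrupted packets that still pass the CRC.

    Assumption: a corrupted packet passes with probability 2**-crc_bits.
    """
    n_packets = packets_in_transfer(total_bytes, packet_bytes)
    # Probability that at least one bit in the packet is flipped.
    p_corrupt = 1.0 - (1.0 - ber) ** (packet_bytes * 8)
    return n_packets * p_corrupt * 2.0 ** -crc_bits


# 1 TB in 512-byte packets is roughly 2 billion packets.
print(packets_in_transfer(10**12))
# Compare a 16 bit and a 32 bit CRC at an assumed (bad) bit error rate.
print(expected_undetected(10**12, ber=1e-6, crc_bits=16))
print(expected_undetected(10**12, ber=1e-6, crc_bits=32))
```

Under this model the result scales with the fraction of packets that arrive damaged, so the cable quality matters as much as the CRC width.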
Glad to see that USB4 is fixing things but one thing I worry about is if this 32 bit CRC is a mandatory part of the standard that cannot be turned off or ignored by manufacturers. Especially since it apparently is not transparent to users.
You could hypothetically use a really high-bandwidth oscilloscope (something like 2 GHz to view 480 Mbit/s USB HS signals), but those are expensive. So you would have to resort to an external USB sniffer. Out of curiosity I looked, and found someone made a sniffer that is basically a USB-capable microcontroller plus an FPGA and a USB PHY: https://hackaday.com/2023/06/13/cheap-usb-sniffer-has-wiresh...
The reality is that the simple protocols like SPI and I2C just are not good enough. They aren't fast, the single-ended signal scheme makes them very sensitive to noise, and there is no error correction. These protocols make sense and work extremely well for their intended purpose: connecting ICs on a PCB. If you expose an unterminated port to the outside world, all bets are off.
These protocols and variations thereof are still in heavy use in modern PCs. But they're internal busses, as the protocols intend.
I haven't looked closely at the USB spec, but I imagine the main problem with bit-banging is simply the speed required. You have to have dedicated hardware because no microcontroller is fast enough to toggle the pins while also running the software stack to decode the protocol and manage error correction.
You can run into this exact problem bit-banging I2C. With a 20MHz CPU, the maximum clock speed you can get is about 250KHz. Just a bit more than half the typical maximum rate of 400KHz. You can absolutely forget about the 1MHz version.
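To put the numbers above in terms of a cycle budget, here is a tiny Python sketch. The 20 MHz and 250 kHz figures are the ones from this comment; the function names are hypothetical, and the model ignores interrupts, clock stretching and the fact that many instructions take more than one cycle.

```python
def cycles_per_bit(cpu_hz: int, bus_hz: int) -> int:
    """CPU cycles available for each bus clock period."""
    return cpu_hz // bus_hz


def max_bus_hz(cpu_hz: int, cycles_needed: int) -> float:
    """Fastest bus clock reachable if one bit costs cycles_needed CPU cycles."""
    return cpu_hz / cycles_needed


CPU_HZ = 20_000_000
for bus_hz in (250_000, 400_000, 1_000_000):
    print(bus_hz, cycles_per_bit(CPU_HZ, bus_hz))
```

Read the other way round, a 250 kHz ceiling at 20 MHz means the bit-bang loop is spending about 80 cycles per bit, while 400 kHz would leave 50 and 1 MHz only 20.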
PHYs exist for one very good reason: it is vastly cheaper to offload comms protocols to hardware. Without that, you have to over-spec your CPU by quite a lot to get enough resources to manually manage communication. This is why every modern microcontroller contains hardware for I2C, SPI, serial, etc.
In summary, the simple serial protocols like SPI and I2C and UART are just absolutely terrible choices for external peripherals. They can't operate at reasonable speeds, they can't tolerate long cables, they can't tolerate noise. The nature and design of these protocols (excepting RS232 which is not UART) means that they cannot be used this way. There's no change to the spec you could make to support this without reinventing USB.
(In my original comment I should have said to use differential signaling for going off-board.)
This was a bit of a surprise when I started, but then I realised that many installations are decades old, with components having been replaced individually.
Googling for "CH570" produces results about tractors. Got a link?
EDIT: found info here: https://www.cnx-software.com/2025/04/02/10-cents-wch-ch570-c...
8-pin part lacks USB AND only has 3 I/O pins. It would be disqualified due to being too I/O-poor. Wasting 5 pins out of 8 is a joke!
As for the old one, CH32V003: 48MHz is slower than the STM's 150MHz, half the flash, 1/4 the RAM. It is still not the best option.
I did update the article with them, though :)
But you get radio (BLE in the CH572 version), which means you don't need USB.
My comment was not that you didn't choose them but that you didn't consider them.
I just considered them and added them to my writeup :)
I'm more of a software guy. I guess my biggest hardware project (other than building a 3D printer from someone else's plans) was in 2012 or so making a PID-based home heating controller using an Uno with a custom perfboard shield with a thermistor for measuring the air temperature and a 433 MHz transmitter for talking to a set of Jaycar 240V remote controlled power outlets to control primarily a 2400W oil column heater, plus a fan pointing at it which I turned on when it was at a high duty cycle. Plus switching the home water heating on and off on a fixed schedule just as a side thing from the same board. I used that in Wellington for 3 winters before moving to Russia for a few years and saved NZ$500 per winter on electricity bills compared to previous years using a mechanical Honeywell thermostat to control the heater.
I was actually thinking of trying to make a product, especially once Flick Electric started up with electricity prices changing every 30 minutes based on the wholesale rate plus a 2c margin. You could (still can) get the current 30 minute wholesale price at your local substation from electricityinfo.co.nz with a simple http query, so you can build something to make intelligent decisions.
But then I got a job overseas and dropped the project...
I think it's compatible with the old nRF24 chips -- I'll test when mine arrives in a week or so. The CH572 version has BLE5 ... I think the same hardware but including a software stack.
In that sense it's like SPI, or perhaps more like CAN or SD: when you don't understand it, you reach for someone else to have done it for you, but you can choose to understand it and once you understand it you can implement it.
If you're the slave you have tight timing requirements but you only have to respond with certain fixed bit patterns to certain other bit patterns. If you're the master, you can do more things concurrently because the slave won't notice a little jitter in how often you poll it, but you have the problem of dealing with a wider variety of slaves that can be connected.
But there is more complexity on higher layers. USB HID (mice and keyboards) is often the first you'd want but it is special in that it allows a device to describe its own packet format in a tokenised data description language. The device only has to send an additional blob when asked, but the host has to parse the contents of that blob and use the result to parse the device's packets.
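To give a feel for what the host has to do, here is a minimal sketch of the first step: splitting that blob (the HID report descriptor) into its short items. Each item starts with a prefix byte holding a size code, a type and a tag. The function name is made up, long items are deliberately not handled, and the sample bytes are the opening of a typical boot-keyboard descriptor (the modifier-key byte only), so treat this as an illustration rather than a usable parser.

```python
def parse_hid_items(desc: bytes):
    """Split a HID report descriptor into (type, tag, value) short items."""
    sizes = (0, 1, 2, 4)                       # size code 3 means 4 data bytes
    types = ("main", "global", "local", "reserved")
    items = []
    i = 0
    while i < len(desc):
        prefix = desc[i]
        if prefix == 0xFE:
            raise ValueError("long items are not handled in this sketch")
        size = sizes[prefix & 0x03]            # bits 0-1: data size code
        item_type = types[(prefix >> 2) & 0x03]  # bits 2-3: item type
        tag = prefix >> 4                      # bits 4-7: item tag
        data = desc[i + 1:i + 1 + size]
        if len(data) != size:
            raise ValueError("descriptor is truncated")
        items.append((item_type, tag, int.from_bytes(data, "little")))
        i += 1 + size
    return items


# Start of a typical keyboard descriptor: eight 1-bit modifier flags.
SAMPLE = bytes.fromhex("05010906a101050719e029e71500250175019508 8102c0")

for item in parse_hid_items(SAMPLE):
    print(item)
```

A real host then has to track the global and local state these items set up (report size, report count, usages) to work out which bits of each incoming packet mean what, which is where most of the host-side complexity lives.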
And of course, every time there is complexity in a protocol and there are multiple implementations of it, there is more opportunity for them to be incompatible in very subtle ways. For example, some gaming keyboards with N-key rollover that work perfectly on MS-Windows without any special drivers have been rejected outright by Apple or Linux hosts. (I hope these issues have been fixed by now, but I'm not sure.)
Keyboard vendors catered to the lowest common denominator because it was essential that users be able to enter the PC BIOS at boot.
BTW, Apple chiclet keyboards (before the "Magic" ones) have an interesting workaround to this problem for their proprietary Fn key. They use a variation of the boot protocol but with only a 5-byte key array (5KRO). When the Fn key is pressed, the sixth byte contains a code that would otherwise be an error code if interpreted under the boot protocol.
the name is escaping me
One thing: you might want to mention the required board thickness (0.8mm, iirc?) for people planning to have their own boards made.
Edit, explanation for others: that is required to make the "USB-C edge connector" fit the plug.
Hell, allwinner v3s is hand solderable and has built in RAM and will happily boot Linux natively
RP2350 would also be an excellent choice. It has a very good QSPI RAM interface with a built-in cache, and USB support.
Total pin count is so low on this, I'm very tempted to make a dead bug version.
https://www.bunniestudios.com/blog/2013/on-hacking-microsd-c...
IMHO, it doesn't matter to a novice what they solder, SOIC8 or SOIC28. SOIC28 is as easy (or as hard, if you want) as SOIC8.
And a larger chip could make a much more useful computer: it would be possible to add some minimal sound (as such chips typically have a DAC), a keyboard, and, maybe later, true monitor output in VGA style (not DP or HDMI, of course).
It would not be much harder (if at all) to solder, but could be a good base for expansion if the owner gains interest in such things.
The clip will give you an electrical connection to the SPI flash but I'm not sure you'll be able to talk to it without jumpers on the board. Is it possible without jumpers?
I haven't tried it so I don't know how well it works
Do you mind elaborating?
It was designed late enough in history to have taken advantage of a lot of available information. None of it was taken advantage of. Which is why a lot of extensions are now being proposed to actually fix things that should’ve been done right in the first place. With all the additions, it is slowly approaching sanity, only 10 years later. And I don’t buy the excuses that the learning process needed to happen. All the information was available all along, and the mistakes were obvious to basically all of us all along
Some of the extensions are only Band-Aids for the real design issues. E.g. shadd2 is a Band-Aid for not having proper addressing modes for accessing arrays. A common refrain to answer this is composed of promises of magical instruction fusion in the core. This is often promised, but never delivered. Certainly not in the cheap kind of processors that are the only target for RISC-V. Not having instructions for bitfield extraction and insertion is also an amateur mistake. That’s why there are extensions to fix that one too. But it should’ve been obvious from the beginning that it would be necessary. A conditional branch based on a bit in a register is another obvious thing that should’ve been considered from the very beginning as it is commonly encountered. Any analysis of modern software would’ve shown this.
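For readers who don't live in instruction sets, here is a small Python model of the operations being argued about. The shifted add the comment refers to is spelled sh2add in the Zba extension and computes rs2 + (rs1 << 2); without it, indexing an array of words takes a shift and then an add, and a bitfield extract takes a shift-left/shift-right pair. The function names and the 64-bit register width are assumptions for illustration, and this models only the arithmetic, not any real core.

```python
XLEN = 64
MASK = (1 << XLEN) - 1  # keep results inside a 64-bit register


def sh2add(rs1: int, rs2: int) -> int:
    """Zba sh2add: one instruction for base + index * 4."""
    return (rs2 + (rs1 << 2)) & MASK


def base_isa_index_word(base: int, index: int) -> int:
    """The same address using base-ISA steps: slli, then add."""
    t = (index << 2) & MASK      # slli t, index, 2
    return (base + t) & MASK     # add  t, base, t


def extract_bits(x: int, lsb: int, width: int) -> int:
    """Unsigned bitfield extract done as slli + srli."""
    t = (x << (XLEN - lsb - width)) & MASK   # push the field to the top
    return t >> (XLEN - width)               # bring it back down, zero-filled


def insert_bits(dst: int, src: int, lsb: int, width: int) -> int:
    """Bitfield insert done as mask, shift and or."""
    field = (1 << width) - 1
    cleared = dst & ~(field << lsb) & MASK
    return cleared | ((src & field) << lsb)
```

Whether two or three simple instructions instead of one is an "amateur mistake" or a reasonable trade for a small base ISA is exactly the disagreement in this thread; the sketch only shows what is being counted.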
What annoys me is the information was available. We know what sorts of things modern software does. It was all ignored. Instead, we got a slightly updated mips-1. And now with all the extensions, it’s fragmentation galore. You can either target the final result (RV23, I think is the name), which is somewhat sensible, but no hardware implements, or you can target the least common denominator, which will run everywhere, shittily
There are other, more serious, design issues when it comes to attempting to use RISC-V for actual high-performance computing. I’ll save those for another rant.
At approximately the same point in history, another instruction set was designed. It actually took advantage of all the knowledge available about what modern software looks like, and it shows: aarch64.
RVA23 is the name. And I hear this is what e.g. Windows, Android and the next Ubuntu LTS target.
>or you can target the least common denominator, which will run everywhere, shittily
Not as shitty as you make it out to be. And it has a huge advantage over aarch64 in its simplicity, easily an order of magnitude, which allows it to be used in scenarios aarch64 could only dream of.
>At approximately the same point in history, another instruction set was designed
I get it, you really like aarch64.
Which one weighed its options better? You might be right, but as years pass we'll have the benefit of hindsight. We'll be able to look back and see whether either side made good choices or cursed ones.
It's going to be fun. Hasn't been this fun since the 90s.
For small cheap stuff, sure, RV is ok. But so was AVR/ARMv6M/ARMv7M.
Compare : Apple M4 perf vs $your_favourite_rv_”fast”_core perf per MHz
It makes sense for micro controllers, but not for big cores.
This whole approach of trying to be everything for everyone is one of the reasons that RISC-V ends up being mediocre for everyone and perfect for no one.
We might get a meaningful comparison like that if Qualcomm starts making RISC-V based SoC's as a hedge against ARM. Or if Tenstorrent comes up with a M4-like CPU design. I think the jury is very much still out as to whether the rather limited variable-sized insns (2 or 4 bytes) of RISC-V + the compressed insn extension is a genuine concern. It's certainly nothing like the chaos you see with x86-64 (which seems to be a real bottleneck for very wide decode), and a lot closer to something like the old ARM32+Thumb2.
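For context on why the 2-or-4-byte scheme is considered mild, here is a sketch of the length rule: the two lowest bits of the first 16-bit parcel tell you the size, with anything other than 0b11 meaning a compressed 2-byte instruction. The function names are made up, and the sketch ignores the longer encodings the spec reserves (it treats every non-compressed instruction as 4 bytes).

```python
def insn_length(first_halfword: int) -> int:
    """Length in bytes of an instruction, from its first 16-bit parcel."""
    return 2 if (first_halfword & 0b11) != 0b11 else 4


def split_lengths(code: bytes) -> list:
    """Walk a little-endian code stream and return each instruction's length."""
    lengths = []
    i = 0
    while i + 2 <= len(code):
        parcel = int.from_bytes(code[i:i + 2], "little")
        n = insn_length(parcel)
        lengths.append(n)
        i += n   # the next instruction starts only after this one is sized
    return lengths


# c.nop (0x0001), addi a0, zero, 0 (0x00000513), c.nop again
STREAM = bytes.fromhex("0100" "13050000" "0100")
print(split_lengths(STREAM))
```

The serial dependency in the loop is the concern for wide decoders, but since the check is two bits per parcel, hardware can evaluate it speculatively at every halfword position, which is a far cheaper problem than finding x86-64 instruction boundaries.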
Your argument is wishful thinking, not fact. “Well, if maybe someone does it” isn’t a fact. Maybe someone will make a 8501 that outperforms my M4. But I won’t believe it till it is done.
And indeed it is close to thumb2. Which was purposefully rejected for aarch64. By careful study. Given that between the two, aarch64 looks to be much better thought-through, I’ll be giving the credit for making the right decision here to that team too.
Or why not just go full native....grab some MIPS-core IP and make your own with an FPGA?
And no Linux runs on cortex-m0 with ram attached over SPI.
And MIPS is the easiest Linux-compat architecture to emulate.
ADD R0, SP, PC, ROR SP
is an entirely valid, even if nonsensical, instruction. But you must translate all valid inputs, else you risk breaking things. That may be a contrived example, but here is a common one: if one has a jump table of relative offsets somewhere, pointed to by R10, even this is valid:
LDR R0, [R10, R0, LSL #2]
ADD PC, R10, R0
That gets messy to translate.
As I understand it, this kind of thing was a big problem for ARM in the mid-90s when they finally wrote the ARM ARM and outlawed things like ldmia r2!, {r0-r4}.
I think I get what OP means but you can definitely order a pc kit and just assemble it nowadays
Imagine you wanted to deploy something across a huge number of devices. With something like this, open source and really limited but it just works, you could have PCB providers build it, keep it in their warehouse or wherever, and just provide it with power and Ethernet. Then you could probably SSH into it, or even create some sort of Vercel-like UI on top of it (Coolify/Dokploy? Though we would need to slim down Docker a lot for Dokploy).
And when the work is done, they would actually scrap the metal/PCB and reuse it again...
I am not sure if such metal /pcb recycling makes sense...
If anyone technical can respond to this, it would be great.
This can also be done with risc-v as well. I am not sure but I was thinking of creating a very dead simple company (my brain and its weird thoughts... , also Don't copy me or if you do, then hire me xD) which just can take a device like old phones and then just root them using AI? / manual or maybe not even root it? IDK...,
Basically then providing them with internet access and power (not a traditional warehouse), because you only pay a one-time fee up front, and after that all the fees you pay are the real costs borne by the company operating it, with no middleman profits.
Kind of like a "Costco" (oh I had forgotten name of Costco and I had to search target alternatives xD) where they actually are there to help you save money but for you to use their services you gotta have a card.
Love it.