Viseron is new to me though, that looks really cool.
For instance, it kept thinking the tree in my back yard is a person. I find it hilarious that it often assigns a higher likelihood the tree is a person than me! I've needed to put a mask over the tree as a last resort.
It looks like either Frigate or Viseron will do what I want. I started setting up Frigate, but realized I should downgrade my Reolink Duo 3 to a Duo 2 before I go too far. The Duo 3 really doesn't offer much better image quality but forces you to use h265 and consumes a lot more bandwidth. Once I stabilize my camera setup I'll get back to setting up both Frigate and Viseron and see what performs better. I like that the pro upgrade of Frigate allows you to customize the model and may make use of that.
For "edge" or embedded applications, an accelerator such as the Google Coral Edge TPU is a useful reference point: it is capable of up to 4 trillion operations per second (4 TOPS) at up to 2 watts of power consumption (2 TOPS/W). However, the accelerator is limited to INT8 operations. It also has around 8 MB of memory for model storage.
Meanwhile, a general-purpose or gaming GPU can support a wider range of instructions (single-precision and double-precision floating point, integer, etc.).
GeForce GTX 1060, for example: 4.375 TFLOPS (FP32) @ 120W (https://www.techpowerup.com/gpu-specs/geforce-gtx-1060-6-gb....)
There are commercial-oriented products that are optimized for particular operations and precision.
Here's a blog post discussing Google's 1st-generation ASIC TPU used in its datacenters: https://cloud.google.com/blog/products/ai-machine-learning/a...
(92 TOPS @ 700 MHz - 40W)
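To put the three figures quoted above side by side, here is a rough sketch of the throughput-per-watt arithmetic. The numbers are the ones from this thread; the dictionary layout and the `ops_per_watt` helper are just for illustration. Keep in mind that INT8 TOPS and FP32 TFLOPS count different kinds of operations, so this is only an order-of-magnitude comparison.

```python
# Throughput per watt from the figures quoted in the thread.
# INT8 ops and FP32 FLOPs are not equivalent, so treat the ratios as a rough guide.
devices = {
    "Coral Edge TPU (INT8)": (4.0, 2.0),        # (tera-ops per second, watts)
    "GeForce GTX 1060 (FP32)": (4.375, 120.0),
    "Google TPU v1 (datacenter)": (92.0, 40.0),
}


def ops_per_watt(tera_ops, watts):
    """Tera-operations per second delivered per watt of power."""
    return tera_ops / watts


efficiency = {name: ops_per_watt(*spec) for name, spec in devices.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} tera-ops/s per watt")
```

The gap of roughly two orders of magnitude between the Coral and the GTX 1060 is the price the GPU pays for flexibility and higher precision.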
https://coral.ai/docs/accelerator/datasheet/
It does not have VRAM as it is not a graphics card :)
There are examples and instructions for exporting Yolo variants to run on the Edge TPU: https://docs.ultralytics.com/guides/coral-edge-tpu-on-raspbe...
https://hailo.ai https://www.raspberrypi.com/news/raspberry-pi-ai-kit-availab...
res = rest(ollama, {
"model": "llava",
"prompt": genprompt(box.name),
"images": [box.export()],
"stream": False
})
They are calling the ollama API to run Llava. Llava is a combo of an LLM base model and a vision projector (CLIP or ViT), and is usually around 4-8 GB. Since every token generated needs access to all of the model weights, you would have to send 4-8 GB over USB to the Coral for each token. Even at a generous 10 Gbit/s, that is 8 GB / 1.25 GB/s = 6.4 seconds per token. A 150-token generation (a short paragraph) would take 16 minutes.
How many parameters does the model you are using with Hailo have? What's the quantisation, and which model is it actually?
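The bandwidth estimate for streaming Llava's weights over USB can be reproduced in a few lines. This is only a sketch of the arithmetic under the same assumptions as the comment (8 GB of weights re-sent per token, a 10 Gbit/s link, 150 tokens); the function name is made up for illustration.

```python
def seconds_per_token(model_gb, link_gbit_per_s):
    """Time to push the full set of weights over the link once."""
    link_gb_per_s = link_gbit_per_s / 8  # 8 bits per byte
    return model_gb / link_gb_per_s


per_token = seconds_per_token(model_gb=8, link_gbit_per_s=10)
total_minutes = per_token * 150 / 60  # 150 tokens, roughly a short paragraph

print(f"{per_token:.1f} s per token, {total_minutes:.0f} min for 150 tokens")
```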
- "person": "get gender and age of this person in 5 words or less",
- "car": "get body type and color of this car in 5 words or less".
So YOLO gives the bounding box and rough category, while llava describes the object in more detail.
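For anyone wanting to try the second stage on its own, here is a minimal sketch of what that request could look like against Ollama's `/api/generate` endpoint. The prompts are the two quoted above; `build_request`, `describe`, and the default `localhost:11434` URL are assumptions for illustration, not the project's actual helpers (the original uses its own `rest()` and `box.export()`).

```python
import base64
import json
import urllib.request

# Per-class prompts quoted in the thread.
PROMPTS = {
    "person": "get gender and age of this person in 5 words or less",
    "car": "get body type and color of this car in 5 words or less",
}

# Assumption: a default local Ollama install listening on port 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(label, crop_bytes):
    """Build the JSON body for one detected object (hypothetical helper)."""
    return {
        "model": "llava",
        "prompt": PROMPTS[label],
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(crop_bytes).decode("ascii")],
        "stream": False,
    }


def describe(label, crop_bytes, url=OLLAMA_URL):
    """POST the cropped detection to Ollama and return the generated text."""
    body = json.dumps(build_request(label, crop_bytes)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Here `label` would be the YOLO class name and `crop_bytes` the encoded image of the bounding box, e.g. `describe("car", jpeg_bytes)`.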
Some things that matter when it comes to configuring your IP cameras (beyond security, etc.):
- Support for RTSP
- Configurable encoding settings (e.g. h264 codec, bitrate, i-frame intervals, framerate)
- Support for substreams (i.e. a full-resolution main stream for recording, and at least one lower-resolution substream for preview/detection/etc.)
...
Make sure the hardware you select is capable of the above.
Configurability will matter because Identification is not the same as Detection (Reference: "DORI" - Detection, Observation, Recognition, and Identification from IEC EN62676-4). If you want to be able to successfully identify objects or entities using your cameras, it will require more care than basic Observation or Detection.
"On November 25, 2022, the Federal Communications Commission (FCC) released new rules restricting equipment that poses national security risks from being imported to or sold in the United States. Under the new rules, the FCC will not issue new authorizations for telecommunications equipment produced by Huawei Technologies Company (Huawei) and ZTE Corporation (ZTE), the two largest telecommunications equipment manufacturers in the People’s Republic of China (PRC).
The FCC also will not authorize equipment produced by three PRC-based surveillance camera manufacturers—Hytera Communications (Hytera), Hangzhou Hikvision Digital Technology (Hikvision), and Dahua Technology (Dahua)—until the FCC approves these entities’ plans to ensure that their equipment is not marketed or sold for public safety purposes, government facilities, critical infrastructure, or other national security purposes. The FCC did not, however, revoke any of its prior authorizations for these companies’ equipment, although it sought comments on whether it should do so in the future."
If your budget supports commercial style or commercial grade cameras, looking at Dahua or Hikvision manufactured cameras would be a good starting point to get an idea of specs, features, and cost.
"US - FCC Ban: The US Federal Communications Commission (FCC) banned Dahua and Hikvision from new equipment authorizations in November 2022. Most products that use electricity require FCC equipment authorizations; otherwise, they are illegal to import, sell, market, or use, even for private individuals." (Jul 5, 2024)
You’d have to buy from actual Western companies like Axis or Dallmeier.
Compromised firmware or other backdoors are a concern for a wide range of products. With IP Cameras, a commonly recommended practice includes putting them on a non-internet accessible network, disabling any remote access, UPnP type features, etc. You can run IP cameras in an air-gapped configuration as well.
Home/consumer-grade cameras have plenty of shortcomings too.
https://www.amazon.in/s?k=cctv+system+4+channel
So what are your options? I have been contemplating getting a door phone + CCTV for my home for many years, but problems like these keep me from investing in an ecosystem.
Edit: oh, looks like the pager attacks have their attention now.
https://trak.in/stories/pager-bombs-govt-can-ban-chinese-cct...
I guess time will tell, and then there is lobbying, so yeah.
IPVM did all the legwork on this a while ago and uncovered that, not that surprisingly, two and a half OEMs (including Dahua and Hikvision) are manufacturing essentially every not-completely-garbage CCTV camera coming out of China, and a bunch that very explicitly claimed to not come out of China.
I’ve never implemented this kind of object persistence algo - is this a good approach? Seems naive but maybe that’s just because it’s simple.
This is the first time that I've seen a "complete" setup. Any info to learn more on applying YOLO and similar models to real time streams (whatever the format)?
There's a reason why there's a whole family of models from tiny to huge.
You really need to have a thread consuming the frames and feeding them to a worker that can run on its own clock.
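A minimal sketch of that pattern, assuming a single-slot queue so the detector always sees the newest frame and stale ones are dropped. The camera and detector here are stand-ins (`fake_camera`, `fake_detect`) so the example runs without hardware; in practice they would be an RTSP reader and a YOLO call.

```python
import queue
import threading
import time


def capture(frames, latest, stop):
    """Consume every frame at the source's pace, keeping only the newest one."""
    for frame in frames:
        if stop.is_set():
            break
        try:
            latest.get_nowait()  # drop the stale frame, if one is still waiting
        except queue.Empty:
            pass
        latest.put(frame)


def detect_worker(latest, stop, detect, results):
    """Run detection on its own clock, always on the most recent frame."""
    while not stop.is_set():
        try:
            frame = latest.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(detect(frame))


def run_demo(n_frames=50, frame_interval=0.002, detect_time=0.01):
    def fake_camera():  # stand-in for a stream reader
        for i in range(n_frames):
            time.sleep(frame_interval)
            yield i

    def fake_detect(frame):  # stand-in for a slow model
        time.sleep(detect_time)
        return frame

    latest = queue.Queue(maxsize=1)
    stop = threading.Event()
    results = []
    reader = threading.Thread(target=capture, args=(fake_camera(), latest, stop))
    worker = threading.Thread(
        target=detect_worker, args=(latest, stop, fake_detect, results)
    )
    worker.start()
    reader.start()
    reader.join()                # camera finished
    time.sleep(detect_time * 3)  # give the worker time to drain
    stop.set()
    worker.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

With the detector slower than the camera, the returned list skips frame numbers: the reader never blocks on the model, and the model never works through a backlog of old frames.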