Show HN: Vidformer – Drop-In Acceleration for Cv2 Video Annotation Scripts(github.com)

25 pointsby dominikwin5 months ago5 comments

vulpescana5 months ago
Ooh, this seems interesting ! I'm working on tracklab (https://github.com/TrackingLaboratory/tracklab), I might try to implement this in the visualization pipeline, which needs some work right now.
What can I potentially get out of VidFormer when there are a lot of annotations to show ?
- dominikwin5 months ago
  There are a few benefits, depending on your use case; most of them are reducing video length as a barrier to visualization.
  1) If you are running models you can use vidformer to see the results as they come in, essentially streaming annotated videos to your web browser as your model runs. 2) If you have existing inference results you can practically instantly render those on videos, then you can iterate or remix in seconds. 3) If you're hosting any infrastructure you can expose VOD streams publicly to show annotated videos to web clients. For example, it's trivial to build a video search engine which returns compilations.
- tzm5 months ago
  Thanks for sharing / working on tracklab. Would be cool to see VidFormer implemented.
simlevesque5 months ago
I'm working on a pull request to add it to Yolo right now.
I got this error: `AttributeError: module 'vidformer.cv2' has no attribute 'setNumThreads`
Maybe you could add some noops to make it easier to swap.
Edit: I saw the other functions that aren't implemented yet (https://ixlab.github.io/vidformer/opencv-filters.html)
I had to comment out: fillPoly, polylines and drawContours.
- dominikwin5 months ago
  Happy to help, what do you mean add it to Yolo?
  - simlevesque5 months ago
    Use it in this project: https://github.com/ultralytics/ultralytics
    They do some cv2 monkey patching so it won't be simple.
    dominikwin5 months ago
    Okay, do you mean use it for speeding up media I/O there? I don't know if that would work. Using Yolo and drawing bounding boxes should already work fine with the supervision integration. That's how the Colab notebook does it.
    simlevesque5 months ago
    Yeah to speed up media I/O. The thing is that you're feeding Yolo frame per frame. That's not ideal because you have to reimplement streaming and batching into Yolo. Here's an example of what you can do: https://docs.ultralytics.com/modes/predict/#__tabbed_2_13
    That's giving more control to Yolo as to when it pulls frames and how it processes them. In the Colab example you can't do this.
    I get this error: "No server set for the cv2 frontend. Set VF_IGNI_ENDPOINT and VF_IGNI_API_KEY environment variables or use cv2.set_server() before use."
    I tried to use set_server but I'm not sure what argument it needs.
    dominikwin5 months ago
    Vidformer-py is a thin client library around a vidformer server. Details and install here: https://ixlab.github.io/vidformer/install.html It's possible to embed that into the python library, but getting FFmpeg, OpenCV, Rust, and the Python build systems to all play nice across multiple operating systems is too big a task for me to take on.
    I'm not sure vidformer is a great fit for this task, at least in that way. It's better at creating and serving video results, not so much at processing. However, the data model does allow for something similar. You can take a video and serve a vidformer VOD stream on top, and as segments are requested it can run the model on those segments. Essentially you can run CV models as you watch the video. Some of this code is still WIP though.
5 months ago
undefined
xmichael9095 months ago
import vidformer.cv2 as cv2
Do you have a list of what is supported? I've played around with cv2 quite a bit in python for everything from yolo, to loss of signal, to corrupt frame detection and simple things like 'snow blocking camera'
Ultimately, what is supported in your library? is it *.cv2?
- dominikwin5 months ago
  Yes! There's a list here: https://ixlab.github.io/vidformer/opencv-filters.html
  A good chunk of OpenCV imgproc is implemented, but it can go beyond that. Vidformer can be applied to any function which returns or manipulates frames (transformation), but not functions which return data. So not yolo, but things like canny edge detection would work just fine. It's impossible to accelerate all processing tasks, so we focus on the "video transformation" subset used to create videos.
smashah5 months ago
[flagged]
- 5 months ago
  undefined