This is the latest, most efficient, and most robust iteration of a convolutional neural network for detecting objects in images or video frames. Unlike previous iterations, YOLO can classify objects of multiple classes in a single frame with relatively little compute. The model can be used for tasks such as detecting vacant seats in restaurants or detecting crowds. The demo application presented here detects many kinds of objects in an image, such as chairs, bottles, and screens.
The sample application is implemented in Python using a computer vision library (OpenCV). The camera supplies live video, and the application loops over the frames and processes each one individually. Each frame, which is essentially an image, is fed into the YOLO model with pretrained weights to classify the multiple objects it contains. The network then labels each object in the image and draws a bounding box around it to indicate the object's boundaries.
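The frame loop described above can be sketched roughly as follows. This is a minimal illustration, not the demo application's actual code: the `detect` callable, the detection dictionary layout, the normalized center-format box, and the 0.5 confidence threshold are all assumptions made for the example. Only the OpenCV calls (`VideoCapture`, `rectangle`, `putText`, `imshow`, `waitKey`) are standard library functions.

```python
def to_pixel_box(cx, cy, w, h, frame_w, frame_h):
    """Convert a normalized center-format box to clamped pixel corners.

    Assumes the detector reports (cx, cy, w, h) as fractions of the frame
    size; the source does not state the box format.
    """
    x1 = round((cx - w / 2) * frame_w)
    y1 = round((cy - h / 2) * frame_h)
    x2 = round((cx + w / 2) * frame_w)
    y2 = round((cy + h / 2) * frame_h)
    # Clamp so drawing never leaves the frame.
    return (max(0, x1), max(0, y1), min(frame_w - 1, x2), min(frame_h - 1, y2))


def keep_confident(detections, min_conf=0.5):
    """Drop detections below the confidence threshold (0.5 is an assumption)."""
    return [d for d in detections if d["conf"] >= min_conf]


def run_camera(detect, camera_index=0):
    """Read frames from a camera, run `detect` on each, and show the result.

    `detect` is a hypothetical callable wrapping the YOLO model: it takes a
    frame and returns dicts with "label", "conf", and a normalized "box"
    of (cx, cy, w, h).
    """
    import cv2  # imported here so the helpers above work without OpenCV

    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # camera closed or no more frames
                break
            frame_h, frame_w = frame.shape[:2]
            for det in keep_confident(detect(frame)):
                x1, y1, x2, y2 = to_pixel_box(*det["box"], frame_w, frame_h)
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(frame, det["label"], (x1, max(y1 - 5, 12)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow("detections", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Passing the detector in as a callable keeps the loop independent of how the pretrained weights are loaded, so the same loop would work with any wrapper that returns labelled boxes.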
A YOLO model is built by training a convolutional neural network on thousands of annotated images.
Human activity detection
The proof of concept for this idea is still under development. The implementation combines two ideas: pose detection using PoseNet and frame classification using a convolutional neural network (CNN). PoseNet is an existing model, trained on images annotated with human body points, that detects human poses and plots the body points on a frame, while a CNN is an image classification model. Combining the two networks could plausibly detect human activity in real time. If the resulting model reaches sufficient accuracy, it could be used to detect anomalies in human activity, such as a brawl in a pub.
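The text does not say how the two networks would be combined. One possible arrangement, shown here purely as an assumption, is to normalize the body points from each frame and collect them over a short sliding window, producing a feature vector that a downstream classifier could consume. The names `normalize_pose` and `PoseWindow`, the `(x, y)` keypoint format, and the window size are hypothetical and are not taken from PoseNet or from the project.

```python
from collections import deque


def normalize_pose(keypoints):
    """Center (x, y) keypoints on their centroid and scale by the box size.

    This makes the features independent of where the person stands in the
    frame and how large they appear.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Use the larger side of the bounding box; fall back to 1.0 if degenerate.
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in keypoints]


class PoseWindow:
    """Holds the most recent `size` normalized poses (hypothetical helper)."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)

    def push(self, keypoints):
        """Add one frame's keypoints; return True once the window is full."""
        self.frames.append(normalize_pose(keypoints))
        return len(self.frames) == self.frames.maxlen

    def features(self):
        """Flatten the window into one list of coordinates for a classifier."""
        return [c for pose in self.frames for point in pose for c in point]
```

In use, each frame's keypoints would be pushed into the window, and once `push` returns `True` the flattened `features()` list would be handed to whichever activity classifier the project settles on.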