Due to the widespread use of high-definition monitoring, the amount of data involved in security surveillance has increased dramatically in a short time. Efficient collection, analysis, and intelligent application of that data are becoming ever more critical in this industry. Thus, improving video intelligence appears to be an inevitable, industry-wide goal. Security users hope that their investment in new products will bring benefits beyond simply tracing and tracking persons of interest and collecting evidence after a security event. What they need are more efficient ways to shift surveillance from post-incident tracing to alerts during incidents, or even to pre-incident alerts. Satisfying these demands requires new technologies. Intelligent video surveillance has been available for many years, but the outcomes of its application have not been ideal. The emergence of deep learning has made it possible to meet these demands.
The insufficiency of traditional intelligent algorithms
Traditional intelligent video surveillance has especially strict requirements for a scene’s background. Even in comparable scenarios, the accuracy of intelligent recognition and analysis remains inconsistent.
The features in traditional intelligent algorithms are designed by humans and have always been heavily subjective. More abstract features (those that humans have difficulty comprehending or describing) are inevitably missed. In the classification learning process, the difficulty also rises as the number of categories increases.
Traditional intelligent algorithms generally use shallow learning models to handle large amounts of data in complex classifications, and the analysis results are far from ideal. These results directly restrict the breadth and depth of intelligent applications and their further development. Hence the security industry has a growing need to increase the “depth” of intelligence applied to its big data.
The advantages of deep learning and its algorithms
Traditional intelligent algorithms are designed by humans. Whether or not they are designed well depends greatly on experience and even luck, and this process requires a lot of time. So, is it even possible to get machines to automatically learn some of the features? Yes! This is actually the objective of artificial intelligence (AI).
The inspiration for deep learning comes from the neural networks of the human brain. Our brains can be seen as a very complex deep learning model. The brain's neural networks are composed of billions of interconnected neurons, and deep learning simulates this structure. These multi-layer networks can collect information and perform corresponding actions. They also possess the ability for object abstraction and recreation.
Deep learning is intrinsically different from other algorithms. The way it addresses the insufficiencies of traditional algorithms can be summarized in the following two aspects.
First, from “shallow” to “deep”
The algorithmic model for deep learning has a much deeper structure than the two- or three-layer structures of traditional algorithms. Sometimes the number of layers can exceed a hundred, enabling it to process large amounts of data in complex classifications. Deep learning is very similar to the human learning process in that it abstracts features layer by layer: it moves from a partial understanding (shallow) to an overall abstraction (deep) at which the object can be perceived.
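As a rough illustration of what "depth" means here, the following minimal sketch passes an input through a stack of fully connected layers, where each layer transforms the output of the previous one. The layer representation, the function names, and the hand-picked weights are assumptions made for this example only; they are not taken from any particular product or library, and real networks learn their weights from data.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, biases); hypothetical representation


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit: negative values become zero."""
    return [max(0.0, x) for x in v]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One fully connected layer: weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x: Vector, layers: List[Layer]) -> Vector:
    """Pass the input through every layer in order; depth is len(layers)."""
    for weights, bias in layers:
        x = relu(dense(x, weights, bias))
    return x


# Illustrative, hand-picked weights (not trained values).
shallow = [([[1.0, -1.0]], [0.0])]
deep = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),
    ([[0.5, 0.5]], [0.0]),
]

print(forward([2.0, 1.0], shallow))  # one layer      -> [1.0]
print(forward([2.0, 1.0], deep))     # three layers   -> [2.0]
```

Adding depth in this sketch only means appending more entries to the list of layers; the intermediate outputs play the role of the progressively more abstract features described above.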
Second, from “artificial features” to “feature learning”
Deep learning does not require manually designed features; instead, it relies on the computer to extract features from the training data by itself. This way it is able to extract as many features from the target as possible, including abstract features that are difficult or impossible to describe. In general, the richer the set of learned features, the more accurate the recognition and classification can be.
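The idea that parameters are fitted to labeled examples rather than set by hand can be shown with a deliberately tiny case. The sketch below trains a single logistic unit by gradient descent, which is far simpler than a deep network and is not the method of any specific product. The toy data, learning rate, and epoch count are assumptions chosen only for illustration.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (input values, label 0 or 1)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Sample], epochs: int = 1000,
          lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights and bias with full-batch gradient descent on log loss."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs  # nothing is designed by hand: start at zero
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for x, y in samples:
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def predict(x: List[float], weights: List[float], bias: float) -> int:
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) > 0.5 else 0


# Toy data (assumed): the label depends only on the first input value.
toy_samples: List[Sample] = [
    ([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1),
]
weights, bias = train(toy_samples)
print([predict(x, weights, bias) for x, _ in toy_samples])
```

Nothing in the code states which input matters; training discovers that from the labels. A deep network applies the same principle across many layers, so the intermediate features are learned as well.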
Key factors of deep learning
In total, there are three main reasons why deep learning only became popular in recent years and not earlier: the scale of data involved, computing power, and network architecture.
Improvements in data-driven algorithm performance have accelerated deep learning in various intelligent applications in a short amount of time. Specifically, with the increase in data scale, algorithmic performance improved as well. Accordingly, user experience has improved and more users are involved, further facilitating a larger scale of data.
Focusvision has operated in the security industry for many years with its own research and development capabilities, employing large amounts of real video and image data as training samples. With a large amount of good quality data, and over a hundred team members to label the video images, sample data with millions of categories have been accumulated.
Furthermore, high-performance hardware platforms provide greater computational power. Deep learning models require a large number of training samples, which makes a large amount of computation inevitable. The rapid development of GPUs, supercomputers, cloud computing, and other high-performance hardware platforms has made deep learning practical.
Finally, the network architecture plays its own role in advancing deep learning. Through constant optimization of deep learning algorithms, better target-object recognition can be achieved. For more complex applications such as facial recognition or in scenarios with different lighting, angles, postures, expressions, etc., network architecture will impact the accuracy of recognition.
Application of deep learning products
In the past two years, deep learning technology has excelled in speech recognition, computer vision, voice translation, and much more. It has even surpassed human capabilities in the areas of facial verification and image classification; hence, it has been highly regarded in the field of video surveillance for the security industry, particularly in facial detection, vehicle detection, facial recognition, human body feature detection, and multiple-target tracking.
These types of intelligent functions require a series of front-end surveillance cameras, back-end servers, and other products that support deep learning algorithms. In small-scale applications, front-end cameras can directly perform structured human and vehicle feature extraction, and tens of thousands of human facial images can be stored within the front-end devices to implement direct facial comparison, reducing the cost of communicating with a server. In large-scale applications, front-end cameras can work with back-end servers. Specifically, the structured video task is handled by front-end devices, which reduces the workload for back-end devices and improves the matching and searching efficiency of back-end servers.
Deep learning is the next level of AI development. It goes beyond conventional machine learning, in which the features and patterns used for supervised classification are built into the algorithms by hand. Focusvision is developing this concept in its own analytics algorithms. Enhanced accuracy is the result of multi-layer learning and extensive data collection. Applying these algorithms to face recognition, vehicle recognition, human recognition, and other platforms will significantly advance the performance of analytics.
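The on-device facial comparison described above can be sketched as a nearest-neighbor search over stored feature vectors. This is only a minimal sketch under stated assumptions: it assumes each stored face has already been reduced to a numeric feature vector by some deep learning model, and the function names, the cosine-similarity measure, the gallery contents, and the 0.8 threshold are all hypothetical choices for illustration, not a description of any vendor's implementation.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(probe: List[float],
               gallery: Dict[str, List[float]],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (identity, score) of the closest stored face, or None if no
    stored face reaches the (assumed) similarity threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for identity, features in gallery.items():
        score = cosine_similarity(probe, features)
        if score > best_score:
            best_id, best_score = identity, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score


# Hypothetical gallery held on the front-end device (tiny 3-D vectors for
# readability; real feature vectors are much longer).
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
}

print(best_match([0.9, 0.1, 0.0], gallery))  # close to person_a
print(best_match([0.0, 0.0, 1.0], gallery))  # resembles nobody stored
```

Because only the probe vector and a local gallery are involved, the comparison can run without contacting a server, which is the cost saving the small-scale scenario relies on. In the large-scale scenario the same feature vectors would instead be sent to back-end servers for matching and searching.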