Appendix A

Real-time object recognition using local features on a DSP-based embedded system

Abstract

In the last few years, object recognition has become one of the most popular tasks in computer vision. This was driven in particular by the development of powerful new algorithms for local-appearance-based object recognition. So-called smart cameras, which have enough power for decentralized image processing, have become increasingly popular for all kinds of tasks, especially in surveillance. Recognition is a very important tool there, because the robust recognition of suspicious vehicles, persons or objects is a matter of public safety. This makes the deployment of recognition capabilities on embedded platforms necessary. In our work we investigate object recognition based on state-of-the-art algorithms in the context of a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree that is suited to our dedicated hardware setup. We carefully investigate the parameters of the algorithm with respect to performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even with strictly limited computational and memory resources.

Keywords: DSP; Object recognition; Local features; Vocabulary tree

1. Introduction

Object recognition is one of the most popular tasks in the field of computer vision. In the past decade, considerable effort has gone into building robust object recognition systems based on appearance features with local extent. For such a framework to be applicable in the real world, several attributes are very
important: insensitivity to rotation, illumination and viewpoint changes, as well as real-time behavior and large-scale operation. Current systems already have many of these properties and, although not all problems have been solved yet, they are becoming increasingly attractive to industry for inclusion in consumer products. At the same time, embedded vision platforms such as smart cameras have recently emerged successfully, although they offer only limited computational and memory resources. Nevertheless, embedded vision systems are already present in our everyday
life. Almost everyone's mobile phone is equipped with a camera and can thus be treated as a small embedded vision system. This gives rise to new applications, such as navigation tools for visually impaired persons or collaborative public monitoring using millions of artificial eyes. In addition, the low price of digital sensors and the increased need for security in public places have led to a tremendous growth in the number of cameras mounted for surveillance purposes. These cameras have to be small and must process the huge amounts of available data on site. Furthermore, they have to perform dedicated operations automatically, without human interaction. Not only in surveillance, but also in household robotics, entertainment, and military and industrial robotics, embedded computer vision platforms are becoming increasingly popular due to their robustness against environmental adversities. DSP-based embedded platforms are especially popular because DSPs are powerful, cheap processors that are still small in size and power-efficient. Since DSPs offer maximum flexibility in the software that can be run, compared to other embedded
units such as FPGAs, ASICs or GPUs, their current success is not surprising. For the reasons already mentioned, recognition is a very important area of research. However, some attributes of embedded platforms strictly limit the practicability of current state-of-the-art approaches. For example, the amount of memory available on a device strictly limits the number of objects in the database. Therefore, one goal in building an embedded object recognition system is to make the amount of data needed to represent a single object as small as possible in order to maximize the number of
recognizable objects. Another important aspect is the real-time capability of these systems. Algorithms have to be fast enough to be operational in the real world, and they have to be robust and user-friendly; otherwise, a product equipped with such functionality is simply unattractive to a potential customer. For example, in an interactive museum tour, object recognition on a mobile device has to be fast enough to provide continuous guidance. Formally speaking, we consider this an application requiring soft real-time system behavior. This is just one example, and the exact meaning of the term real-time depends on the concrete application. Nevertheless, we consider an object recognition system to be real-time capable if it delivers at least one result per second. This already suffices for many applications, such as the interactive museum tour introduced above. However, this definition does not meet the requirements of other applications: object recognition at frame rate, for instance in combination with object tracking, requires higher throughput.

To summarize, building a full-featured recognition system on an embedded platform is a challenging problem, given all the different aspects and environmental restrictions to consider. In this work, we describe a method to deploy a medium-sized object recognition system on a prototypical DSP-based embedded platform. To the best of our knowledge, we are
the first to extensively investigate issues related to object recognition in the context of embedded systems; to date, this is the only work studying the influence of various parameters on recognition performance and runtime behavior. We pick a set of high-level algorithms to describe objects by a set
of appearance features. As a prototypical local-feature-based recognition system, we use Difference of Gaussians (DoG) keypoints and principal component analysis scale-invariant feature transform (PCA-SIFT) descriptors to build compact object representations. By arranging this information in a clever
tree-like data structure based on k-means clustering, a so-called vocabulary tree, real-time behavior is achieved. By applying a dedicated compression mechanism, the size of the data structure can be traded off against recognition performance, which allows the properties of a recognition system to be tuned accurately to a given hardware platform. As extensive evaluations show, considering both the special properties of the algorithms and the dedicated advantages of the hardware yields considerable gains in recognition performance and throughput.

The remainder of
this paper is structured as follows. In Sect. 2 we give an overview of developments in both areas that we bring together in our work: on the one hand, we list a number of references on object recognition in computer vision; on the other hand, we cite a number of publications
from the area of embedded smart sensors. A detailed description of the methods involved in building our object recognition algorithm is given in Sect. 3. In Sect. 4 we outline our framework and give details about the training and implementation of our system; we describe all steps in designing our approach in detail and add side notes on alternative methods. In Sect. 5, we experimentally evaluate our system on a challenging object database and discuss real-time and real-world issues. Furthermore, we investigate some special features of our approach and elucidate how several parameters affect the overall system performance. The paper concludes with some final notes and an outlook on future work in Sect. 6.

2. Related Work

In the following, we give a short introduction to local-feature-based object recognition. Due to the huge amount of literature available, we focus on the most promising approaches using local features and refer to those algorithms that are related to our work. We will also give a short overview about