Foreign Language Material

Detecting ground shadows in outdoor consumer photographs

Jean-Francois Lalonde, Alexei A. Efros, and Srinivasa G. Narasimhan
School of Computer Science, Carnegie Mellon University
Project webpage: http://graphics.cs.cmu.edu/projects/shadows

Abstract. Detecting shadows from images can significantly improve the performance of several vision tasks such as object detection and tracking. Recent approaches have mainly used illumination invariants, which can fail severely when the quality of the images is not very good, as is the case for most consumer-grade photographs, like those on Google or Flickr. We present a practical algorithm to automatically detect shadows cast by objects onto the ground from a single consumer photograph. Our key hypothesis is that the types of materials constituting the ground in outdoor scenes are relatively limited, most commonly including asphalt, brick, stone, mud, grass, concrete, etc. As a result, the appearances of shadows on the ground are not as widely varying as general shadows and thus can be learned from a labelled set of images. Our detector consists of a three-tier process: (a) training a decision tree classifier on a set of shadow-sensitive features computed around each image edge, (b) a CRF-based optimization to group the detected shadow edges into coherent shadow contours, and (c) incorporating any existing classifier that is specifically trained to detect the ground in images. Our results demonstrate good detection accuracy (85%) on several challenging images. Since most objects of interest to vision applications (like pedestrians, vehicles, signs) are attached to the ground, we believe that our detector can find wide applicability.

1. Introduction
Shadows are everywhere! Yet, the human visual system is so adept at filtering them out that we never give shadows a second thought; that is, until we need to deal with them in our algorithms. Since the very beginning of computer vision, the presence of shadows has been responsible for wreaking havoc on a wide variety of applications, including segmentation, object detection, scene analysis, stereo, tracking, etc. On the other hand, shadows play a crucial role in determining the type of illumination in the scene [1, 2] and the shapes of objects that cast them [3]. But while standard approaches, software, and evaluation datasets exist for a wide range of important vision tasks, from edge detection to face recognition, there has been comparatively little work on shadows in the last 40 years. Approaches that use multiple images [4], time-lapse image sequences [5, 6] or user inputs [7-9] have demonstrated impressive results, but detecting shadows reliably and automatically from a single image remains an open problem. This is because the appearances and shapes of shadows outdoors depend on several hidden factors such as the color, direction and size of the illuminants (sun, sky, clouds), the geometry of the objects that are casting the shadows, and the shape and material properties of objects onto which the shadows are cast.
Most works for detecting shadows from a single image are based on computing illumination invariants that are physically-based and are functions of individual pixel values [10-14] or of the values in a local image neighborhood [15]. Unfortunately, reliable computation of these invariants requires high quality images with wide dynamic range, high intensity resolution, and a camera whose radiometry and color transformations are accurately measured and compensated for. Even slight perturbations (imperfections) in such images can cause the invariants to fail severely (see Fig. 4). Thus, they are ill-suited for regular consumer-grade photographs such as those from Flickr and Google, which are noisy and often contain compression, resizing and aliasing artifacts, as well as effects due to automatic gain control and color balancing. Since much of current computer vision research is done on consumer photographs (and even worse-quality photos from mobile phones), there is an acute need for a shadow detector that can work on such images.
Our goal is to build a reliable shadow detector for consumer photographs of outdoor scenes. While detecting all shadows is expected to remain hard, we explicitly focus on the shadows cast by objects onto the ground plane. Fortunately, the types of materials constituting the ground in typical outdoor scenes are (relatively) limited, most commonly including concrete, asphalt, grass, mud, stone, brick, etc. Given this observation, our key hypothesis is that the appearances of shadows on the ground are not as widely varying as those of shadows elsewhere in the scene, and can be learned from a set of labelled images of real-world scenes. This restriction by no means makes the problem trivial: the ground shadow detector still needs to contend with myriad other non-shadow visual manifestations such as markings and potholes on roads, pavement/road boundaries, grass patterns on lawns, etc. Further, since many objects of interest to vision applications (pedestrians, vehicles, traffic signs, etc.) are attached to the ground and cast shadows onto the ground, we believe such a ground shadow detector will find wide applicability.

1.1 Overview
Our approach consists of three stages, depending on the information in the image that is used. In the first stage, we exploit local information around edges in the image. For this, we compute a set of shadow-sensitive features that include the ratios of brightness and color filter responses at different scales and orientations on both sides of the edge. These features are then used with a trained decision tree classifier to detect whether an edge is a shadow or not. The idea is that while any single feature may not be useful for detecting all ground shadows, the classifier is powerful enough to choose the right features depending on the underlying edge region. In order to make the classifier robust to non-shadow edges, a negative training set is constructed from a set of edges not on the ground and those arising due to road markings, potholes, grass/mud boundaries, etc. Surprisingly, this simple procedure yields 80% classification accuracy on our test set of images randomly chosen from Flickr and LabelMe [16].
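To make this first stage concrete, the following Python sketch shows one way such cross-edge ratio features and a decision tree could be implemented. It is a minimal illustration only: per-channel means at a fixed offset stand in for the full multi-scale, multi-orientation filter responses, and the CIELAB color space, the sampling offset, and the scikit-learn DecisionTreeClassifier are assumptions rather than the exact implementation used in the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def edge_ratio_features(img_lab, edge_pts, normal, offset=3):
    """Per-channel ratios of the mean color on the two sides of an edge.

    img_lab:  H x W x 3 image in a luminance/chrominance space (e.g. CIELAB)
    edge_pts: N x 2 array of (row, col) coordinates along one edge fragment
    normal:   unit 2-vector perpendicular to the edge
    offset:   sampling distance, in pixels, on each side of the edge
    """
    h, w, _ = img_lab.shape
    side1 = np.clip((edge_pts + offset * normal).round().astype(int), 0, [h - 1, w - 1])
    side2 = np.clip((edge_pts - offset * normal).round().astype(int), 0, [h - 1, w - 1])
    mean1 = img_lab[side1[:, 0], side1[:, 1]].mean(axis=0)
    mean2 = img_lab[side2[:, 0], side2[:, 1]].mean(axis=0)
    # Dark-to-bright ratio per channel: a shadow edge typically shows a strong
    # luminance drop with a much smaller change in the chrominance channels.
    return np.minimum(mean1, mean2) / (np.maximum(mean1, mean2) + 1e-6)

# Training on labelled edge fragments (one feature row per fragment):
# X = np.vstack([edge_ratio_features(img, pts, n) for img, pts, n in fragments])
# clf = DecisionTreeClassifier(max_depth=8).fit(X, y)   # y: 1 = shadow edge
# shadow_prob = clf.predict_proba(X_new)[:, 1]          # soft score per fragment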
In the second stage, we enforce a grouping of the shadow edges using a Conditional Random Field (CRF) to create longer contours. This is similar in spirit to the classical constrained label propagation used in mid-level vision tasks [17]. This procedure connects likely shadow edges, discourages T-junctions, which are highly unlikely on shadow boundaries, and removes isolated weak edges.
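The sketch below illustrates the kind of pairwise grouping this stage performs, with the local classifier's output as the unary term and link weights that favor smooth continuations between neighboring fragments while giving little or no support across T-junctions. Plain iterated conditional modes is used purely as a stand-in for proper CRF inference; the energy, the weights, and the inference method are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def group_shadow_edges(shadow_prob, links, smoothness=0.5, n_iters=10):
    """shadow_prob: length-N array of per-fragment P(shadow) from the local classifier.
    links:       list of (i, j, w) pairs of neighbouring fragments; w is large for
                 smooth continuations and small (or zero) across T-junctions.
    Returns a binary label per fragment (1 = shadow boundary)."""
    eps = 1e-6
    # unary[i, l] = cost of giving fragment i the label l.
    unary = np.stack([-np.log(1.0 - shadow_prob + eps),    # label 0: not a shadow
                      -np.log(shadow_prob + eps)], axis=1)  # label 1: shadow
    labels = (shadow_prob > 0.5).astype(int)
    for _ in range(n_iters):
        for i in range(len(labels)):
            cost = unary[i].copy()
            for a, b, w in links:
                if i in (a, b):
                    j = b if i == a else a
                    cost[1 - labels[j]] += smoothness * w  # penalise disagreeing with j
            labels[i] = int(np.argmin(cost))
    return labels

In this toy version, connected chains of strong shadow edges reinforce each other, while isolated weak responses receive no support from their neighbours and fall back to the non-shadow label.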
But how do we detect the ground in an image? For this, in the third stage, we incorporate a global scene layout descriptor within our CRF, such as the 3-way ground / vertical surface / sky classifier of Hoiem et al. [18]. Since the scene layout classifier is trained on the general features of the scene and not on shadows, we are able to reduce the number of false-positive (non-shadow) detections outside the ground. Our results show that the shadow detection results improve by 5% with this step.
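One simple way to picture this coupling is sketched below: a per-pixel ground likelihood (e.g. from a geometric-context style layout classifier) is averaged along each edge fragment and used to down-weight its shadow score. The product-of-probabilities rule is an assumption made for illustration; in the actual system the layout information enters the CRF rather than being applied as a post-hoc reweighting.

import numpy as np

def ground_weighted_prob(shadow_prob, ground_map, fragment_pixels):
    """shadow_prob:     length-N per-fragment P(shadow) from the local classifier
    ground_map:      H x W per-pixel P(ground) from a scene layout classifier
    fragment_pixels: list of N arrays of (row, col) pixel coordinates, one per fragment"""
    out = np.empty_like(shadow_prob)
    for i, pts in enumerate(fragment_pixels):
        p_ground = ground_map[pts[:, 0], pts[:, 1]].mean()
        # Fragments that fall outside the likely ground region are suppressed,
        # removing most false positives on buildings, trees and the sky.
        out[i] = shadow_prob[i] * p_ground
    return out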
We demonstrate successful shadow detection on several images of natural scenes that include beaches, meadows and forest trails, as well as urban scenes that include numerous pedestrians, vehicles, trees, roads and buildings, captured under a variety of illumination conditions (sunny, partly cloudy, overcast). Similarly to the approach of Zhu et al. [19], our method relies on learning the appearance of shadows from image features, but it does so using full color information. We found that using color features and incorporating knowledge of the ground location improve classification results by as much as 10% on our test set. While our technique can be used as a stand-alone shadow detector, we believe it can also be tightly integrated into higher-level scene understanding tasks.

2 Learning local cues for shadow detection

Our approach relies on a classifier which is trained to recognize ground shadow edges using features computed over a local neighborhood around the edge. We show that it is indeed possible to obtain good classification accuracy by relying on local cues alone, and that the resulting classifier can be used as a building block for the subsequent steps. In this section, we describe how to build, train, and evaluate such a classifier.
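As a rough picture of this build / train / evaluate loop, the following sketch assumes a feature matrix X (one row per edge fragment, e.g. the ratio features sketched earlier) and binary labels y from hand-annotated shadow boundaries; the scikit-learn calls and the 5-fold cross-validation protocol are assumptions for illustration, not the evaluation protocol of the paper.

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_local_classifier(X, y, max_depth=8, folds=5):
    """Cross-validated accuracy of the local shadow-edge classifier."""
    clf = DecisionTreeClassifier(max_depth=max_depth, class_weight="balanced")
    scores = cross_val_score(clf, X, y, cv=folds, scoring="accuracy")
    return scores.mean(), scores.std()

# Example: mean_acc, std_acc = evaluate_local_classifier(X, y)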