This book will get you hands-on with a wide range of intermediate to advanced projects using the latest version of the framework and language, OpenCV 4 and Python 3. This updated second edition will guide you through working on independent hands-on projects that focus on essential OpenCV concepts such as image processing, object detection, image manipulation, object tracking, and 3D scene reconstruction, in addition to statistical learning and neural networks.
The book will help you further build on your skills by demonstrating how to recognize traffic signs and emotions on faces.
You'll begin with concepts such as image filters, the Kinect depth sensor, and feature matching. As you advance, you'll not only get hands-on with reconstructing and visualizing a scene in 3D but also learn to track visually salient objects.
Later, you'll understand how to align images, and detect and track objects using neural networks. By the end of this OpenCV Python book, you'll have gained hands-on experience and become proficient at developing advanced computer vision apps according to specific business needs.
WindowManager

As we have seen, OpenCV provides functions to create and destroy a window, show an image in it, and process events.
Rather than being methods of a window class, these functions require a window's name to be passed as an argument. Since this interface is not object-oriented, it is inconsistent with OpenCV's general style. Also, it is unlikely to be compatible with other window or event handling interfaces that we might eventually want to use instead of OpenCV's.
For the sake of object orientation and adaptability, we abstract this functionality into a WindowManager class with createWindow, destroyWindow, show, and processEvents methods.
As a property, a WindowManager object has a function object called keypressCallback, which, if not None, is called from processEvents in response to any key press. We could also modify WindowManager to support mouse events. For example, the class's interface could be expanded to include a mouseCallback property (and an optional constructor argument) but could otherwise remain the same. A sketch of the class follows.
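Here is a minimal sketch of such a class, assuming a cv2.waitKey-based event loop; the internal attribute names are illustrative:

    import cv2

    class WindowManager(object):

        def __init__(self, windowName, keypressCallback = None):
            self.keypressCallback = keypressCallback
            self._windowName = windowName
            self._isWindowCreated = False

        @property
        def isWindowCreated(self):
            return self._isWindowCreated

        def createWindow(self):
            cv2.namedWindow(self._windowName)
            self._isWindowCreated = True

        def show(self, frame):
            cv2.imshow(self._windowName, frame)

        def destroyWindow(self):
            cv2.destroyWindow(self._windowName)
            self._isWindowCreated = False

        def processEvents(self):
            keycode = cv2.waitKey(1)
            if self.keypressCallback is not None and keycode != -1:
                # Discard any non-ASCII info encoded by GTK.
                keycode &= 0xFF
                self.keypressCallback(keycode)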
With some event framework other than OpenCV's, we could support additional event types in the same way, by adding callback properties. Appendix A, Integrating with Pygame, presents such an implementation, which improves on the base WindowManager class by properly handling quit events—for example, when the user clicks on the window's close button. Potentially, many other event types can be handled via Pygame too.

Applying everything — cameo.Cameo
Our application is represented by a class, Cameo, with two methods: run and onKeypress. On initialization, a Cameo object creates a WindowManager object with onKeypress as a callback, as well as a CaptureManager object using a camera and the WindowManager object. When run is called, the application executes a main loop in which frames and events are processed.
As a result of event processing, onKeypress may be called. In the same directory as managers.py, let's create a file called cameo.py containing this class; a sketch follows. When we run the app, the live preview is mirrored, while any recorded screenshots or screencasts are not. This is the intended behavior, as we pass True for shouldMirrorPreview when initializing the CaptureManager object. So far, we do not manipulate the frames in any way except to mirror them for preview.
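A minimal sketch of cameo.py follows. It assumes the CaptureManager interface described in this chapter (enterFrame, exitFrame, writeImage, startWritingVideo, stopWritingVideo, and an isWritingVideo property); the particular keycodes mapping space, tab, and escape to screenshot, screencast, and quit are one reasonable choice:

    import cv2
    from managers import WindowManager, CaptureManager

    class Cameo(object):

        def __init__(self):
            self._windowManager = WindowManager('Cameo',
                                                self.onKeypress)
            self._captureManager = CaptureManager(
                cv2.VideoCapture(0), self._windowManager, True)

        def run(self):
            """Run the main loop."""
            self._windowManager.createWindow()
            while self._windowManager.isWindowCreated:
                self._captureManager.enterFrame()
                frame = self._captureManager.frame
                # TODO: Filter the frame (Chapter 3).
                self._captureManager.exitFrame()
                self._windowManager.processEvents()

        def onKeypress(self, keycode):
            """Handle a keypress.

            space  -> Take a screenshot.
            tab    -> Start/stop recording a screencast.
            escape -> Quit.
            """
            if keycode == 32:  # space
                self._captureManager.writeImage('screenshot.png')
            elif keycode == 9:  # tab
                if not self._captureManager.isWritingVideo:
                    self._captureManager.startWritingVideo(
                        'screencast.avi')
                else:
                    self._captureManager.stopWritingVideo()
            elif keycode == 27:  # escape
                self._windowManager.destroyWindow()

    if __name__ == '__main__':
        Cameo().run()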
We will start to add more interesting effects in Chapter 3, Filtering Images.

Summary

By now, we should have an application that displays a camera feed, listens for keyboard input, and, on command, records a screenshot or screencast. We are ready to extend the application by inserting some image-filtering code (Chapter 3, Filtering Images) between the start and end of each frame.
Optionally, we are also ready to integrate other camera drivers or other application frameworks (Appendix A, Integrating with Pygame), besides the ones supported by OpenCV.

Filtering Images

Our goal is to achieve artistic effects, similar to the filters that can be found in image editing applications, such as Photoshop or GIMP.
As we proceed with implementing filters, you can try applying them to any BGR image and then saving or displaying the result. To fully appreciate each effect, try it with various lighting conditions and subjects. By the end of this chapter, we will integrate filters into the Cameo application. Thus, we should separate the filters into their own Python module or file. Let's create a file called filters.py. It should contain the following import statements:

    import cv2
    import numpy
    import scipy.interpolate
Channel mixing — seeing in Technicolor

Channel mixing is a simple technique for remapping colors. The color at a destination pixel is a function of the color at the corresponding source pixel only. More specifically, each channel's value at the destination pixel is a function of any or all channels' values at the source pixel.
In pseudocode, for a BGR image:

    dst.b = funcB(src.b, src.g, src.r)
    dst.g = funcG(src.b, src.g, src.r)
    dst.r = funcR(src.b, src.g, src.r)

Potentially, we can map a scene's colors much differently than a camera normally does or our eyes normally do. By assigning equal values to any two channels, we can collapse part of the color space and create the impression that our palette is based on just two colors of light blended additively or two inks blended subtractively.
This type of effect can offer nostalgic value because early color films and early digital graphics had more limited palettes than digital graphics today. As examples, let's invent some notional color spaces that are reminiscent of Technicolor movies of the 1920s and CGA graphics of the 1980s.
We will consider three such spaces. In RC (red, cyan), red and cyan can mix to make grays, so two channels suffice. In RGV (red, green, value), red and green cannot mix to make grays, so we need to specify value or whiteness as well; this color space resembles Technicolor Process 1. In CMV (cyan, magenta, value), cyan and magenta likewise cannot mix to make grays; this color space resembles CGA Palette 1.

Simulating RC color space

Blue and green can mix to make cyan.
By averaging the B and G channels and storing the result in both B and G, we effectively collapse these two channels into one, C. To support this effect, let's add the following function to filters.py.
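A minimal sketch of such a function follows; the name recolorRC is an assumption, but the split, addWeighted, and merge steps match the description below:

    def recolorRC(src, dst):
        """Simulate conversion from BGR to RC (red, cyan).

        The source and destination images must both be in BGR format.
        """
        # Extract the source's B, G, and R channels as separate arrays.
        b, g, r = cv2.split(src)
        # Replace the B channel's values with an average of B and G.
        cv2.addWeighted(b, 0.5, g, 0.5, 0, b)
        # Merge the channels back, using b for both B and G.
        cv2.merge((b, b, r), dst)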
The source and destination images must both be in BGR format. Blues and greens are replaced with cyans. In pseudocode: dst.b = dst.g = 0.5 * (src.b + src.g), and dst.r = src.r. Using split, we extract our source image's channels as separate arrays. Having put the data in this format, we can write clear, simple channel mixing code. Using addWeighted, we replace the B channel's values with an average of B and G.
The arguments to addWeighted are, in order: the first source array, a weight applied to the first source array, the second source array, a weight applied to the second source array, a constant added to the result, and a destination array. Using merge, we replace the values in our destination image with the modified channels. Note that we use b twice as an argument because we want the destination's B and G channels to be equal. Similar steps—splitting, modifying, and merging channels—can be applied to our other color space simulations as well.
Simulating RGV color space

Our intuition might say that we should set all B-channel values to 0 because RGV cannot represent blue. However, this change would be wrong because it would discard the blue component of lightness and, thus, turn grays and pale blues into yellows. Instead, we want grays to remain gray while pale blues become gray. To achieve this result, we should reduce B values to the per-pixel minimum of B, G, and R.
Let's implement this effect in filters.py; in this simulation, blues are desaturated. Conversely, to simulate CMV color space, we should desaturate yellows by increasing B values to the per-pixel maximum of B, G, and R. Sketches of both implementations that we can add to filters.py follow.
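Minimal sketches of both functions; the names recolorRGV and recolorCMV are assumptions, and the pseudocode in each docstring summarizes the mapping:

    def recolorRGV(src, dst):
        """Simulate conversion from BGR to RGV (red, green, value).

        Blues are desaturated.

        Pseudocode:
        dst.b = min(src.b, src.g, src.r)
        dst.g = src.g
        dst.r = src.r
        """
        b, g, r = cv2.split(src)
        cv2.min(b, g, b)  # b = min(b, g)
        cv2.min(b, r, b)  # b = min(b, r)
        cv2.merge((b, g, r), dst)

    def recolorCMV(src, dst):
        """Simulate conversion from BGR to CMV (cyan, magenta, value).

        Yellows are desaturated.

        Pseudocode:
        dst.b = max(src.b, src.g, src.r)
        dst.g = src.g
        dst.r = src.r
        """
        b, g, r = cv2.split(src)
        cv2.max(b, g, b)  # b = max(b, g)
        cv2.max(b, r, b)  # b = max(b, r)
        cv2.merge((b, g, r), dst)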
By design, the three preceding effects tend to produce major color distortions, especially when the source image is colorful in the first place. If we want to craft subtle effects, channel mixing with arbitrary functions is probably not the best approach.

Curves — bending color space

Curves are another technique for remapping colors. Channel mixing and curves are similar insofar as the color at a destination pixel is a function of the color at the corresponding source pixel only. However, in the specifics, channel mixing and curves are dissimilar approaches.
With curves, a channel's value at a destination pixel is a function of only the same channel's value at the source pixel. Moreover, we do not define the functions directly; instead, for each function, we define a set of control points from which the function is interpolated. We will use cubic spline interpolation whenever the number of control points is sufficient. Most of this work is done for us by a SciPy function called interp1d, which takes two arrays (the points' x and y coordinates) and returns a function that interpolates the points.
As an optional argument to interp1d, we may specify a kind of interpolation, which, in principle, may be linear, nearest, zero, slinear (a first-order spline), quadratic, or cubic, though not all options are implemented in the current version of SciPy.
Let's edit utils.py and add a curve-function factory; a sketch follows. The array of control points must be ordered such that x increases from one index to the next. Typically, for natural-looking effects, the y values should increase too, and the first and last control points should be (0, 0) and (255, 255), in order to preserve black and white. Note that we will treat x as a channel's input value and y as the corresponding output value.
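A sketch, assuming the name createCurveFunc and that utils.py imports scipy.interpolate:

    def createCurveFunc(points):
        """Return a function derived from control points."""
        if points is None:
            return None
        numPoints = len(points)
        if numPoints < 2:
            return None
        # Unzip the control points into parallel x and y arrays.
        xs, ys = zip(*points)
        if numPoints < 4:
            # Cubic interpolation needs at least four points.
            kind = 'linear'
        else:
            kind = 'cubic'
        return scipy.interpolate.interp1d(xs, ys, kind,
                                          bounds_error = False)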
For example, the control point (128, 160) would brighten a channel's midtones. Note that cubic interpolation requires at least four control points. If there are only two or three control points, we fall back to linear interpolation but, for natural-looking effects, this case should be avoided. However, this function might be expensive. We do not want to run it once per channel, per pixel (for example, 921,600 times per frame if applied to three channels of 640 x 480 video).
Fortunately, we are typically dealing with just 256 possible input values (in 8 bits per channel), and we can cheaply precompute and store that many output values. Then, our per-channel, per-pixel cost is just a lookup of the cached output value. Let's add a pair of lookup helpers to utils.py; sketches follow.
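Sketches of the two helpers; createLookupArray is an assumed name, while applyLookupArray is the name used in the discussion below:

    def createLookupArray(func, length = 256):
        """Return a lookup for whole-number inputs to a function.

        The lookup values are clamped to [0, length - 1].
        """
        if func is None:
            return None
        lookupArray = numpy.empty(length)
        i = 0
        while i < length:
            # Precompute the function once per possible input value.
            func_i = func(i)
            lookupArray[i] = min(max(0, func_i), length - 1)
            i += 1
        return lookupArray

    def applyLookupArray(lookupArray, src, dst):
        """Map a source to a destination using a lookup."""
        if lookupArray is None:
            return
        # Use the source's values as indices into the lookup array.
        dst[:] = lookupArray[src]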
In createLookupArray, the lookup values are clamped to [0, length - 1]. The applyLookupArray function works by using a source array's values as indices into the lookup array. Python's slice notation ([:]) is used to copy the looked-up values into a destination array. What if we want to apply two or more curves in succession? Performing multiple lookups is inefficient and may cause loss of precision.
We can avoid this problem by combining two curve functions into one function before creating a lookup array. The arguments must be of compatible types. Note the use of Python's lambda keyword to create an anonymous function in the sketch below.
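A sketch, with createCompositeFunc as an assumed name:

    def createCompositeFunc(func0, func1):
        """Return a composite of two functions."""
        if func0 is None:
            return func1
        if func1 is None:
            return func0
        # Apply func1 first, then func0, in a single function.
        return lambda x: func0(func1(x))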
Splitting and remerging channels is wasteful in this case because we do not need to distinguish between channels; we just need the one-dimensional indexing that applyLookupArray uses. For this purpose, we can add a createFlatView function to utils.py, sketched after this paragraph. The approach in createFlatView works for images with any number of channels. Thus, it allows us to abstract the difference between grayscale and color images in cases when we wish to treat all channels the same.
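A sketch of createFlatView:

    def createFlatView(array):
        """Return a 1-dimensional view of an array of any dimensionality."""
        flatView = array.view()
        flatView.shape = array.size
        return flatView

NumPy's view method returns a new array object that shares the original data, so reshaping the view to one dimension does not copy any pixels.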
Designing object-oriented curve filters

Since each curve filter should build its lookup array just once and then reuse it across many frames, our curve filters need to maintain state; thus, they need to be classes, not just functions. VFuncFilter is instantiated with a function, which its apply method can later apply to an image. The function is applied to the V (value) channel of a grayscale image or to all channels of a color image. VCurveFilter is a subclass; instead of being instantiated with a function, it is instantiated with a set of control points, which it uses internally to create a curve function.
BGRFuncFilter is instantiated with up to four functions: one of the functions is applied to all channels, and the other three functions are each applied to a single channel. The overall function is applied first and then the per-channel functions.
BGRCurveFilter is the corresponding subclass: instead of being instantiated with four functions, it is instantiated with four sets of control points, which it uses internally to create curve functions.
Additionally, all these classes accept a constructor argument that is a numeric type, such as numpy.uint8. This type is used to determine how many entries should be in the lookup array.
Let's first look at the implementations of VFuncFilter and VCurveFilter, which may both be added to filters.py, followed by BGRFuncFilter and BGRCurveFilter.
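Sketches of all four classes, assuming filters.py also imports the utils module holding the helper functions above:

    class VFuncFilter(object):
        """A filter that applies a function to V (or all of BGR)."""

        def __init__(self, vFunc = None, dtype = numpy.uint8):
            # For uint8, iinfo(dtype).max + 1 gives 256 lookup entries.
            length = numpy.iinfo(dtype).max + 1
            self._vLookupArray = utils.createLookupArray(vFunc, length)

        def apply(self, src, dst):
            """Apply the filter with a BGR or gray source/destination."""
            srcFlatView = utils.createFlatView(src)
            dstFlatView = utils.createFlatView(dst)
            utils.applyLookupArray(self._vLookupArray, srcFlatView,
                                   dstFlatView)

    class VCurveFilter(VFuncFilter):
        """A filter that applies a curve to V (or all of BGR)."""

        def __init__(self, vPoints, dtype = numpy.uint8):
            VFuncFilter.__init__(self, utils.createCurveFunc(vPoints),
                                 dtype)

    class BGRFuncFilter(object):
        """A filter that applies different functions to each of BGR."""

        def __init__(self, vFunc = None, bFunc = None, gFunc = None,
                     rFunc = None, dtype = numpy.uint8):
            length = numpy.iinfo(dtype).max + 1
            # Compose the overall function with each per-channel function.
            self._bLookupArray = utils.createLookupArray(
                utils.createCompositeFunc(bFunc, vFunc), length)
            self._gLookupArray = utils.createLookupArray(
                utils.createCompositeFunc(gFunc, vFunc), length)
            self._rLookupArray = utils.createLookupArray(
                utils.createCompositeFunc(rFunc, vFunc), length)

        def apply(self, src, dst):
            """Apply the filter with a BGR source/destination."""
            b, g, r = cv2.split(src)
            utils.applyLookupArray(self._bLookupArray, b, b)
            utils.applyLookupArray(self._gLookupArray, g, g)
            utils.applyLookupArray(self._rLookupArray, r, r)
            cv2.merge([b, g, r], dst)

    class BGRCurveFilter(BGRFuncFilter):
        """A filter that applies different curves to each of BGR."""

        def __init__(self, vPoints = None, bPoints = None,
                     gPoints = None, rPoints = None, dtype = numpy.uint8):
            BGRFuncFilter.__init__(self,
                                   utils.createCurveFunc(vPoints),
                                   utils.createCurveFunc(bPoints),
                                   utils.createCurveFunc(gPoints),
                                   utils.createCurveFunc(rPoints),
                                   dtype)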
We are also using numpy.iinfo (to size the lookup array for the channel type), as well as split and merge. These four classes can be used as is, with custom functions or control points being passed as arguments at instantiation. Alternatively, we can make further subclasses that hard-code certain functions or control points. Such subclasses could be instantiated without any arguments.

Emulating photo films

A common use of curves is to emulate the palettes that were common in pre-digital photography.
Every type of photo film has its own unique rendition of color (or grays), but we can generalize about some of the differences from digital sensors. Film tends to suffer a loss of detail and saturation in shadows, whereas digital tends to suffer these failings in highlights.
Also, film tends to have uneven saturation across different parts of the spectrum, so each film has certain colors that pop or jump out. Thus, when we think of good-looking film photos, we may think of scenes or renditions that are bright and that have certain dominant colors.
At the other extreme, we may remember the murky look of underexposed film that could not be improved much by the efforts of the lab technician. We are going to create four different film-like filters using curves. Each is a subclass of BGRCurveFilter; we just override the constructor to specify a set of control points for each channel. The choice of control points is based on recommendations by photographer Petteri Sulonen.
The Portra, Provia, and Velvia effects should produce normal-looking images. The effect should not be obvious except in before-and-after comparisons.
Kodak Portra is a portrait film; as such, it tends to make people's complexions fairer. Also, it exaggerates certain common clothing colors, such as milky white (for example, a wedding dress) and dark blue (for example, a suit or jeans). Let's add an implementation of a Portra filter to filters.py.
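Here is a sketch of the pattern; the control points below are illustrative placeholders, not Sulonen's actual recommendations:

    class BGRPortraCurveFilter(BGRCurveFilter):
        """A filter that applies Portra-like curves to BGR.

        The control points here are illustrative placeholders.
        """

        def __init__(self, dtype = numpy.uint8):
            BGRCurveFilter.__init__(
                self,
                vPoints = [(0, 0), (23, 20), (157, 173), (255, 255)],
                bPoints = [(0, 0), (41, 46), (231, 228), (255, 255)],
                gPoints = [(0, 0), (52, 47), (189, 196), (255, 255)],
                rPoints = [(0, 0), (69, 69), (213, 218), (255, 255)],
                dtype = dtype)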
The Provia effect enhances sky, water, and shade more than sun. The Velvia effect can often produce azure skies in daytime and crimson clouds at sunset; this effect is difficult to emulate, but we can attempt it. Finally, the cross-processing effect does not necessarily preserve black and white, and its contrast is very high; cross-processed photos take on a sickly appearance, with people looking jaundiced and inanimate objects looking stained. Provia, Velvia, and cross-process filters can all be added to filters.py in the same way as the Portra filter: as BGRCurveFilter subclasses with different control points.
Highlighting edges

We, as humans, can easily recognize many object types and their poses just by seeing a backlit silhouette or a rough sketch. Indeed, when art emphasizes edges and poses, it often seems to convey the idea of an archetype, like Rodin's The Thinker or Joe Shuster's Superman. Software, too, can reason about edges, poses, and archetypes.
We will discuss these kinds of reasoning in later chapters. For the moment, we are interested in a simple use of edges for artistic effect. We are going to trace an image's edges with bold, black lines. The effect should be reminiscent of a comic book or other illustration, drawn with a felt pen.
OpenCV provides many edge-finding filters, including Laplacian, Sobel, and Scharr. These filters are supposed to turn non-edge regions to black while turning edge regions to white or saturated colors. However, they are prone to misidentifying noise as edges. This flaw can be mitigated by blurring an image before trying to find its edges.
OpenCV also provides many blurring filters, including blur (a simple average), medianBlur, and GaussianBlur. The arguments to the edge-finding and blurring filters vary but always include ksize, an odd whole number that represents the width and height (in pixels) of the filter's kernel.
A kernel is a set of weights that are applied to a region in the source image to generate a single pixel in the destination image. For example, a ksize of 7 implies that 49 (7 x 7) source pixels are considered in generating each destination pixel.
We can think of a kernel as a piece of frosted glass moving over the source image and letting through a diffused blend of the source's light. For blurring, let's use medianBlur, which is effective in removing digital video noise, especially in color images. For edge-finding, let's use Laplacian, which produces bold edge lines, especially in grayscale images.
Once we have the result of Laplacian, we can invert it to get black edges on a white background. Then, we can normalize it so that its values range from 0 to 1, and multiply it with the source image to darken the edges. Let's implement this approach in filters.py with a strokeEdges function, sketched below.
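A sketch of strokeEdges, using the default argument values discussed next:

    def strokeEdges(src, dst, blurKsize = 7, edgeKsize = 5):
        if blurKsize >= 3:
            # Blur first to suppress noise that would read as edges.
            blurredSrc = cv2.medianBlur(src, blurKsize)
            graySrc = cv2.cvtColor(blurredSrc, cv2.COLOR_BGR2GRAY)
        else:
            graySrc = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        cv2.Laplacian(graySrc, cv2.CV_8U, graySrc, ksize = edgeKsize)
        # Invert and normalize: edges approach 0, non-edges approach 1.
        normalizedInverseAlpha = (1.0 / 255) * (255 - graySrc)
        channels = cv2.split(src)
        for channel in channels:
            # Darken each channel where edges were found.
            channel[:] = channel * normalizedInverseAlpha
        cv2.merge(channels, dst)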
The blurKsize argument is used as ksize for medianBlur, while edgeKsize is used as ksize for Laplacian. With my webcams, I find that a blurKsize value of 7 and an edgeKsize value of 5 look best. Unfortunately, medianBlur is expensive with a large ksize, such as 7. If you encounter performance problems when running strokeEdges, try decreasing the blurKsize value. To turn off the blur, set it to a value less than 3.

Custom kernels — getting convoluted

As we have just seen, many of OpenCV's predefined filters use a kernel.
Remember that a kernel is a set of weights, which determine how each output pixel is calculated from a neighborhood of input pixels. Another term for a kernel is a convolution matrix.
It mixes up, or convolves, the pixels in a region. Similarly, a kernel-based filter may be called a convolution filter. OpenCV provides a very versatile function, filter2D, which applies any kernel or convolution matrix that we specify. To understand how to use this function, let's first learn the format of a convolution matrix. It is a 2D array with an odd number of rows and columns. The central element corresponds to a pixel of interest and the other elements correspond to that pixel's neighbors.
Each element contains an integer or floating-point value, which is a weight that gets applied to an input pixel's value. Consider, for example, a sharpening kernel in which the central weight is 9 and the eight surrounding weights are -1: for the pixel of interest, the output color will be nine times its input color, minus the input colors of all eight adjacent pixels.
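A minimal sketch of applying such a kernel with filter2D (the input filename is a placeholder for any BGR image):

    import cv2
    import numpy

    src = cv2.imread('input.png')  # any BGR image
    kernel = numpy.array([[-1, -1, -1],
                          [-1,  9, -1],
                          [-1, -1, -1]])
    # A ddepth of -1 gives the destination the same depth as the source.
    dst = cv2.filter2D(src, -1, kernel)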
If the pixel of interest is already a bit different from its neighbors, this difference becomes intensified. The effect is that the image looks sharper, as the contrast between neighbors is increased. A negative value, as used here for the ddepth argument, means that the destination image has the same depth as the source image. For color images, note that filter2D applies the kernel equally to each channel.
To use different kernels on different channels, we would also have to use the split and merge functions, as we did in our earlier channel mixing functions (see the section Simulating RC color space). Based on this simple example, let's add two classes to filters.py. One class, VConvolutionFilter, will represent a convolution filter in general. A subclass, SharpenFilter, will represent our sharpening filter specifically, following the same object-oriented style as the section Designing object-oriented curve filters.
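A sketch of the two classes, matching the apply(src, dst) interface used by the curve filter classes above:

    class VConvolutionFilter(object):
        """A filter that applies a convolution to V (or all of BGR)."""

        def __init__(self, kernel):
            self._kernel = kernel

        def apply(self, src, dst):
            """Apply the filter with a BGR or gray source/destination."""
            cv2.filter2D(src, -1, self._kernel, dst)

    class SharpenFilter(VConvolutionFilter):
        """A sharpen filter with a 1-pixel radius."""

        def __init__(self):
            kernel = numpy.array([[-1, -1, -1],
                                  [-1,  9, -1],
                                  [-1, -1, -1]])
            VConvolutionFilter.__init__(self, kernel)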
Note that the weights sum to 1. This should be the case whenever we want to leave the image's overall brightness unchanged. If we modify a sharpening kernel slightly so that its weights sum to 0 instead, then we have an edge detection kernel that turns edges white and non-edges black. For example, let's add the following edge detection filter to filters.py.
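A sketch, with FindEdgesFilter as an assumed name:

    class FindEdgesFilter(VConvolutionFilter):
        """An edge-finding filter with a 1-pixel radius."""

        def __init__(self):
            # Weights sum to 0: flat regions go black, edges go white.
            kernel = numpy.array([[-1, -1, -1],
                                  [-1,  8, -1],
                                  [-1, -1, -1]])
            VConvolutionFilter.__init__(self, kernel)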
Generally, for a blur effect, the weights should sum to 1 and should be positive throughout the neighborhood. For example, we can take a simple average of the neighborhood, as follows:

    class BlurFilter(VConvolutionFilter):
        """A blur filter with a 2-pixel radius."""

        def __init__(self):
            # Each of the 25 weights is 1/25, a simple average.
            kernel = numpy.array([[0.04, 0.04, 0.04, 0.04, 0.04],
                                  [0.04, 0.04, 0.04, 0.04, 0.04],
                                  [0.04, 0.04, 0.04, 0.04, 0.04],
                                  [0.04, 0.04, 0.04, 0.04, 0.04],
                                  [0.04, 0.04, 0.04, 0.04, 0.04]])
            VConvolutionFilter.__init__(self, kernel)

Sometimes, though, kernels with less symmetry produce an interesting effect. Let's consider a kernel that blurs on one side (with positive weights) and sharpens on the other (with negative weights), as in the sketch below.
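An EmbossFilter sketch along these lines; the name and exact weights are one plausible choice:

    class EmbossFilter(VConvolutionFilter):
        """An emboss filter with a 1-pixel radius."""

        def __init__(self):
            # Negative weights on one side, positive on the other.
            kernel = numpy.array([[-2, -1, 0],
                                  [-1,  1, 1],
                                  [ 0,  1, 2]])
            VConvolutionFilter.__init__(self, kernel)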
It will produce a ridged, or embossed, effect. Our collection of custom kernels is hardly exhaustive; indeed, it is more basic than OpenCV's ready-made set of filters. However, with a bit of experimentation, you should be able to write your own kernels that produce a unique look.

Modifying the application

Now that we have high-level functions and classes for several filters, it is trivial to apply any of them to the captured frames in Cameo.
Let's edit cameo.py. Here, I have chosen to apply two effects: stroking the edges and emulating Portra film colors. The rest is the same as in Chapter 2. Feel free to modify the code to apply any filters you like.
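A sketch of the relevant changes, assuming the BGRPortraCurveFilter sketch from earlier in this chapter and an added import filters statement in cameo.py:

    # In Cameo.__init__, create the film emulation filter once:
    self._curveFilter = filters.BGRPortraCurveFilter()

    # In Cameo.run, between enterFrame and exitFrame:
    filters.strokeEdges(frame, frame)
    self._curveFilter.apply(frame, frame)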
We should also have several more filter implementations that are easily swappable with the ones we are currently using. Now, we are ready to proceed with analyzing each frame for the sake of finding faces to manipulate in the next chapter.

Tracking Faces with Haar Cascades

Specifically, we look at Haar cascade classifiers, which analyze the contrast between adjacent image regions to determine whether or not a given image or subimage matches a known type. We consider how to combine multiple Haar cascade classifiers in a hierarchy, such that one classifier identifies a parent region (for our purposes, a face) and other classifiers identify child regions (eyes, nose, and mouth). We also take a detour into the humble but important subject of rectangles. By drawing, copying, and resizing rectangular image regions, we can perform simple manipulations on image regions that we are tracking.
By the end of this chapter, we will integrate face tracking and rectangle manipulations into Cameo. Finally, we'll have some face-to-face interaction!

Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint?
What constitutes a recognizable part of an object? Photographic images, even from a webcam, may contain a lot of detail for our human viewing pleasure.
However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise.
Moreover, even real differences in physical detail might not interest us for the purpose of classification. I was taught in school that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.
The abstractions are called features, which are said to be extracted from the image data. There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on distances between the images' corresponding features.
For example, distance might be defined in terms of spatial coordinates or color coordinates. Haar-like features are one type of feature that is often applied to real-time face tracking.
They were first used for this purpose by Paul Viola and Michael Jones in 2001.