Commit b4e1df2

6 new py scripts and pipeline section

1 parent a9280d9 commit b4e1df2

10 files changed, +588 -22 lines changed

README.md

+7-1
@@ -79,16 +79,22 @@ A math-first approach to learning computer vision in Python. The repository will
- [Contour Properties](digitrecognition/digitrec.html#contour-properties)
- [References and learn-by-building modules](digitrecognition/digitrec.html#references)

### Chapter 5
- Facial Recognition

## Approach and Motivation
The course is foundational to anyone who wishes to work with computer vision in Python. It covers some of the most common image processing routines, and has in-depth coverage of the mathematical concepts present in the materials:
- Math-first approach
- Tons of sample python scripts (.py)
  - 45+ python scripts from chapter 1 to 4 for plug-and-play experiments
- Multimedia (image illustrations, video explanation, quiz)
  - 57 image assets from chapter 1 to 4 for practical illustrations
  - 4 PDFs, and 4 HTMLs, one for each chapter
- Practical tips on real-world applications

The course's **only dependency** is `OpenCV`. Getting started is as easy as `pip install opencv-contrib-python` and you're set to go.
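
A quick sanity check after installing might look like the minimal sketch below (the image path is a placeholder; any local image will do):

```py
import cv2

# confirm the OpenCV build imports and report its version
print(cv2.__version__)

# load and display any local image
img = cv2.imread("path/to/any/image.jpg")  # placeholder path
cv2.imshow("Sanity check", img)
cv2.waitKey(0)
```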

##### Question: What about deep learning libraries?

No; while deep learning for images makes for an interesting topic, it is probably better suited to an altogether separate course series. This course series (tutorial series) focuses on the **essentials of computer vision** and,
for pedagogical reasons, tries not to be overly ambitious with the scope it intends to cover.

digitrecognition/contourarea_03.py

+41
@@ -0,0 +1,41 @@
import cv2

PURPLE = (75, 0, 130)
YELLOW = (0, 255, 255)
THICKNESS = 4
FONT = cv2.FONT_HERSHEY_SIMPLEX

# read the sample image, downscale by half and convert to grayscale
img_color = cv2.imread("assets/ocbc.jpg")
img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5)
img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

# smooth (Gaussian blur, then edge-preserving bilateral filter) before edge detection
blurred = cv2.GaussianBlur(img, (7, 7), 0)
blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50)
edged = cv2.Canny(blurred, 130, 150)  # two hysteresis thresholds

cv2.imshow("Outline of device", edged)
cv2.waitKey(0)

cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by area (largest first) and keep the nine largest
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:9]

# draw the largest contour first
cv2.drawContours(img_color, cnts, 0, PURPLE, THICKNESS)
cv2.imshow("Target Contour", img_color)
cv2.waitKey(0)

for i in range(len(cnts)):
    cv2.drawContours(img_color, cnts, i, PURPLE, THICKNESS)
    print(f"ContourArea:{cv2.contourArea(cnts[i])}")
    x, y, w, h = cv2.boundingRect(cnts[i])
    cv2.rectangle(img_color, (x, y), (x + w, y + h), YELLOW, THICKNESS)

    # annotate each bounding box with the contour's area and perimeter
    area = round(cv2.contourArea(cnts[i]), 1)
    peri = round(cv2.arcLength(cnts[i], closed=True), 1)
    print(f"ContourArea:{area}, Peri: {peri}")
    cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, 1)
    cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, 1)

    cv2.imshow("Contour one by one", img_color)
    cv2.waitKey(0)

digitrecognition/digit_01.py

+149
@@ -0,0 +1,149 @@
import cv2
import numpy as np

FONT = cv2.FONT_HERSHEY_SIMPLEX
CYAN = (255, 255, 0)
# lookup table: seven-segment states (a-g) -> digit
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}


# roi_color = cv2.imread("inter/dbs-roi.png")
roi_color = cv2.imread("inter/ocbc-roi.png")
roi = cv2.cvtColor(roi_color, cv2.COLOR_BGR2GRAY)

RATIO = roi.shape[0] * 0.2

# edge-preserving noise reduction
roi = cv2.bilateralFilter(roi, 5, 30, 60)

# trim away the top 20% and 20% on each side of the ROI
trimmed = roi[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)]
roi_color = roi_color[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)]
cv2.imshow("Blurred and Trimmed", trimmed)
cv2.waitKey(0)

# binarize (white digits on black) with adaptive thresholding
edged = cv2.adaptiveThreshold(
    trimmed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5
)
cv2.imshow("Edged", edged)
cv2.waitKey(0)

# morphological transformations to join the segments of each digit
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5))
dilated = cv2.dilate(edged, kernel, iterations=1)

cv2.imshow("Dilated", dilated)
cv2.waitKey(0)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
dilated = cv2.dilate(dilated, kernel, iterations=1)

cv2.imshow("Dilated x2", dilated)
cv2.waitKey(0)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2, 1))
eroded = cv2.erode(dilated, kernel, iterations=1)

cv2.imshow("Eroded", eroded)
cv2.waitKey(0)

# black out the bottom rows and leftmost columns to suppress border artifacts
h = roi.shape[0]
ratio = int(h * 0.07)
eroded[-ratio:, :] = 0
eroded[:, :ratio] = 0

cv2.imshow("Eroded + Black", eroded)
cv2.waitKey(0)

cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digits_cnts = []

canvas = trimmed.copy()
cv2.drawContours(canvas, cnts, -1, (255, 255, 255), 1)
cv2.imshow("All Contours", canvas)
cv2.waitKey(0)

# keep only contours tall enough to be digits
canvas = trimmed.copy()
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    if h > 20:
        digits_cnts += [cnt]
        cv2.rectangle(canvas, (x, y), (x + w, y + h), (0, 0, 0), 1)
        cv2.drawContours(canvas, [cnt], 0, (255, 255, 255), 1)
        cv2.imshow("Digit Contours", canvas)
        cv2.waitKey(0)

print(f"No. of Digit Contours: {len(digits_cnts)}")


cv2.imshow("Digit Contours", canvas)
cv2.waitKey(0)


# sort the digit contours left to right by the x-coordinate of their bounding boxes
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])

canvas = trimmed.copy()


for i, cnt in enumerate(sorted_digits):
    (x, y, w, h) = cv2.boundingRect(cnt)
    cv2.rectangle(canvas, (x, y), (x + w, y + h), (0, 0, 0), 1)
    cv2.putText(canvas, str(i), (x, y - 3), FONT, 0.3, (0, 0, 0), 1)

cv2.imshow("All Contours sorted", canvas)
cv2.waitKey(0)

digits = []
canvas = roi_color.copy()
for cnt in sorted_digits:
    (x, y, w, h) = cv2.boundingRect(cnt)
    roi = eroded[y : y + h, x : x + w]  # crop of a single digit
    print(f"W:{w}, H:{h}")
    # convenience units
    qW, qH = int(w * 0.25), int(h * 0.15)
    fractionH, halfH, fractionW = int(h * 0.05), int(h * 0.5), int(w * 0.25)

    # seven segments in the order of wikipedia's illustration
    sevensegs = [
        ((0, 0), (w, qH)),  # a (top bar)
        ((w - qW, 0), (w, halfH)),  # b (upper right)
        ((w - qW, halfH), (w, h)),  # c (lower right)
        ((0, h - qH), (w, h)),  # d (lower bar)
        ((0, halfH), (qW, h)),  # e (lower left)
        ((0, 0), (qW, halfH)),  # f (upper left)
        # ((0, halfH - fractionH), (w, halfH + fractionH)) # center
        (
            (0 + fractionW, halfH - fractionH),
            (w - fractionW, halfH + fractionH),
        ),  # g (center)
    ]

    # initialize to off
    on = [0] * 7

    # a segment counts as "on" when more than half of its pixels are white
    for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
        region = roi[p1y:p2y, p1x:p2x]
        print(
            f"{i}: Sum of 1: {np.sum(region == 255)}, Sum of 0: {np.sum(region == 0)}, Shape: {region.shape}, Size: {region.size}"
        )
        if np.sum(region == 255) > region.size * 0.5:
            on[i] = 1
            print(f"State of ON: {on}")

    # look up the digit and annotate it on the colour canvas
    digit = DIGITSDICT[tuple(on)]
    print(f"Digit is: {digit}")
    digits += [digit]
    cv2.rectangle(canvas, (x, y), (x + w, y + h), CYAN, 1)
    cv2.putText(canvas, str(digit), (x - 5, y + 6), FONT, 0.3, (0, 0, 0), 1)
    cv2.imshow("Digit", canvas)
    cv2.waitKey(0)

print(f"Digits on the token are: {digits}")

digitrecognition/digitrec.html

+137-20
Large diffs are not rendered by default.

digitrecognition/digitrec.md

+135-1
@@ -193,6 +193,8 @@ Because of how these operations work, there are a couple of things to note:

![](assets/morphexample.png)

The full code solution is in `morphological_02.py`.

As we read our image in grayscale mode (`flags=0`), we obtain a white background and a mostly-black foreground. This is illustrated in the subplot titled "Original" above. We begin our preprocessing steps by first binarizing the image (step 1), followed by inverting the colors (step 2) to get a white-on-black image.

An erosion operation is then performed (step 3). This works by creating our kernel (either through `numpy` or through `opencv`'s structuring element) and sliding that kernel across our image to remove white noise in the image.
@@ -217,6 +219,13 @@ cv2.imshow("Transformed", dilated)
cv2.imshow("Transformed", dilated)
cv2.waitKey(0)
```

OpenCV provides three shapes for our kernel:
- Rectangular box: `MORPH_RECT`
- Cross: `MORPH_CROSS`
- Ellipse: `MORPH_ELLIPSE`

They are fed as the first argument into `cv2.getStructuringElement()`, with the second argument being the kernel size (`ksize`) itself. The third argument is the _anchor point_, which defaults to the center.

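For instance, a quick sketch that prints each kernel so you can see its shape (the `(5, 5)` size here is purely illustrative):

```py
import cv2

# each call returns a small 0/1 matrix that serves as the kernel;
# the first argument is the shape, the second is ksize
for name, shape in [
    ("RECT", cv2.MORPH_RECT),
    ("CROSS", cv2.MORPH_CROSS),
    ("ELLIPSE", cv2.MORPH_ELLIPSE),
]:
    kernel = cv2.getStructuringElement(shape, (5, 5))
    print(name)
    print(kernel)
```
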
### Opening and Closing
Another name for **Erosion, followed by Dilation** is Opening. It is useful in removing noise in our image. The reverse of Opening is Closing, where we first **perform Dilation followed by Erosion**, which is particularly suited for closing small holes inside foreground objects.

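A minimal sketch of both operations using `cv2.morphologyEx` (the kernel size is illustrative, and `binary.png` is a placeholder for any white-on-black binary image):

```py
import cv2

img = cv2.imread("binary.png", flags=0)  # placeholder path, read as grayscale
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# Opening: erosion followed by dilation, removes small white noise
opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# Closing: dilation followed by erosion, fills small holes in the foreground
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

cv2.imshow("Opened", opened)
cv2.imshow("Closed", closed)
cv2.waitKey(0)
```
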
@@ -332,6 +341,22 @@ If you are paying close attention to the digit '0' in our LCD display, you will

A reasonable strategy to handle this is the Dilation or Closing (Dilation followed by Erosion) operation that you've learned earlier.

Similarly, your ROI may necessitate other pre-processing, and the specific tactics vary greatly depending on the problem at hand.

As I inspected the bounding boxes we retrieved around the LCD screen, I observed that the digits are often centered around the bottom half of the display, which led me to insert an additional step prior to the morphological transformation in the final code solution. The step uses numpy subsetting to trim away the top 20%, as well as 20% on each side, of the image:

```py
roi = cv2.imread("roi.png", flags=0)
RATIO = roi.shape[0] * 0.2
trimmed = roi[
    int(RATIO) :,
    int(RATIO) : roi.shape[1] - int(RATIO)]
```

That said, whenever possible, you want to be cautious not to hand-tune your solution in a way that is overly specific to the images you have at hand, lest it **only** works on those specific images and not others, a phenomenon fondly termed "overfitting" in the machine learning community.

I've re-executed the solution code against some sample image sets, once with the "trimming" in place and once without it, before settling on the decision. As you will see later, the trimming improves our accuracy and is a relatively safe strategy, given that every LCD screen, regardless of the issuer (bank), has the same asymmetry, with more "blank space" in the top half than the bottom half.

#### Contour Properties
Furthermore, in many cases of digit recognition / digit classification you will want to predict the class for each digit in an ordered fashion. Suppose the LCD screen contains the digits "40710382": our algorithm should correctly isolate these digits and classify them iteratively, but do so from the leftmost digit to the rightmost. Failing to account for this may result in your algorithm correctly classifying each digit, but producing an unreasonable output such as "1740238".

@@ -365,9 +390,118 @@ for cnt in cnts:
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])
```

When we put these together, we now have a complete pipeline:
![](assets/digitrecflow.png)

The full solution code is in `digit_01.py`, but the essential parts are as follows:

```py
import cv2
import numpy as np

# step 1:
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

# drawing constants, as defined at the top of digit_01.py
FONT = cv2.FONT_HERSHEY_SIMPLEX
CYAN = (255, 255, 0)

# step 2
roi = cv2.imread("inter/ocbc-roi.png", flags=0)

# step 3
RATIO = roi.shape[0] * 0.2
roi = cv2.bilateralFilter(roi, 5, 30, 60)
trimmed = roi[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)]

# step 4
edged = cv2.adaptiveThreshold(
    trimmed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5
)

# step 5
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5))
dilated = cv2.dilate(edged, kernel, iterations=1)
eroded = cv2.erode(dilated, kernel, iterations=1)

# step 6
cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digits_cnts = []
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    if h > 20:
        digits_cnts += [cnt]

# step 7
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])

# step 8
# colour canvas for drawing (digit_01.py draws on the trimmed colour ROI instead)
canvas = cv2.cvtColor(trimmed, cv2.COLOR_GRAY2BGR)
digits = []
for cnt in sorted_digits:
    # step 8a
    (x, y, w, h) = cv2.boundingRect(cnt)
    roi = eroded[y : y + h, x : x + w]
    qW, qH = int(w * 0.25), int(h * 0.15)
    fractionH, halfH, fractionW = int(h * 0.05), int(h * 0.5), int(w * 0.25)

    # step 8b
    sevensegs = [
        ((0, 0), (w, qH)),  # a (top bar)
        ((w - qW, 0), (w, halfH)),  # b (upper right)
        ((w - qW, halfH), (w, h)),  # c (lower right)
        ((0, h - qH), (w, h)),  # d (lower bar)
        ((0, halfH), (qW, h)),  # e (lower left)
        ((0, 0), (qW, halfH)),  # f (upper left)
        # ((0, halfH - fractionH), (w, halfH + fractionH)) # center
        (
            (0 + fractionW, halfH - fractionH),
            (w - fractionW, halfH + fractionH),
        ),  # g (center)
    ]

    # step 8c
    on = [0] * 7
    for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
        region = roi[p1y:p2y, p1x:p2x]
        print(
            f"{i}: Sum of 1: {np.sum(region == 255)}, Sum of 0: {np.sum(region == 0)}, Shape: {region.shape}, Size: {region.size}"
        )
        if np.sum(region == 255) > region.size * 0.5:
            on[i] = 1
            print(f"State of ON: {on}")
    # step 8d
    digit = DIGITSDICT[tuple(on)]
    print(f"Digit is: {digit}")
    digits += [digit]
    # step 9
    cv2.rectangle(canvas, (x, y), (x + w, y + h), CYAN, 1)
    cv2.putText(canvas, str(digit), (x - 5, y + 6), FONT, 0.3, (0, 0, 0), 1)
    cv2.imshow("Digit", canvas)
    cv2.waitKey(0)
print(f"Digits on the token are: {digits}")
```

- Step 1: Initialize the lookup dictionary
- Step 2: Read our ROI image using OpenCV
- Step 3: Reduce noise and trim away asymmetrical white space in our ROI
- Step 4: Binarize our image using adaptive thresholding
- Step 5: Morphological transformation to remove noise and fill the small holes in our digits
- Step 6: Find contours in our image with a height greater than 20px
- Step 7: Sort the contours by the x-coordinate of their bounding boxes (hence, left to right)
- Step 8
  - Step 8a: Create a rectangular bounding box for each digit, and some convenience units that we later use to slice the seven segments. Notice that these convenience units are not hard-coded values, but are proportional to the height (`h`) and width (`w`) of our rectangular box
  - Step 8b: Slice the seven segments; the first segment ("a") spans from point (0, 0) to (w, `int(h * 0.15)`), i.e. it is `w` wide and 15% of the full digit contour's height
  - Step 8c: Initialize the state to `0` for each of the 7 segments, then set a segment to `1` when its region contains more white than black pixels
  - Step 8d: Once all 7 states have been set, perform a lookup against the digit dictionary created in step 1, and append the value to the `digits` list created at the beginning of step 8 (a quick check of this lookup follows the list)
- Step 9: Draw a rectangle and the predicted digit on each bounding box. Finally, print the `digits` list.

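As a quick check of the lookup in step 8d: for a lit "4", segments b, c, f and the center segment are on, and the tuple of states maps straight back to the digit:

```py
# segment states in (a, b, c, d, e, f, g) order, as in DIGITSDICT
state = (0, 1, 1, 0, 0, 1, 1)  # b, c, f and the center segment are lit
assert DIGITSDICT[state] == 4
```
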
# References
[^1]: LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324
[^2]: Saliency map, Wikipedia

digitrecognition/digitrec.pdf

29.1 KB
Binary file not shown.
