### Chapter 5
- Facial Recognition
## Approach and Motivation
The course is foundational for anyone who wishes to work with computer vision in Python. It covers some of the most common image processing routines and has in-depth coverage of the mathematical concepts present in the materials:

- Math-first approach
- Tons of sample Python scripts (.py)
- 45+ Python scripts from chapters 1 to 4 for plug-and-play experiments
- Multimedia (image illustrations, video explanations, quizzes)
- 57 image assets from chapters 1 to 4 for practical illustrations
- 4 PDFs and 4 HTML files, one for each chapter
- Practical tips on real-world applications
The course's **only dependency** is `OpenCV`. Getting started is as easy as `pip install opencv-contrib-python` and you're set to go.
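To confirm the installation worked, a minimal sanity check (assuming a standard Python environment; not part of the course scripts) is to import the bindings and print the version:

```py
# Quick sanity check: the OpenCV bindings import correctly
# and report the installed version
import cv2

print(cv2.__version__)
```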
##### Question: What about deep learning libraries?
No; while using deep learning for images makes for interesting topics, it is probably better suited as an altogether separate course series. This course series focuses on the **essentials of computer vision** and, for pedagogical reasons, tries not to be overly ambitious with the scope it intends to cover.
The full code solution is in `morphological_02.py`.
As we read our image in grayscale mode (`flags=0`), we obtain a white background and a mostly-black foreground. This is illustrated in the subplot titled "Original" above. We begin our preprocessing steps by first binarizing the image (step 1), followed by inverting the colors (step 2) to get a white-on-black image.
An erosion operation is then performed (step 3). This works by creating our kernel (either through `numpy` or through `opencv`'s structuring element) and sliding that kernel across our image to remove white noise. The kernel shape (for example, `cv2.MORPH_RECT`) is fed as the first argument into `cv2.getStructuringElement()`, with the second being the kernel size (`ksize`) itself. The third argument is the _anchor point_, which defaults to the center.
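Putting steps 1 to 3 together, a minimal sketch could look like the following. This is not the course's `morphological_02.py`; the file name `digits.png` and the threshold value `127` are assumptions for illustration only.

```py
import cv2

# Read the image in grayscale mode (flags=0): white background, dark foreground
img = cv2.imread("digits.png", flags=0)  # hypothetical file name

# Step 1: binarize with a fixed threshold (127 is an assumed value)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Step 2: invert the colors to obtain a white-on-black image
inverted = cv2.bitwise_not(binary)

# Step 3: erode with a 3x3 rectangular kernel; the shape is the first
# argument to getStructuringElement, ksize the second, and the anchor
# defaults to the center
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
eroded = cv2.erode(inverted, kernel, iterations=1)
```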
### Opening and Closing
Another name for **Erosion followed by Dilation** is Opening. It is useful for removing noise in our image. The reverse of Opening is Closing, where we **perform Dilation followed by Erosion**; it is particularly suited for closing small holes inside foreground objects.
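Both operations are available in a single call through `cv2.morphologyEx`. A minimal sketch, assuming a white-on-black binary image such as the `inverted` image from the erosion sketch above:

```py
import cv2

# Assumed input: a white-on-black binary image
inverted = cv2.imread("inverted.png", flags=0)  # hypothetical placeholder

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# Opening (erosion then dilation): removes small white noise
opened = cv2.morphologyEx(inverted, cv2.MORPH_OPEN, kernel)

# Closing (dilation then erosion): fills small holes inside foreground objects
closed = cv2.morphologyEx(inverted, cv2.MORPH_CLOSE, kernel)
```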
If you are paying close attention to the digit '0' in our LCD display, you will notice that its segments do not fully connect, leaving small gaps in the character.
A reasonable strategy to handle this is the Dilation or Closing (Dilation followed by Erosion) operation that you've learned earlier.
Similarly, your ROI may necessitate other pre-processing, and the specific tactical solution varies greatly depending on the problem set at hand.
As I inspected the bounding box we retrieved around the LCD screen, I observed that these bounding boxes often have their digits centered around the bottom half of the display. This led me to insert an additional step prior to the morphological transformation in the final code solution. The step uses numpy subsetting to trim away the top 20% as well as 20% on each side of the image:
```py
import cv2

# Read the ROI in grayscale mode (flags=0)
roi = cv2.imread("roi.png", flags=0)

# The margin is proportional to 20% of the ROI's height
RATIO = roi.shape[0] * 0.2

# Trim away the top 20% and the same margin on the left and right
trimmed = roi[
    int(RATIO):,
    int(RATIO): roi.shape[1] - int(RATIO)]
```
That said, whenever possible, be cautious not to hand-tune your solution in a way that is overly specific to the images you have at hand, lest the solution **only** work on those specific images and not others, a phenomenon fondly termed "overfitting" in the machine learning community.
I re-executed the solution code against some sample image sets, once with the trimming in place and once without, before settling on the decision. As you will see later, the trimming improves our accuracy and is a relatively safe strategy, given that every LCD screen, regardless of the issuer (bank), has the same asymmetry with more blank space in the top half compared to the bottom half.
#### Contour Properties
Furthermore, in many cases of digit recognition / digit classification you will want to predict the class for each digit in an ordered fashion. Suppose the LCD screen contains the digits "40710382": our algorithm should correctly isolate these digits and classify them iteratively, but do so from the leftmost digit to the rightmost. Failing to account for this may result in your algorithm correctly classifying each digit, yet producing an unreasonable output such as "17403082".
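A straightforward way to impose this ordering is to sort the contours by the x coordinate of their bounding boxes. A minimal sketch, assuming the OpenCV 4.x return signature of `cv2.findContours` and a preprocessed white-on-black image named `thresh` (both assumptions, not the course's exact variable names):

```py
import cv2

thresh = cv2.imread("thresh.png", flags=0)  # hypothetical preprocessed image

# Find the external contours of the digits
# (OpenCV 4.x returns (contours, hierarchy); 3.x prepends the image)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Sort left to right by the x coordinate of each contour's bounding box
contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])
```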
- Step 3: Noise reduction and trim away asymmetrical white space in our ROI
- Step 4: Binarize our image using adaptive thresholding
- Step 5: Morphological transformation to remove noise and fill the small holes in our digits
- Step 6: Find contours in our image with a height greater than 20px
- Step 7: Sort the contours in-place, using the x value of their coordinates (hence, left to right)
- Step 8 (see the sketch after this list)
  - Step 8a: Create a rectangular bounding box for each digit, and some convenience units that we later use to slice the seven segments. Notice that these convenience units are not hard-coded values, but are proportional to the height (`h`) of our rectangular box
  - Step 8b: Slice the seven segments; the first segment ("A") spans from point (0, 0) to (w, `int(h * 0.15)`), i.e. it is `w` wide and 15% the height of the full digit contour
  - Step 8c: Initialize the state to `0` for each of the 7 segments, then conditionally set regions with more white than black pixels to `1`
  - Step 8d: Once all 7 states have been set, perform a lookup against the digit dictionary created in step 1; append the value to the `digits` list created at the beginning of step 8
- Step 9: Draw a rectangle and add the predicted text for each bounding box. Finally, use a print statement to print the `digits` list.
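To make steps 8a to 8d more concrete, here is a minimal sketch of the per-digit loop. It is not the course's final solution: the variable names (`thresh`, `contours`), the width-based unit `dW`, the center-band unit `dHC`, and the exact `DIGITS_LOOKUP` table are assumptions for illustration; the course solution builds its own digit dictionary in step 1.

```py
import cv2

# Assumed seven-segment lookup table, ordered as
# (top, top-left, top-right, center, bottom-left, bottom-right, bottom)
DIGITS_LOOKUP = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4,
    (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

thresh = cv2.imread("thresh.png", flags=0)  # hypothetical preprocessed image
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])  # step 7

digits = []
for c in contours:
    # Step 8a: bounding box and convenience units proportional to the digit's size
    (x, y, w, h) = cv2.boundingRect(c)
    roi = thresh[y:y + h, x:x + w]
    dH = int(h * 0.15)   # height of the horizontal segments (segment "A")
    dW = int(w * 0.25)   # assumed width of the vertical segments
    dHC = int(h * 0.05)  # assumed half-height of the center segment

    # Step 8b: slice the seven segments as ((x0, y0), (x1, y1)) regions
    segments = [
        ((0, 0), (w, dH)),                       # "A": top
        ((0, 0), (dW, h // 2)),                  # top-left
        ((w - dW, 0), (w, h // 2)),              # top-right
        ((0, h // 2 - dHC), (w, h // 2 + dHC)),  # center
        ((0, h // 2), (dW, h)),                  # bottom-left
        ((w - dW, h // 2), (w, h)),              # bottom-right
        ((0, h - dH), (w, h)),                   # bottom
    ]

    # Step 8c: a segment is "on" when it holds more white than black pixels
    state = [0] * 7
    for i, ((x0, y0), (x1, y1)) in enumerate(segments):
        seg = roi[y0:y1, x0:x1]
        if cv2.countNonZero(seg) > seg.size * 0.5:
            state[i] = 1

    # Step 8d: look the on/off pattern up and collect the prediction
    digits.append(DIGITS_LOOKUP.get(tuple(state), None))

# Step 9 (output): print the predicted digits
print(digits)
```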