I’m back! I spent my 2 week hiatus in New York and Boston to attend my brother’s wedding ceremony and, I guess, take a little break from work. The moment I got back, it was straight to work. I took on quite a difficult task (for me): image processing, which I had never done before. The only little thing I knew was that images are made of pixels and that pixels can be manipulated. So this post will be all about image processing and a little introduction to computer vision.
Monday: Public Holiday!
Thursday – Sunday: Image Processing
What is computer vision?
There is the Wikipedia definition, but here’s how I see it. Computer vision is the field concerned with how computers can see things the way humans do, and how they can manipulate what they see. The most basic example is face detection. A computer runs through the pixels of an image, checking them against criteria that suggest a face; one simple cue is color. By examining the RGB (red, green, blue) values of an image, the computer can decide, whether by a probability model or straightforward color matching, whether a face is present.
A little more in depth
Usually, an image is represented as a 2D array: an outer array for the rows (the y-axis) with, nested inside it, arrays for the columns (the x-axis). Each value in an inner array holds the RGB values for one pixel, represented as a tuple, e.g. (255, 255, 255), with each channel usually stored in 8 bits. If the image is converted to grayscale, each pixel is instead a single value representing its “whiteness” or “darkness”.
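As a tiny pure-Python sketch of that layout (real code would use numpy arrays, but the structure is the same; the grayscale weights below are the standard luminance coefficients):

```python
# A 2x2 "image": outer list = rows (y-axis), inner lists = columns (x-axis),
# each pixel an (R, G, B) tuple of 8-bit values.
img = [
    [(255, 0, 0), (0, 255, 0)],      # red,  green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]

# Grayscale conversion: collapse each RGB tuple into one brightness value
# using the standard luminance weights.
gray = [
    [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
    for row in img
]

print(gray)  # [[76, 150], [29, 255]]
```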
Finding an efficient way to access these pixels is important, because looping through the arrays means lots of iterations. It might be fine for an 800 x 800 image, but a 5MB image at 5634 x 3687 pixels is a lot of work for even a single pass. And that’s just looping through it, let alone making modifications to it. This means that to achieve a reasonable runtime, the algorithms have to be efficient. For that 5634 x 3687 image, adding one additional step inside the inner loop means performing over 20 million additional operations. That’s by no means trivial.
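To put a number on it — the arithmetic is simple, but the scale is the whole point:

```python
small = 800 * 800     # 640,000 pixels: a per-pixel loop is tolerable
large = 5634 * 3687   # a typical high-resolution photo

print(large)          # 20772558 pixels, so one extra step inside the
                      # inner loop costs ~20.8 million more operations
print(large // small) # ~32x the work of the 800x800 case
```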
And then there’s manipulation
There are a few basic ways to make changes to images, and it’s easiest if they are binary — that makes detection so much simpler. But what if things are in gradients? My project this week required me to detect and make changes to colors over a spectrum, not just one pixel value. I searched for algorithms that could have helped me achieve this, but they all dealt with binary images. For example, detecting black dots on a colored background: that is done by converting the image to the HSV color space and then isolating the black from everything else. Another approach is adaptive thresholding, where you supply a threshold and the surrounding pixels are classified against it. The problem is that these are all binary: the algorithm sets each pixel to either the target color or 0 (black). That makes for really unnatural effects, which was not in the scope of my project.
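With OpenCV you would typically do the HSV trick with cv2.cvtColor plus cv2.inRange to get a mask; here’s a dependency-free sketch of why a plain binary threshold looks so unnatural on a gradient (the cut-off of 128 is an arbitrary choice for illustration):

```python
# A one-row grayscale "image" containing a smooth gradient.
row = [0, 32, 64, 96, 128, 160, 192, 224, 255]

# Binary thresholding: every pixel becomes either pure black or pure white.
THRESHOLD = 128  # arbitrary cut-off, just for this sketch
binary = [255 if p >= THRESHOLD else 0 for p in row]

print(binary)  # [0, 0, 0, 0, 255, 255, 255, 255, 255]
# The gradient collapses into a hard edge -- fine for detecting blobs,
# but a very unnatural "effect" if written back into a photo.
```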
So I wrote my own algorithm. It was by no means an easy task, and there’s certainly a lot more sophistication that could go into it. But in layman’s terms, here’s what it tries to do:
1. Isolate the portion that I want to detect.
2. Crop it out for the computer to work on.
3. Use percentiles to check the relative brightness of the pixels.
4. Examine the color of each pixel in relation to the percentiles I determined.
5. Make the necessary changes to each pixel that satisfies the criteria I input.
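The steps above can be sketched in plain Python. Every number here — the crop box, the 75th percentile, the brightening amount — is a made-up stand-in for illustration, not my project’s actual values:

```python
def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(pct / 100 * len(ordered)))
    return ordered[idx]

def process(gray, box, pct=75, boost=30):
    """Steps 1-5: crop a region, find a brightness percentile,
    then brighten only the pixels at or above it."""
    y0, y1, x0, x1 = box                        # 1. region of interest
    crop = [r[x0:x1] for r in gray[y0:y1]]      # 2. crop it out
    flat = [p for r in crop for p in r]
    cutoff = percentile(flat, pct)              # 3. relative brightness
    return [
        [min(255, p + boost) if p >= cutoff else p  # 4-5. compare & change
         for p in r]
        for r in crop
    ]

gray = [[10, 50, 90], [130, 170, 210], [250, 30, 70]]
out = process(gray, (0, 3, 0, 3))
print(out)  # [[10, 50, 90], [130, 200, 240], [255, 30, 70]]
```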
There are cleverer ways I could have implemented this, like using block sizes to filter out parts of the image I don’t want detected, or using recursive calls so that previously detected pixel values feed into the criteria for later changes. But I had a time limit, and currently my skills with Python aren’t advanced enough to do it. I’ve been trained so heavily in OCaml that thinking in Object Oriented Programming is still more difficult for me.
In computer science class, I learnt that anything that runs at O(n^2) is really bad. Insertion sort is O(n^2), and you could do much better with other sorting algorithms. My algorithm is O(n^2) in the image’s side length simply because it scans every pixel: one pass through the image to gather the relative brightness, and then a second pass to make the necessary changes. The second pass depends on the first — I need the percentiles before I can apply them — so I couldn’t merge the two loops. That binding meant I had little choice but to run them one after the other, doubling the work per pixel.
Another thing: most processes run fast on a regular computer. They do the job pretty well. But when they’re transferred onto a microcomputer, things can go pretty awry. Our project had to run on a Raspberry Pi 2, and before that we had to install a few libraries, mainly numpy. Numpy took over an hour to compile on the Pi, and we kind of regretted not compiling it ahead of time. Also, because on a computer you can usually get past “Permission denied” errors just by running sudo once, on the Pi you have to make sure you do it consistently.
My algorithm ran in 5–7 s on a computer, but took about 20 s on the Pi. For our project, that was quite disastrous. Now I’m trying out other algorithms, but they’re not doing any better at all. I’ll post an update later this week.
When it comes to real-life work, runtime matters a great deal.
A little musing on Python (coming from OCaml)
1. No error catching before runtime.
This means that when I run a function, Python doesn’t typecheck it up front: errors only show up when the offending line actually executes. Because I don’t have the habit of printing debug statements throughout my code, I’m often unsure where the source of an error is. The interpreter may throw an error saying there was a wrong type somewhere, but that still means reviewing lines of code all over again to identify what’s actually causing it. It takes a lot of time, and having the habit of testing your code after small snippets really helps. Also, when code silently does the wrong thing, nothing tells me, so sometimes I’m left wondering what went wrong.
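A tiny illustration of the difference: OCaml would reject this at compile time, but Python happily starts running and only fails when the bad line executes (the function here is just a made-up example):

```python
def scale(pixel, factor):
    return pixel * factor  # nothing checks the types until this line runs

ok = scale(100, 2)          # fine: 200
try:
    scale([255, 0, 0], {})  # nonsense types, but no error until right now
except TypeError as e:
    failure = type(e).__name__

print(ok, failure)  # 200 TypeError
```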
That being said, I have enough experience now to know how to set up my own code to find bugs and test properly.
2. Higher-order functions are clunkier.
Python functions are technically first-class — you can pass them around — but without OCaml’s currying and lightweight function syntax, composing and partially applying them gets quite tedious after a while. I really appreciate functional programming languages for making it so effortless.
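For comparison, functools.partial is the usual Python workaround for what OCaml gives you for free with currying (the adjust function is a made-up example):

```python
from functools import partial

def adjust(pixel, offset):
    """Shift an 8-bit pixel value, clamping to [0, 255]."""
    return max(0, min(255, pixel + offset))

# In OCaml, `adjust 40` would curry automatically; in Python you
# partially apply by hand.
brighten = partial(adjust, offset=40)

row = [10, 200, 250]
print([brighten(p) for p in row])  # [50, 240, 255]
```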
This week was a good introduction to computer vision though, even if I probably only scratched the surface. More importantly, I’m learning how to write good code on my own, and how to find errors in it in a new context.