The mature workhorse of object identification in image processing is the connected components algorithm. The algorithm was described in the venerable but still useful two-volume classic from 1982, Digital Picture Processing by Rosenfeld and Kak, and in the decades since by numerous books, presentations, and web pages about image processing.
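To give a feel for the idea, here's a minimal Swift sketch of connected components labeling using a flood fill over a binary image. It's a simplified illustration rather than the optimized two-pass algorithm described in the literature, and the representation (a row-major array of 0s and 1s) and function name are my own choices:

// A minimal sketch: label connected components in a binary image using
// 4-connectivity and a stack-based flood fill. Zero means background;
// each foreground region receives a distinct positive label.
func labelConnectedComponents(_ pixels: [Int], width: Int, height: Int) -> [Int] {
    var labels = [Int](repeating: 0, count: pixels.count)
    var nextLabel = 0
    for start in pixels.indices where pixels[start] == 1 && labels[start] == 0 {
        nextLabel += 1
        labels[start] = nextLabel
        var stack = [start]
        while let index = stack.popLast() {
            let x = index % width
            let y = index / width
            // Visit the 4-connected neighbors: left, right, up, down.
            for (nx, ny) in [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            where nx >= 0 && nx < width && ny >= 0 && ny < height {
                let neighbor = ny * width + nx
                if pixels[neighbor] == 1 && labels[neighbor] == 0 {
                    labels[neighbor] = nextLabel
                    stack.append(neighbor)
                }
            }
        }
    }
    return labels
}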
Last week I had a pleasant Zoom call with a UX researcher who found my Medium post about prototyping accessible apps with ProtoPie and wanted to know more about my experience. ProtoPie, unlike virtually all other wireframing and prototyping tools, provides access to a mobile device’s internal sensors, text-to-speech engine, and haptics, making it possible to design and test an app for people who are blind or have low vision.
But once you’ve designed and implemented your accessible app, how do you test it?
For a small company or a lone developer, it’s a burden and a risk to meet…
Generating a comma-separated string representation for a collection is a common programming task. It’s somewhat less common to generate a comma-separated string that includes the word “and” so that a collection can be read like a list in a normal English sentence: rather than “a, b, c, d” we might sometimes want to generate “a, b, c, and d.”
A common technique to create a comma-separated string is to append string representations of each element in a collection. If the accumulated string already has a length greater than zero, we append a comma before appending the next element.
// non-Swift pseudocode
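In Swift, a minimal sketch of this approach, plus a variant that inserts “and” (with an Oxford comma) for the English-sentence case, might look like the following. The function names are mine, for illustration:

// Append each element, inserting ", " whenever the accumulated string
// is already non-empty.
func commaSeparated(_ items: [String]) -> String {
    var result = ""
    for item in items {
        if !result.isEmpty {
            result += ", "
        }
        result += item
    }
    return result
}

// Variant that reads like an English list: "a, b, c, and d".
func englishList(_ items: [String]) -> String {
    switch items.count {
    case 0:
        return ""
    case 1:
        return items[0]
    case 2:
        return "\(items[0]) and \(items[1])"
    default:
        return items.dropLast().joined(separator: ", ") + ", and " + items.last!
    }
}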
Finding the histogram of a UIImage can be relatively straightforward: extract the raw data from the UIImage, populate your histogram bins by iterating over every 4-byte representation of a pixel, and then reheat your tea in the microwave while you wait for the histogram to be generated.
The straightforward method is slow.
The fast way to generate a histogram is not straightforward.
Sure, once you copy & paste all the magic function calls from some web page into your project, the code is understandable enough. …
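For reference, a hedged sketch of the straightforward approach looks something like this. It assumes an 8-bit-per-channel image with red as the first component, and the helper name is mine:

import UIKit

// The slow, straightforward method: copy the pixel data out of the UIImage
// and bin one channel by looping over every pixel.
func redChannelHistogram(of image: UIImage) -> [Int]? {
    guard let cgImage = image.cgImage,
          let data = cgImage.dataProvider?.data,
          let bytes = CFDataGetBytePtr(data) else { return nil }
    var bins = [Int](repeating: 0, count: 256)
    let bytesPerPixel = cgImage.bitsPerPixel / 8
    for y in 0..<cgImage.height {
        for x in 0..<cgImage.width {
            // Assumes red is the first of the (typically) four bytes per pixel.
            let offset = y * cgImage.bytesPerRow + x * bytesPerPixel
            bins[Int(bytes[offset])] += 1
        }
    }
    return bins
}

The fast way typically means handing the work to Accelerate’s vImage, which is where the magic function calls come in.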
Recently I wanted to run some offline image processing tests in macOS rather than in iOS. That meant working with NSImage from AppKit rather than UIImage from UIKit. Image processing was implemented as a custom CIFilter from the Core Image framework.
The macOS test app does the following:
Here’s the plain text code implemented in Swift 5 (Xcode 12) as an…
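As a rough sketch of the NSImage-to-Core-Image round trip involved (a hypothetical helper, with a built-in filter standing in for the custom CIFilter):

import AppKit
import CoreImage

// Apply a Core Image filter to an NSImage and return a new NSImage.
func applyFilter(_ filter: CIFilter, to image: NSImage) -> NSImage? {
    guard let tiffData = image.tiffRepresentation,
          let inputImage = CIImage(data: tiffData) else { return nil }
    filter.setValue(inputImage, forKey: kCIInputImageKey)
    guard let outputImage = filter.outputImage else { return nil }
    // Wrap the CIImage in an NSImageRep so AppKit can draw or save it.
    let rep = NSCIImageRep(ciImage: outputImage)
    let result = NSImage(size: rep.size)
    result.addRepresentation(rep)
    return result
}

// Example with a built-in filter in place of the custom one:
// let noir = applyFilter(CIFilter(name: "CIPhotoEffectNoir")!, to: testImage)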
Part 0 of this series listed some of the difficulties working with coordinate frames and transforms in iOS. Part 1 introduced the L triangle, a simple technique to document and calculate the transforms between any two 2D coordinate frames on an iOS device.
In this post I’ll describe a method to automate the process of finding coordinate frames when your app relies on image processing. Or at least we’ll semi-automate the process, according to your taste.
In the zeroth post in this series I described problems working with multiple coordinate systems in iOS frameworks such as UIKit, Core Graphics, AVFoundation, Vision, and Core ML. Briefly put, it’s a burden to keep track of multiple coordinate systems.
In this post I’ll describe the “L Triangle” technique of defining coordinate frames and calculating transforms between frames. If you’re developing an iOS app for optical character recognition (OCR), 2D barcode reading, computer vision, machine learning, or other applications with multiple image coordinate systems, you may find this technique useful.
If you’re already familiar with matrix math then you’ll see…
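For a taste of the kind of transform the technique organizes, consider converting a Vision bounding box (normalized coordinates, origin at the lower left) into UIKit coordinates (points, origin at the upper left). This sketch illustrates the problem rather than the L Triangle itself, and the helper name is mine:

import UIKit
import Vision

// Convert a Vision normalized rect to UIKit coordinates for an image of a given size.
func uiKitRect(forNormalizedRect normalizedRect: CGRect, imageSize: CGSize) -> CGRect {
    // Scale from normalized [0, 1] coordinates up to image dimensions.
    let rect = VNImageRectForNormalizedRect(normalizedRect,
                                            Int(imageSize.width),
                                            Int(imageSize.height))
    // Flip the y-axis: Vision's origin is at the lower left, UIKit's at the upper left.
    return CGRect(x: rect.origin.x,
                  y: imageSize.height - rect.origin.y - rect.height,
                  width: rect.width,
                  height: rect.height)
}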
There should be a confessional group for people struggling with coordinate transforms in iOS.
“Hi, my name is Gary, and I get flustered figuring out coordinate transforms in iOS.”
“Hi, Gary!” the crowd replies. Some smile. Some grimace, reminded of their own struggles.
Maybe we’re a small crowd, but you know there are programmers out there trying to find info about image coordinate systems on developer.apple.com, digging through Stack Overflow posts, and looking for a straightforward answer to the question “Which way is up?”
Where is that one clear diagram that explains all the coordinate systems for my project? Am I…
Yesterday’s post showed how to calculate a perspective transform (a homography) from one quadrilateral to another in Swift. Custom types provided a few matrices and a minimal set of matrix operations.
Digging into the noodly guts of the math and writing code from scratch can help reinforce one’s understanding. However, as a professor of a friend of mine once said, we must make certain concessions to the brevity of human life. I for one don’t feel the urge to write a matrix library.
For this post I’ve rewritten yesterday’s perspective transform code to use Apple’s SIMD framework…
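For a flavor of what the simd types provide, here’s a minimal sketch of applying a 3×3 homography to a 2D point with simd_double3x3. The helper is illustrative, not the rewritten code from the post:

import simd

// Apply a perspective transform (homography) to a 2D point using
// homogeneous coordinates.
func apply(_ homography: simd_double3x3, to point: SIMD2<Double>) -> SIMD2<Double> {
    let homogeneous = SIMD3<Double>(point.x, point.y, 1.0)
    let mapped = homography * homogeneous
    // Divide by w to project back from homogeneous coordinates.
    return SIMD2<Double>(mapped.x / mapped.z, mapped.y / mapped.z)
}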
(For code that relies on SIMD for matrix operations and implements other improvements, see https://rethunk.medium.com/perspective-transform-from-quadrilateral-to-quadrilateral-in-swift-using-simd-for-matrix-operations-15dc3f090860)
My previous two posts provided Swift code to find the affine transform in 2D space from one triangle to another. One post presented the traditional method of finding the affine transform, and the other post introduced Simplex Affine Mapping. In this post I’ll provide Swift code to find the perspective transform from one quadrilateral to another based on a paper by Dave Eberly.
In 2D image processing, the affine transform is useful when a camera is pointed perpendicular to a flat surface. If the camera…
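For a taste of the traditional method, here’s a hedged sketch that solves for the six affine coefficients from three point correspondences. It leans on simd for the 3×3 inverse, and the helper name and structure are mine:

import simd

// Find the 2D affine transform mapping one triangle onto another.
// Each source point (x, y) maps as x' = a*x + b*y + c, y' = d*x + e*y + f.
func affineTransform(from src: [SIMD2<Double>], to dst: [SIMD2<Double>]) -> simd_double3x3? {
    precondition(src.count == 3 && dst.count == 3)
    // One row [x y 1] per source point.
    let m = simd_double3x3(rows: [
        SIMD3<Double>(src[0].x, src[0].y, 1),
        SIMD3<Double>(src[1].x, src[1].y, 1),
        SIMD3<Double>(src[2].x, src[2].y, 1)
    ])
    // A degenerate (collinear) triangle has no unique affine mapping.
    guard abs(m.determinant) > 1e-12 else { return nil }
    let inverse = m.inverse
    // Solve separately for the x row (a, b, c) and the y row (d, e, f).
    let abc = inverse * SIMD3<Double>(dst[0].x, dst[1].x, dst[2].x)
    let def = inverse * SIMD3<Double>(dst[0].y, dst[1].y, dst[2].y)
    return simd_double3x3(rows: [abc, def, SIMD3<Double>(0, 0, 1)])
}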