Coordinate Transforms in iOS using Swift, Part 2: Automatically Finding the Coordinate Frame

5 min readFeb 8, 2021

Part 0 of this series listed some of the difficulties working with coordinate frames and transforms in iOS. Part 1 introduced the L triangle, a simple technique to document and calculate the transforms between any two 2D coordinate frames on an iOS device.

In this post I’ll describe a method to automate the process of finding coordinate frames when your app relies on image processing. Or at least we’ll semi-automate the process, according to your taste.

If you’ve worked with Core Graphics, Vision, or other frameworks with coordinate systems that differ from the UIKit coordinate system, then you’ve likely worked with CGAffineTransform. If CGAffineTransform satisfies your needs, then good on you! But if you encountered problems finding transforms, or if you wrote more code than you’d like to handle coordinate frames and transforms, read on.

Determining the origin and axis directions of a coordinate frame can be made simpler by using a printed target to test all your coordinate frames at once. A printed target is also useful for regression testing.

Example: QR Codes, OCR, and Rectangle Detection
Let’s say you have an app that combines QR Codes (2D barcodes), OCR (optical character recognition), and rectangle detection. If we draw graphics for vision results using Core Graphics, then that adds another coordinate frame, which brings our total to four different coordinate frames — and that’s without including UIView coordinate frames.

Although we could test each frame independently by first reading some QR Codes with our app, then reading some text for OCR, etc., we can make the process simple and reproducible by pointing the device’s camera at a printed paper target.

The Test Target

A printed target should satisfy a few requirements:

Align the barcodes, text, and shapes with the horizontal and vertical edges of the paper.
Print QR Codes and matching text next to each other.
For OCR, use entire words rather than individual letters.
For rectangle detection, print a thick border with sharp corners.
Make the QR codes about 1 inch (2.54 cm) on a side.
For OCR, select a larger font. I used Calibri Bold, 22 points.
Design the target to find coordinates — not to test the accuracy of vision results.

The image below is a screenshot of a target made in Word. The target has QR Codes encoded with “a,” “b,” “c,” and “d” next to the words “Ant,” “Bee,” “Can,” and “Dog.” The QR codes and texts are ordered counterclockwise from the top left so that A, B, C form an L shape. The order makes it easier to define a coordinate frame using the L Triangle.

A test target of four QR codes surrounding a thick-lined square. Inside of the square are four texts: Ant, Bee, Can, and Dog. — Test target with QR codes a, b, c, d; a thick-bordered rectangle suitable for rectangle detection; and four three-letter words suitable for OCR (optical character recognition).

QR Code coordinates

If you’re reading QR Codes from live images, then you’ll retrieve QR Code (x,y) positions from your override of the AVCaptureMetadataOutputObjectsDelegate function metadataOutput.

extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
    /// Data stream for QR Codes and/or DataMatrix codes. 
    func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection) {
               let objects = metadataObjects.compactMap( { $0 as? AVMetadataMachineReadableCodeObject } )for obj in objects {
            if let s = obj.stringValue {
                //check for texts starting with "A," "B,"...
                //format according to personal taste
                if let c = obj.centroid() {
                    print("\(s) @ (\(c.x), \(c.y))")
                }
            }
        }
    }
}

(Code here is provided as accessible plain text rather than as pretty screenshots.)

The same technique would apply to Data Matrix codes. Data Matrix barcodes are a standard for uniquely identifying parts in assembly line processes that produce computer chips, electronic components, car parts, shampoo bottles, and the like.

OCR result coordinates

For your instance of VNRecognizeTextRequest, set the property usesLanguageCorrection to true to increase the accuracy of reading text. This setting complements the use of whole words rather than just individual letters for OCR. We want to give the OCR engine every chance to succeed.

var request = VNRecognizeTextRequest { ...   //...
   request.recognitionLevel = .accurate
   request.recognitionLanguages = ["en-US", "en-GB"]
   request.usesLanguageCorrection = true
   request.minimumTextHeight = 0.005 //pick some minimum
}

Once in a while the OCR engine may read a QR code as text. The function metadataOutput(output:from:) function will provide the correct data for the QR codes, but independently the OCR engine may misinterpret a QR Code as text. To avoid misinterpretation you can reject any OCR results with bounding boxes that overlap QR Code results, although your ability to detect that overlap depends on knowing the frames in advance. Ha ha!

var request = VNRecognizeTextRequest {observation in observations {
      guard let topCandidate = observation.topCandidates(1).first else {
          return
      }
      
      ocrLoop: for observation in observations {
          guard let topCandidate = observation.topCandidates(1).first else {
              return
          }
          
          // exclude OCR text that falls inside a QR code (
          if checkObjectOverlap {
              overlapCheck: for sample in self.qrCodes {
                  let q = sample.value.bounds
                  if q.intersects(observation.boundingBox) {
                      if let s = sample.value.text {
                          print("QR code '\(s)' found as text")
                      }
                      else {
                          print("QR code overlap")
                      }
                      
                      continue ocrLoop
                  }
              }
          }let b = observation.boundingBox//...
      }
  }}

In this is too much trouble, you could print the QR Codes elsewhere on the page, temporarily cover up the QR Codes, or even print a separate target.

Rectangle Detection

Rectangle detection is commonly used to correct perspective distortion. Document scanning apps typically find a bounding quadrilateral for a sheet of paper, prompt you to accept or correct the detected corners, and then “unwarp” the quadrilateral to a rectangle. In an earlier post I presented code in Swift to perform perspective correction.

For rectangle detection in Swift, check out the ExploringRectangleDetection project on GitHub after reading the February 2020 blog post “Detecting Rectangles In Images Using Apple’s Vision Framework” by Jonathan Badger.

Print to Console, Create an ImageFrame

And that’s about it: take a snapshot of a printed target, print the (x,y) coordinates from detected objects to the console, and then figure out which way +x and +y are pointing.

For real-world examples of finding coordinate frames, see the section “Finding the Three Points of the L in an Image Frame” in the post about the L Triangle technique.