Below is a quick explanation that glosses over many details, but gives a nice overview.
The eye tracker is made up of interconnected modules. Each does its own processing on a set of inputs, and exports its data as a set of outputs that may be used by later modules. Modules are chained into what we call a "pipeline".
Some examples of modules are things like:
- PupilRefiner: Improves the initial estimation of the pupil ellipse on the 2D image.
- GlassesFrameDetector: Detects portions of the eye image that contain a glasses frame.
- GlintBlobClassifier: Determines which LED is being reflected by a particular glint.
The outputs of these modules are passed to later modules that need to know the pupil location, whether some point is obscured by glasses, or the location of a specific glint. There are dozens of modules, so we won't go into all of them here.
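The chained-module idea above can be sketched as follows. This is an illustrative toy, not our actual code: the `Frame`, `Module`, and `Pipeline` names, and the fake detector/refiner pair, are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Shared state: each module reads some keys and writes others."""
    data: dict = field(default_factory=dict)


class Module:
    """Base class for a pipeline stage (hypothetical interface)."""
    def process(self, frame: Frame) -> None:
        raise NotImplementedError


class PupilDetector(Module):
    def process(self, frame: Frame) -> None:
        # Pretend we found a rough pupil ellipse in the 2D image.
        frame.data["pupil_ellipse"] = {"center": (320.0, 240.0),
                                       "axes": (40.0, 38.0)}


class PupilRefiner(Module):
    def process(self, frame: Frame) -> None:
        # A later module consumes an earlier module's output and
        # exports an improved estimate under a new key.
        rough = frame.data["pupil_ellipse"]
        frame.data["pupil_ellipse_refined"] = {"center": rough["center"],
                                               "axes": (39.5, 38.2)}


class Pipeline:
    def __init__(self, modules):
        self.modules = modules

    def run(self, frame: Frame) -> Frame:
        for module in self.modules:  # modules run in chained order
            module.process(frame)
        return frame


frame = Pipeline([PupilDetector(), PupilRefiner()]).run(Frame())
```

The key property is that modules only communicate through the shared frame data, so stages can be added, removed, or reordered without touching each other.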
However, in general, modules can be sorted into two categories: Image Processing and 3D Reconstruction.
The image processing portion attempts to localize and track features on the video (such as the pupil, glints, iris, eyelid, etc.). This starts with a basic shape detector to identify features in the 2D image, such as the pupil ellipse. Various refiners then improve the initial estimation. Temporal algorithms are also used, whereby information about the current frame can be gleaned from information about the last frame.
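As a minimal sketch of the temporal idea, here is a simple exponential blend of the current detection with the previous frame's estimate. The actual temporal algorithms are more sophisticated; this just illustrates how the last frame can inform (and stabilize) the current one. The `alpha` weight and the sample detections are made up.

```python
def smooth(prev, new, alpha=0.6):
    """Blend the current detection with the previous frame's estimate.

    A crude stand-in for temporal tracking: detector jitter in the
    current frame is damped by what we knew from the last frame.
    """
    if prev is None:
        return new  # first frame: nothing to carry over
    return tuple(alpha * n + (1 - alpha) * p for p, n in zip(prev, new))


# Per-frame pupil-center detections (fabricated), with a noisy outlier:
detections = [(100.0, 100.0), (104.0, 98.0), (180.0, 150.0), (106.0, 97.0)]
estimate = None
for center in detections:
    estimate = smooth(estimate, center)
```

The outlier frame pulls the estimate less than a raw per-frame detection would, at the cost of a little lag.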
Based on the optical block geometry, camera parameters, and headset lens properties, all the image processing features are then used to reconstruct a virtual 3D model of the eye. Using the corneal reflections and pupil center features, we can estimate the corneal curvature radius and position, the position of the eye pivot point (eyeball position), and the distance between the pivot point and the pupil (eyeball radius). In addition, all the detected features on the camera image can be projected to 3D space, so that every feature, such as the pupil and the glints, has coordinates in 3D. Thus every measurement, such as the pupil radius, can be estimated in physical units as opposed to pixels on the 2D image.
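To make the pixels-to-physical-units step concrete, here is the standard pinhole-camera relationship (a generic textbook sketch, not our calibration model; the intrinsics `fx`, `fy`, `cx`, `cy` and the 30 mm depth are invented for the example).

```python
import numpy as np


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project a pixel into a unit ray in camera coordinates,
    using pinhole intrinsics (focal lengths fx, fy; principal point cx, cy)."""
    d = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return d / np.linalg.norm(d)


def length_in_mm(pixel_length, depth_mm, fx):
    """Convert an image length in pixels to millimetres at a known depth:
    for a pinhole camera, size_mm = pixels * depth / focal_length."""
    return pixel_length * depth_mm / fx


# The principal-point pixel maps onto the optical axis:
ray = pixel_to_ray(320.0, 240.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)

# A 40 px pupil radius at 30 mm from a 600 px focal length camera:
radius_mm = length_in_mm(40.0, depth_mm=30.0, fx=600.0)  # 2.0 mm
```

Once the 3D reconstruction supplies a depth for each feature, this is what lets the pupil radius (and everything else) be reported in millimetres instead of pixels.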
Once an accurate 3D model of the eye is constructed, we can estimate the gaze direction by directing a ray from the eyeball pivot through the pupil, and then applying the optical-to-visual-axis offset.
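The gaze computation above can be sketched in a few lines. This is a simplification: the real optical-to-visual-axis offset is a per-user, two-angle calibration; here we apply only a single horizontal angle (often called "kappa"), and all the numbers are invented.

```python
import numpy as np


def optical_axis(pivot, pupil_center):
    """Unit ray from the eyeball pivot point through the pupil center."""
    v = np.asarray(pupil_center, dtype=float) - np.asarray(pivot, dtype=float)
    return v / np.linalg.norm(v)


def apply_kappa(optical, kappa_deg):
    """Rotate the optical axis horizontally by a per-user angular offset
    (a simplified stand-in for the optical-to-visual-axis correction)."""
    k = np.radians(kappa_deg)
    rot_y = np.array([[np.cos(k), 0.0, np.sin(k)],
                      [0.0,       1.0, 0.0],
                      [-np.sin(k), 0.0, np.cos(k)]])
    return rot_y @ optical


axis = optical_axis(pivot=(0.0, 0.0, 0.0), pupil_center=(0.0, 0.0, 10.0))
gaze = apply_kappa(axis, kappa_deg=5.0)
```

The offset matters because the visual axis (through the fovea) is tilted a few degrees away from the geometric optical axis of the eye.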
Of course, the eye tracker is under constant development, so details change from version to version, but the general approach is constant.
The torsion measurements, available in our Pro version, are a fully 2D-based measurement. Features of the iris are located and tracked, and the amount they have rotated, with respect to the baseline captured at calibration, is computed.
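A toy version of that rotation estimate: given iris features matched between the calibration baseline and the current frame, average their angular displacement about the pupil center. The feature coordinates here are fabricated, and the real tracker's feature matching is far more involved.

```python
import math


def torsion_deg(baseline_pts, current_pts, center):
    """Mean angular displacement (degrees) of matched iris features
    about the pupil center, relative to the calibration baseline."""
    cx, cy = center
    deltas = []
    for (bx, by), (px, py) in zip(baseline_pts, current_pts):
        a0 = math.atan2(by - cy, bx - cx)
        a1 = math.atan2(py - cy, px - cx)
        d = a1 - a0
        # Wrap to (-pi, pi] so features near the +/-180 degree branch
        # cut average correctly instead of contributing ~360 degrees.
        d = (d + math.pi) % (2.0 * math.pi) - math.pi
        deltas.append(d)
    return math.degrees(sum(deltas) / len(deltas))


# Synthetic check: rotate three features by exactly 2 degrees.
theta = math.radians(2.0)
base = [(10.0, 0.0), (0.0, 10.0), (-10.0, 0.0)]
rotated = [(math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y) for x, y in base]
```

Because everything happens in the 2D image plane, this measurement needs no 3D eye model, which is why torsion is described as fully 2D-based.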
When we make changes to the eye tracker, we always benchmark it to ensure that an improvement was made, or at least that no regression was introduced. Benchmarking itself is a deep and complicated topic. Our process, and our recommendations to the overall industry, are described in a paper we published with the ACM at ETRA 2021 (full text available via the "PDF" link).
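The improve-or-don't-regress gate can be reduced to a one-line check. This is only the bare acceptance criterion, nothing like the full benchmarking methodology in the paper; the metric name and numbers are invented.

```python
def passes_gate(baseline_error_deg, candidate_error_deg, tolerance_deg=0.0):
    """Accept a candidate build only if its mean gaze error is no worse
    than the baseline's (optionally within a small tolerance)."""
    return candidate_error_deg <= baseline_error_deg + tolerance_deg


accepted = passes_gate(baseline_error_deg=1.20, candidate_error_deg=1.05)
blocked = not passes_gate(baseline_error_deg=1.20, candidate_error_deg=1.40)
```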