Understanding computer and machine vision
in deep learning for vision applications. This breakthrough accelerated the development of more sophisticated AI-based vision systems, which soon found applications in autonomous vehicles, medical diagnostics, robotics, and facial recognition. Deep learning models such as ResNet, YOLO, and Vision Transformers further pushed the boundaries of what was possible in computer vision, enabling real-time object detection and recognition with unprecedented accuracy. As vision systems have become increasingly reliant on AI, the focus has shifted towards real-time processing and edge computing. Edge AI has allowed machine vision applications to function in environments where latency is critical, such as robotics, industrial automation, and security systems. Today, machine and computer vision systems are embedded across a wide range of industries, from smart factories to AI-powered medical imaging. Advances in sensor technology, AI-driven models, and high-speed processors continue to refine and expand the capabilities of modern vision systems.
Topologies, types, and technologies in computer and machine vision
Computer and machine vision systems are built on various topologies, sensor types, and processing technologies, each suited to different applications. Depth perception technologies, including structured light, stereo vision, time-of-flight (ToF), and LiDAR, provide different approaches to 3D vision, enabling machines to understand spatial relationships and movement with greater accuracy. The topologies for depth sensing are as follows.
Structured light
Structured light depth sensing operates by projecting a predefined pattern – often a grid, stripe, or dot matrix – onto a scene and capturing its deformation with a camera. By analyzing how the projected pattern distorts upon contact with surfaces, the system reconstructs depth information with high precision. This technique is commonly implemented using infrared (IR) sources to avoid visible light interference.
As demonstrated in Figure 3, the depth is computed using triangulation principles. The projector and camera form a stereo vision system, where the displacement of structured patterns provides disparity information that translates into depth. Depth, Z, is determined using the formula Z = (B × f) / d. Here, B is the baseline distance between the projector and camera, f is the focal length, and d is the disparity (a worked sketch appears at the end of this subsection).
Figure 3: A regular striped pattern is projected onto the ball. The rounded surface of the ball distorts the stripes, and the distorted image is captured by a camera for analysis and object reconstruction.
Different projection encoding methods can be used to optimize accuracy, speed, or robustness. Phase-shift encoding uses a sinusoidal (wave-like) pattern projected with multiple phase shifts to calculate phase differences at subpixel precision. Gray code/binary-coded patterns make use of sequential binary patterns uniquely encoded onto each pixel, making them far more robust against occlusions and object discontinuities. Speckle projection uses a pseudo-random dot matrix, enabling dense depth estimation from a single frame, as in Apple's Face ID.
Technical advantages:
■ High spatial resolution and depth accuracy at short ranges (typically under 5 m)
■ Effective for applications requiring fine-grained depth maps, such as 3D scanning, biometric authentication (e.g., Face ID), and industrial metrology
■ Real-time operation with structured illumination strategies like phase-shifting or binary-coded patterns
Challenges:
■ Susceptible to ambient light variations, making outdoor performance unreliable
■ Requires a high-framerate camera and precise calibration for accurate reconstruction
■ Performance degrades with highly reflective or transparent surfaces
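The triangulation relationship above reduces to a few lines of code. The following is a minimal sketch, assuming the baseline, focal length, and a measured per-pixel disparity map are already known from calibration; the function name and the example values are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity_px, baseline_m, focal_px):
    """Convert a disparity map to depth using Z = (B * f) / d.

    disparity_px: per-pixel disparity in pixels (0 where no match was found)
    baseline_m:   projector-to-camera baseline in meters
    focal_px:     focal length expressed in pixels
    """
    depth = np.zeros_like(disparity_px, dtype=np.float64)
    valid = disparity_px > 0                     # avoid division by zero
    depth[valid] = (baseline_m * focal_px) / disparity_px[valid]
    return depth

# Example: 5 cm baseline, 800 px focal length, 20 px disparity -> 2 m depth
disparity = np.array([[20.0, 0.0], [40.0, 10.0]])
print(disparity_to_depth(disparity, baseline_m=0.05, focal_px=800.0))
```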
Stereo vision
Stereo vision replicates human binocular perception by capturing two images from slightly offset cameras and computing depth through disparity analysis. The correspondence problem – matching pixels between the left and right images – is addressed through algorithms such as block matching, semi-global matching (SGM), or deep learning-based methods. It is important to consider with disparity map computation that:
■ Local methods, such as block matching, are faster but more sensitive to noise and texture
■ Global methods, such as SGM, optimize for disparity consistency across images
■ Deep learning methods, like CNNs and transformer-based models, improve robustness in low-texture or occluded regions
The resulting disparity map then provides relative depth estimation, where the baseline (the distance between the cameras) and the focal length determine the depth accuracy. A larger baseline improves depth precision but increases occlusion. A minimal block-matching sketch follows the lists below.
Technical advantages:
■ Passive system that does not require active light projection, enabling outdoor operation
■ Well-suited for robotic navigation, SLAM (Simultaneous Localization and Mapping), and industrial automation
■ Scalable and cost-effective when using high-resolution CMOS or CCD sensors
Challenges:
■ Performance is highly dependent on image texture; uniform or repetitive surfaces lead to poor disparity matching
■ Requires significant computational resources for real-time disparity computation
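As a concrete illustration of local block matching, the sketch below uses OpenCV's StereoBM to compute a disparity map from a rectified stereo pair and converts it to depth with Z = (B × f) / d. The file names, baseline, and focal length are assumptions for illustration; a real system obtains them from calibration.

```python
import cv2
import numpy as np

# Load a rectified stereo pair as grayscale (file names are placeholders)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Local block matching: fast, but sensitive to noise and low texture
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

# StereoBM returns fixed-point disparities scaled by 16
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Convert disparity (pixels) to depth (meters): Z = B * f / d
baseline_m = 0.12   # assumed camera separation
focal_px = 700.0    # assumed focal length in pixels
depth = np.where(disparity > 0, baseline_m * focal_px / disparity, 0.0)
```

Swapping StereoBM for cv2.StereoSGBM_create gives the semi-global matching variant mentioned above, trading speed for better disparity consistency.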
Figure 4: A standard stereo vision system example.
Figure 5: An illustration of dToF. A light-emitting element sends a pulsed signal toward the target object; a light-receiving element captures the reflection, and the time difference between emission and reception yields the distance.
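The dToF principle in Figure 5 reduces to a single relationship: distance is half the round-trip travel time multiplied by the speed of light. A minimal sketch, with an assumed time difference for illustration:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def dtof_distance(time_difference_s):
    """Distance from a direct time-of-flight measurement: d = c * t / 2."""
    return SPEED_OF_LIGHT * time_difference_s / 2.0

# Example: a 10 ns round-trip time difference corresponds to ~1.5 m
print(dtof_distance(10e-9))
```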