Deep Gerchberg Saxton

‘Deep Gerchberg-Saxton’: A Physics Informed Neural Network Architecture

A novel deep learning approach to volumetric phase retrieval problems, applied to immersive displays

Summary

Less-Technical Overview: The aim of this project is to control special devices called phased arrays that can produce tactile sensations in mid air, simulating the touch of an object when there is, in reality, no object there. Methods for this have been created, but they have been limited in how accurately they produce the desired sensations. The techniques described here also apply to radio, satellites, visual displays, and sound.

Abstract: ‘Deep Gerchberg-Saxton’ is a physics informed neural network (PINN) architecture that addresses an incomplete area in applied physics with huge structural implications for engineering: phase retrieval problems. Phase retrieval problems are concerned with finding the correct phase of an oscillation so as to produce a desired state after a certain distance or time of propagation. For any non-trivial phase retrieval problem, given a desired state there is no guarantee that an exact solution exists to produce it, nor is there a procedural method for finding the closest possible answer. Gerchberg-Saxton is among the best techniques for approximating a phase retrieval solution, using a combination of Fourier and Inverse Fourier Transforms and propagating through the Angular Spectrum Method. However, it is not very accurate and it involves no training, so inference is quite slow: around 200 iterations of Fourier Layers and gradient descent per planar slice per instant in time.

‘Deep Gerchberg-Saxton’ is a learned replacement for iterative numerical solvers that reduces the inference complexity to around 3 Fourier layers and improves phase retrieval fidelity. The applications are diverse; the upshot is that devices like phased arrays can produce continuous volumetric amplitude fields with low computational power.

Due to IP constraints, some of the implementation and details are redacted.

Key Contribution: Designed a physics informed neural network (PINN) architecture with wave propagation operators embedded directly in the network. It reduced inference from ~200 iterative Fourier propagation steps to ~3 learned Fourier layers while improving phase reconstruction fidelity, enabling improved phased array control in deployed devices.

Skills

ML Production Pipeline ▪︎ Physics Informed Neural Networks (PINNs) ▪︎ Custom Neural Network Architecture Design ▪︎ Loss Function Design for Physical Constraints ▪︎ Inference-Time Complexity Optimization ▪︎ Algorithmic Benchmarking vs Classical Methods ▪︎ ML Inference Optimization for Embedded / Low-Compute Systems ▪︎ Phase Retrieval Algorithms ▪︎ Volumetric Field Synthesis ▪︎ Wave Propagation & Angular Spectrum Method ▪︎ Computational Physics ▪︎ Fourier and Inverse Fourier Transforms ▪︎ Python ▪︎ NumPy ▪︎ Matplotlib ▪︎ TensorFlow ▪︎ PyTorch

Highlighted Role

Jupyter Notebook

Details:

Physics Background: Superposition to Angular Spectrum Method

Superposition is the property of waves, under the linear wave equation assumption, that the amplitude at a given point is the sum of the amplitudes produced at that point by each distinct source wave. Think of how two speakers playing the same music are generally louder than a single speaker on its own. In essence, the amplitude (a linear counterpart of volume in dB) from each speaker is added together.

The Angular Spectrum Method (ASM) becomes useful when there are many source signals (say, over 150) and we are interested in the amplitude over an area rather than at a single point. We use ASM because it reduces the computational complexity from O(N*M) to O(M log(M)), for N sources and M evaluation points.
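To make the O(N*M) baseline concrete, here is a minimal sketch of direct superposition: every one of the N sources contributes to every one of the M evaluation points. The function name, the point-source (spherical wave) model, and the numbers in the usage lines are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def direct_superposition(src_pos, src_amp, src_phase, eval_pos, wavelength):
    """Sum the complex field of N point sources at M evaluation points.

    src_pos: (N, 3) source positions, eval_pos: (M, 3) evaluation positions.
    Cost is O(N*M) because every source-point pair is visited once.
    """
    k = 2.0 * np.pi / wavelength  # wavenumber
    # Pairwise source-to-point distances, shape (M, N)
    r = np.linalg.norm(eval_pos[:, None, :] - src_pos[None, :, :], axis=-1)
    # Assumed source model: spherical wave A * exp(i(k r + phase)) / r
    contributions = src_amp * np.exp(1j * (k * r + src_phase)) / r
    # Linear superposition: add the contributions of all sources
    return contributions.sum(axis=1)

# Two in-phase sources, equidistant from one evaluation point (assumed geometry)
sources = np.array([[-0.01, 0.0, 0.0], [0.01, 0.0, 0.0]])
point = np.array([[0.0, 0.0, 0.1]])
field = direct_superposition(sources, np.ones(2), np.zeros(2), point, wavelength=0.0086)
print(abs(field[0]))
```

With two equidistant in-phase sources the magnitudes add; shifting one source by half a cycle (a phase of pi) cancels the field at that point.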

(Linear Wave Equation Assumption)

Here we see the graphs of two sine waves with the same amplitude and different frequencies plotted on a 2D graph. On the bottom we see a simple sum of the two functions.

Fourier Transforms

Fourier transforms, in simple terms, convert a spatial or temporal representation of a function into one of frequency. The graph here shows the frequency components of the superimposed function in the graph above. The x-axis shows the frequency and the y-axis the relative magnitude. Because the Fourier transform is linear, the spectrum of a sum of two sine waves peaks at the two component frequencies; sum and difference frequencies would appear only if the waves were multiplied together (nonlinear mixing) rather than added.

Angular Spectrum Method

The Angular Spectrum Method follows these four steps for propagating a sampled planar acoustic field (in this application the phased array plane) to a desired plane (in this case the user’s hand):

Sample the complex field over the emission plane
Apply a 2D Fourier Transform
Multiply each spatial-frequency component by the propagation transfer function
Apply a 2D Inverse Fourier Transform
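The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the deployed implementation: the function name, the square sample pitch `dx`, and the choice to discard evanescent components (rather than let them decay) are all simplifications made here.

```python
import numpy as np

def angular_spectrum_propagate(field, dx, wavelength, distance):
    """Propagate a sampled complex planar field by `distance` along z."""
    ny, nx = field.shape  # step 1: `field` is the sampled complex emission plane
    k = 2.0 * np.pi / wavelength
    # Spatial frequencies (cycles per metre) matching the FFT layout
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    # Squared axial wavenumber; non-positive values are evanescent components
    kz_sq = k**2 - (2.0 * np.pi * FX) ** 2 - (2.0 * np.pi * FY) ** 2
    kz = np.sqrt(np.maximum(kz_sq, 0.0))
    # Assumption: evanescent components are simply zeroed
    transfer = np.where(kz_sq > 0, np.exp(1j * kz * distance), 0.0)
    spectrum = np.fft.fft2(field)            # step 2: 2D Fourier Transform
    propagated = spectrum * transfer         # step 3: apply transfer function
    return np.fft.ifft2(propagated)          # step 4: 2D Inverse Fourier Transform

# Assumed example: 8.6 mm wavelength (roughly 40 kHz ultrasound in air),
# 10 mm sample pitch, target plane 10 cm above the array
emission = np.ones((32, 32), dtype=complex)
at_target = angular_spectrum_propagate(emission, dx=0.01, wavelength=0.0086, distance=0.1)
```

A uniform emission plane is a plane wave travelling along z, so propagation should leave its amplitude unchanged and only advance its phase by k times the distance, which makes a convenient sanity check.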

Sources

https://www.umbjournal.org/article/0301-5629(94)90109-0/abstract

https://www.researchgate.net/publication/51802533_Extension_of_the_angular_spectrum_method_to_calculate_pressure_from_a_spherically_curved_acoustic_source

https://pubs.aip.org/asa/jasa/article-abstract/85/5/2202/799423/Transducer-characterization-using-the-angular?redirectedFrom=fulltext

https://www.mdpi.com/1099-4300/22/12/1354

Physics Background: Gerchberg-Saxton Algorithm for Phase Retrieval

The Gerchberg-Saxton algorithm is among the most widely used methods for phase retrieval. However, it is inefficient and lossy, which has led to many hybrid and specialized techniques for particular applications. Its basic building blocks (using ASM and updating the amplitude field on each forward and backward propagation) lend themselves extremely well to deep learning. In essence, the deep learning component that I propose answers the question of what updates should be made at each iteration of ASM to best converge to the optimal solution.

Source

Gerchberg-Saxton (GS)

Gerchberg-Saxton (GS) first takes an initial guess at the phase in the emission plane (the phased array plane) together with the known amplitude. Then, it uses ASM for forward propagation to the target plane (a user’s hand) and constrains the amplitude to the target image (the desired acoustic field to produce tactile sensations), keeping the phase components. Then, ASM for backward propagation brings the field back to the emission plane, where the amplitude is constrained to the known amplitude.

This process is repeated until the error criterion is met.
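The loop described above can be sketched as follows. This is a minimal classical GS baseline under stated assumptions, not the learned model: the helper `propagate`, the mean-squared amplitude error used as the stopping criterion, the random initial phase, and the default of 200 iterations (taken from the figure quoted in the abstract) are illustrative choices.

```python
import numpy as np

def propagate(field, dx, wavelength, distance):
    """Angular Spectrum Method step; a negative distance propagates backward."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    kz_sq = (2 * np.pi / wavelength) ** 2 - (2 * np.pi * FX) ** 2 - (2 * np.pi * FY) ** 2
    kz = np.sqrt(np.maximum(kz_sq, 0.0))
    # Assumption: evanescent components are discarded
    transfer = np.where(kz_sq > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def gerchberg_saxton(target_amp, source_amp, dx, wavelength, distance,
                     max_iter=200, tol=1e-6, seed=0):
    """Return the emission-plane phase and the per-iteration target-plane error."""
    rng = np.random.default_rng(seed)
    # Initial guess at the phase in the emission plane
    phase = rng.uniform(0.0, 2.0 * np.pi, size=source_amp.shape)
    errors = []
    for _ in range(max_iter):
        # Forward propagation to the target plane
        at_target = propagate(source_amp * np.exp(1j * phase), dx, wavelength, distance)
        errors.append(float(np.mean((np.abs(at_target) - target_amp) ** 2)))
        if errors[-1] < tol:  # error criterion met
            break
        # Constrain the amplitude to the target image, keep the phase
        constrained = target_amp * np.exp(1j * np.angle(at_target))
        # Backward propagation to the emission plane
        at_source = propagate(constrained, dx, wavelength, -distance)
        # Constrain the amplitude to the known amplitude: only the phase is kept
        phase = np.angle(at_source)
    return phase, errors
```

The returned `errors` list makes the behaviour discussed next easy to inspect: the error typically drops quickly at first and then plateaus without reaching zero.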

The limitations of Gerchberg-Saxton have been widely discussed. The most important is that convergence is not guaranteed: the algorithm often gets stuck in local minima. There is also the aforementioned inference cost.

Limitations of Gerchberg-Saxton

Deployed Solution

Implementation details have been redacted due to IP constraints.

The core of the proposed system is to keep the intuition of the iterative improvements typical of GS while using deep learning to make those updates, so that it converges to the error criterion more often and with less computational complexity during inference than traditional GS.

Problem Formalization

Input: Discrete representation of target amplitude field, ŷ, at image plane.

Output: Discrete representation of amplitude restricted phase distribution, y, at emission plane.

Objective: Minimize the reconstruction error between the target, ŷ, and the field actually produced at the target plane, ASM(y), after propagating the planar phase distribution and amplitude from the emission plane to the target plane, i.e. Minimize(Loss( ASM(y), ŷ )).
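Written out as equations, the objective might look like the following. This is a sketch of one reasonable reading of the formalization, not the redacted formulation: the known emission amplitude A, the propagation distance z, and the plain mean-squared-error form of the loss are assumptions introduced here for illustration.

```latex
\[
  y^{*} \;=\; \arg\min_{y}\;
  \mathcal{L}\!\left( \left| \mathrm{ASM}_{z}\!\left( A \odot e^{i y} \right) \right|,\; \hat{y} \right),
  \qquad
  \mathrm{ASM}_{z}(u) \;=\; \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{u\} \cdot e^{i k_z z} \right\},
  \qquad
  k_z \;=\; \sqrt{k^{2} - k_x^{2} - k_y^{2}}
\]
\[
  \mathcal{L}_{\mathrm{MSE}}(a, \hat{y}) \;=\; \frac{1}{M} \sum_{m=1}^{M} \left( a_m - \hat{y}_m \right)^{2}
\]
```

Here y is the phase at the emission plane, ŷ is the target amplitude at the image plane, and M is the number of sampled points in the target plane.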

Data Synthesis

These are simple examples of the data synthesis technique used. Unlike some machine learning paradigms, PINNs often use synthetic data generated according to known physical laws or desired physical states.

Methodology

I employed best practices in machine learning to prevent overfitting, avoid data leakage, and estimate performance accurately with the correct metrics.

This includes a 70/15/15 train/test/validate split followed by partitioned normalization, a custom MSE based loss function, and hyperparameter tuning constrained to training folds.
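As an illustration of how a split followed by normalization can avoid leakage, here is a minimal sketch. It assumes that "partitioned normalization" means the normalization statistics are computed on the training partition only and then reused for the other two partitions; the function name and the global mean/standard-deviation scaling are hypothetical choices, not the project's actual pipeline.

```python
import numpy as np

def split_and_normalize(samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle, split 70/15/15, then normalize with training statistics only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_first = int(round(fractions[0] * len(samples)))
    n_second = int(round(fractions[1] * len(samples)))
    train = samples[order[:n_first]]
    second = samples[order[n_first:n_first + n_second]]
    third = samples[order[n_first + n_second:]]
    # Statistics come from the training partition alone, so nothing about the
    # held-out partitions leaks into preprocessing
    mean = train.mean()
    std = train.std() + 1e-12  # guard against division by zero

    def normalize(x):
        return (x - mean) / std

    return normalize(train), normalize(second), normalize(third), (mean, std)

# Assumed example: 200 synthetic 8x8 amplitude fields
fields = np.random.default_rng(42).random((200, 8, 8))
train, held_out_a, held_out_b, stats = split_and_normalize(fields)
print(train.shape, held_out_a.shape, held_out_b.shape)
```

Splitting before computing the statistics is the important ordering: normalizing the full dataset first would let the held-out partitions influence the inputs seen in training.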


NATHAN GOLLAY