GenAI Optimization Techniques

Hemant Rawat
5 min read · Aug 18, 2024

--

Gradient Descent, Stochastic Optimal Control & Diffusion Models

GenAI refers to AI models that learn structure from data and generate novel content, including text, images, audio, and video. These models are typically implemented as deep neural networks with billions of parameters.

Figure: Three aspects of GenAI

Let's look at the basics of optimization and training techniques.

Training machine learning models, particularly deep learning models, involves optimizing a set of parameters to minimize a loss function that quantifies the difference between the model's predictions and the actual data. There are various optimization techniques; let's look at a couple of them.

Gradient Descent is a widely used optimization algorithm that iteratively adjusts a model’s parameters in the direction that most effectively reduces the loss. This is achieved by calculating the gradient (partial derivatives) of the loss function with respect to the model’s parameters.
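Written out, the basic update rule looks like this (η is the learning rate; the notation here is my own, not from the article's figures):

```latex
\theta_{t+1} \;=\; \theta_t \;-\; \eta \, \nabla_{\theta} L(\theta_t)
```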

Stochastic Gradient Descent (SGD) is one variant of Gradient Descent where parameters are updated using a single data sample at each iteration.
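Here is a minimal sketch of SGD fitting a one-parameter linear model on synthetic data (the toy dataset, learning rate, and epoch count are arbitrary choices for illustration):

```python
import numpy as np

# Toy data: y = 2x + noise (made up for this example)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w, lr = 0.0, 0.1  # single weight, learning rate

# SGD: update the parameter using one data sample at a time
for epoch in range(5):
    for i in rng.permutation(len(X)):
        pred = w * X[i, 0]
        grad = 2.0 * (pred - y[i]) * X[i, 0]  # gradient of the squared error w.r.t. w
        w -= lr * grad

print(f"estimated slope: {w:.3f}")  # should be close to 2.0
```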

What is Stochastic Optimal Control?

It involves finding the optimal control policy that minimizes (or maximizes) a cost functional over time, where the system's dynamics are governed by a stochastic differential equation (SDE). The optimization here is over a space of control strategies, rather than over static parameters as in Gradient Descent.
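A standard way to state the problem (the symbols below are my choice of notation) is to minimize an expected running cost plus a terminal cost, subject to SDE dynamics driven by the control u:

```latex
\min_{u(\cdot)} \; \mathbb{E}\!\left[\int_0^T c(X_t, u_t)\,dt + \Phi(X_T)\right]
\quad \text{s.t.} \quad dX_t = f(X_t, u_t)\,dt + \sigma(X_t)\,dW_t
```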

What is Schrödinger Bridge?

The Schrödinger bridge problem is a special class of Stochastic Optimal Control problems: it connects probability distributions over time, finding an optimal way to transition between two given distributions.

Figure: Initial and target probability distributions

Given an initial distribution and a final distribution (or marginal distributions at two times), the problem is to find the most likely stochastic process that transforms the initial distribution into the target distribution over time.

From an optimization perspective, the Schrödinger bridge can be seen as finding the optimal transition plan between two probability distributions under a stochastic process. This involves solving a variational problem where the objective is to minimize some cost (often related to entropy or divergence) subject to the dynamics of the system and the endpoint constraints.
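One common way to write this variational problem (notation assumed here, following the entropy view above): find the path measure P that is closest, in KL divergence, to a reference diffusion W, subject to the two endpoint marginals:

```latex
\min_{P} \; D_{\mathrm{KL}}(P \,\|\, W)
\quad \text{s.t.} \quad P_0 = \mu_0, \;\; P_T = \mu_T
```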

Diffusion Models

A diffusion process is dynamic (it changes over time) and stochastic (random, which creates variability). Diffusion models learn the reverse of a diffusion that turns data into noise; this reverse process can be viewed through the lens of Schrödinger bridges as finding an optimal transition between noisy and non-noisy states.

Assume a particle is moving in 2D/3D space. Its position evolves as

Position(t + dt) = Position(t) + velocity(t) · dt + randomness,

where dt is a small time step. In GenAI, the "particle" is the intensity of a pixel, and the model describes how that intensity changes over time.
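A minimal simulation of this update (an Euler–Maruyama step), treating a single pixel intensity as the "particle" (the drift and noise scale below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def velocity(x):
    # Linear drift pulling the intensity toward 0 (assumed for illustration;
    # a linear drift is also what keeps the score tractable later in the post)
    return -0.5 * x

x = 0.8       # initial pixel intensity (arbitrary)
sigma = 0.3   # noise strength (assumed)
dt = 0.01

# Position(t + dt) = Position(t) + velocity(t) * dt + randomness
for _ in range(1000):
    x = x + velocity(x) * dt + sigma * np.sqrt(dt) * rng.normal()

print(f"pixel intensity after diffusing: {x:.3f}")
```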

Source Based Generative Model (SGM)

Source-based generation models refer to generative models that produce outputs conditioned on or guided by specific input data or sources. These are models that generate content based on a given source or condition. For example, in text-to-image generation, the source would be a text prompt, and the model generates an image that corresponds to the prompt.

  1. Forward Process

The forward process (a diffusion process) takes every pixel of the image and gradually corrupts it with noise over time.

Figure: Forward process

We do this for each pixel and track the probability density of its intensity at time t, p(x_t, t).

Figure: Probability marginal density p(x_t, t)
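A sketch of such a forward corruption applied to a whole image, using a standard variance-preserving noise schedule (the image, schedule, and step count are assumptions for illustration, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "image": 28x28 pixel intensities in [0, 1]
image = rng.random((28, 28))

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # noise schedule (a common but assumed choice)
alpha_bars = np.cumprod(1.0 - betas)

def forward_sample(x0, t):
    """Sample x_t from p(x_t | x_0) for the variance-preserving forward process."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x_mid = forward_sample(image, t=200)    # partially corrupted image
x_end = forward_sample(image, t=T - 1)  # almost pure Gaussian noise
print(round(x_end.std(), 2))            # close to 1, i.e. nearly standard normal
```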

2. Backward process (generation)

Figure: Score function and backward process

Diffusion models appear in score-based generative AI as a pair of forward and backward diffusions. The forward and backward diffusions are equivalent in the sense that they share the same marginal distributions. The drift term f(x) is typically chosen to be linear so that the score function has an analytic representation. The forward process needs to run for a sufficiently long time for the end distribution to become (approximately) Gaussian, which is why diffusion models are notoriously slow at generating data compared to GANs and flow-based models.
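For reference, the backward (generation) step follows the reverse-time SDE from the score-based modeling literature, in which the learned score ∇_x log p_t(x) stands in for the unknown density (this is the standard form, not reproduced from the article's figures):

```latex
dX_t = \left[ f(X_t, t) - g(t)^2 \,\nabla_x \log p_t(X_t) \right] dt + g(t)\, d\bar{W}_t
```

Here g(t) is the diffusion coefficient and dW̄_t is a reverse-time Brownian motion.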

Summary

While gradient descent is focused on optimizing static parameters by following the gradient of a loss function, SOC involves optimizing dynamic control policies to minimize a cost functional over time. The connection is particularly evident in methods that apply gradient-based updates to control policies, as seen in reinforcement learning and certain SOC approaches. Both fields use gradient-based techniques to iteratively improve solutions, whether they be control strategies or model parameters.
