Commit 66d934a

Added article on optical flow classical to dl methods
1 parent 39a9324 commit 66d934a

2 files changed

Lines changed: 276 additions & 0 deletions

_data/navigation.yml

Lines changed: 2 additions & 0 deletions
@@ -190,6 +190,8 @@ wiki:
 url: /wiki/machine-learning/neural-network-optimization-using-model-pruning.md
 - title: Deep learning techniques for 3D datasets
 url: /wiki/machine-learning/deep-learning-techniques-for-3d-datasets.md
+- title: Optical Flow - Classical to Deep Learning Implementation
+url: /wiki/machine-learning/optical-flow-classical-to-deep-learning-implementation.md
 - title: State Estimation
 url: /wiki/state-estimation/
 children:
Lines changed: 274 additions & 0 deletions
@@ -0,0 +1,274 @@
# Optical Flow: Classical to Deep Learning Implementation

## Introduction

Optical flow addresses one of the foundational problems in computer vision: how do we track the motion of objects between frames? When you watch a video, your brain follows moving objects effortlessly. Doing the same computationally requires algorithms that detect and quantify motion at the pixel level.

## Classical Methods and Their Mathematics

### The Lucas-Kanade Method

The Lucas-Kanade algorithm estimates optical flow from an equation that relates pixel intensity changes to motion. It rests on two key assumptions:

1. **Brightness Constancy**: A pixel maintains its intensity as it moves
2. **Spatial Coherence**: Nearby pixels move similarly

Linearizing brightness constancy for small motions gives the optical flow equation:
```
Ix * u + Iy * v + It = 0
```
where Ix and Iy are the spatial image gradients, It is the temporal gradient, and (u,v) is the flow vector we want to compute. One equation cannot determine two unknowns, so spatial coherence is used to stack the equations from every pixel in a small window and solve them together in the least-squares sense.
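
As a sketch of how a window of pixels turns this single equation into a solvable system (the pixel labels $p_1, \dots, p_n$ are notation introduced here; $A$ and $b$ are named to match the variables in the code):

$$
A = \begin{bmatrix} I_x(p_1) & I_y(p_1) \\ \vdots & \vdots \\ I_x(p_n) & I_y(p_n) \end{bmatrix},
\qquad
b = -\begin{bmatrix} I_t(p_1) \\ \vdots \\ I_t(p_n) \end{bmatrix},
\qquad
A \begin{bmatrix} u \\ v \end{bmatrix} \approx b
$$

$$
\begin{bmatrix} u \\ v \end{bmatrix} = (A^\top A)^{-1} A^\top b,
\qquad
A^\top A = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}
$$

The solution is reliable only when $A^\top A$ is well conditioned, i.e. both of its eigenvalues are clearly above zero. That holds in textured regions and at corners, but not in flat areas or along straight edges (the aperture problem).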

Here's the implementation with a detailed breakdown:

```python
import cv2
import numpy as np


def lucas_kanade_flow(I1, I2, window_size=15):
    """Dense Lucas-Kanade flow between two grayscale frames of equal size."""
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    half = window_size // 2

    # Compute spatial and temporal gradients. scale=1/8 normalizes the 3x3
    # Sobel kernel so Ix and Iy are in intensity units per pixel, like It.
    Ix = cv2.Sobel(I1, cv2.CV_64F, 1, 0, ksize=3, scale=1 / 8)
    Iy = cv2.Sobel(I1, cv2.CV_64F, 0, 1, ksize=3, scale=1 / 8)
    It = I2 - I1

    # Flow stays zero wherever it cannot be estimated reliably
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    # Solve one least-squares problem per pixel, using its surrounding window
    for i in range(half, I1.shape[0] - half):
        for j in range(half, I1.shape[1] - half):
            # Extract window gradients
            rows = slice(i - half, i + half + 1)
            cols = slice(j - half, j + half + 1)
            ix = Ix[rows, cols].ravel()
            iy = Iy[rows, cols].ravel()
            it = It[rows, cols].ravel()

            # Construct system of equations A @ [u, v] = b
            A = np.stack([ix, iy], axis=1)
            b = -it
            AtA = A.T @ A

            # Solve least squares only if AtA is well-conditioned
            # (eigvalsh: AtA is symmetric, so its eigenvalues are real)
            if np.linalg.eigvalsh(AtA).min() >= 1e-6:
                u[i, j], v[i, j] = np.linalg.solve(AtA, A.T @ b)

    return u, v
```

This implementation:
1. Computes image gradients using Sobel operators (Ix, Iy) and the frame difference (It)
2. For each pixel, considers a window of surrounding pixels
3. Solves a least squares problem to find the motion vector
4. Checks the eigenvalues of A^T A and leaves the flow at zero where the system is ill-conditioned
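
The per-pixel Python loop is easy to read but slow. As a rough sketch of the same least-squares solve in vectorized form, the NumPy-only version below sums the gradient products over each window and inverts the 2x2 system in closed form. The names `box_sum` and `lucas_kanade_dense`, the central-difference gradients (instead of Sobel), and the zero padding at the borders are choices made for this illustration, not part of the implementation above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_sum(img, half):
    """Sum img over a (2*half+1) x (2*half+1) window centred on each pixel."""
    k = 2 * half + 1
    padded = np.pad(img, half)  # zero padding keeps the output the same size
    return sliding_window_view(padded, (k, k)).sum(axis=(-1, -2))


def lucas_kanade_dense(I1, I2, window_size=15, min_eig=1e-6):
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    # Central differences; np.gradient returns d/drow (y) first, then d/dcol (x)
    Iy, Ix = np.gradient(I1)
    It = I2 - I1

    half = window_size // 2
    # Entries of A^T A and A^T b, summed over each window
    Sxx = box_sum(Ix * Ix, half)
    Sxy = box_sum(Ix * Iy, half)
    Syy = box_sum(Iy * Iy, half)
    Sxt = box_sum(Ix * It, half)
    Syt = box_sum(Iy * It, half)

    # Smaller eigenvalue of the symmetric matrix [[Sxx, Sxy], [Sxy, Syy]]
    trace = Sxx + Syy
    det = Sxx * Syy - Sxy ** 2
    lam_min = trace / 2 - np.sqrt(np.maximum(trace ** 2 / 4 - det, 0.0))
    ok = lam_min >= min_eig

    # Closed-form solution of [[Sxx, Sxy], [Sxy, Syy]] @ [u, v] = -[Sxt, Syt]
    safe_det = np.where(ok, det, 1.0)
    u = np.where(ok, (-Syy * Sxt + Sxy * Syt) / safe_det, 0.0)
    v = np.where(ok, (Sxy * Sxt - Sxx * Syt) / safe_det, 0.0)
    return u, v


# Synthetic check: a smooth pattern translated by a known sub-pixel shift
ys, xs = np.mgrid[0:64, 0:64].astype(np.float64)
true_u, true_v = 0.3, -0.2
frame1 = np.sin(0.3 * xs) + np.sin(0.25 * ys)
frame2 = np.sin(0.3 * (xs - true_u)) + np.sin(0.25 * (ys - true_v))

u, v = lucas_kanade_dense(frame1, frame2)
print(np.median(u[10:-10, 10:-10]), np.median(v[10:-10, 10:-10]))
```

On this synthetic pair the median interior flow should land close to the true shift of (0.3, -0.2). The small remaining error comes from the linearization, which is why Lucas-Kanade is usually restricted to small motions or wrapped in a pyramid.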

### The Farnebäck Method

Farnebäck's algorithm is a more sophisticated classical approach that produces a dense flow field. It approximates each pixel neighborhood with a quadratic polynomial and estimates displacement from how the polynomial coefficients change between frames. OpenCV's implementation runs this on an image pyramid so that it can handle larger motions:

```python
import cv2


def farneback_flow(prev, curr):
    """Dense flow between two 8-bit single-channel frames (HxWx2 float32 result)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev, curr,
        None,            # No initial flow estimate
        pyr_scale=0.5,   # Pyramid scale
        levels=3,        # Pyramid levels
        winsize=15,      # Averaging window size
        iterations=3,    # Iterations per level
        poly_n=5,        # Polynomial expansion neighborhood
        poly_sigma=1.2,  # Gaussian sigma
        flags=0
    )
    return flow
```

The key parameters control:

1. **Multi-scale Analysis**:
   - `pyr_scale`: Scale factor between pyramid levels (0.5 means each level is half the size of the previous one)
   - `levels`: Number of pyramid levels, including the original image (more levels handle larger motions)

2. **Local Approximation**:
   - `winsize`: Averaging window size (larger windows are more robust to noise and fast motion but blur the flow field)
   - `poly_n`: Size of the pixel neighborhood used for the polynomial expansion (typically 5 or 7)
   - `poly_sigma`: Standard deviation of the Gaussian used to smooth the derivatives that form the basis of the polynomial expansion

3. **Refinement**:
   - `iterations`: Number of iterations at each pyramid level
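
To get a feel for how `pyr_scale` and `levels` interact, the sketch below lists the approximate image size at each pyramid level and shows how a full-resolution displacement shrinks from level to level. The helper names `pyramid_shapes` and `displacement_per_level` and the rounding rule are illustrative assumptions, not OpenCV's exact internals.

```python
def pyramid_shapes(height, width, pyr_scale=0.5, levels=3):
    """Approximate (height, width) of each pyramid level, finest first."""
    shapes = []
    for k in range(levels):
        factor = pyr_scale ** k
        shapes.append((max(1, round(height * factor)),
                       max(1, round(width * factor))))
    return shapes


def displacement_per_level(displacement_px, pyr_scale=0.5, levels=3):
    """How large a full-resolution displacement appears at each level."""
    return [displacement_px * pyr_scale ** k for k in range(levels)]


print(pyramid_shapes(480, 640))      # [(480, 640), (240, 320), (120, 160)]
print(displacement_per_level(24.0))  # [24.0, 12.0, 6.0]
```

With these settings a 24-pixel motion appears as only 6 pixels at the coarsest level, which is why adding levels helps with large displacements.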

## Deep Learning Approaches

### FlowNet: End-to-End Flow Estimation

FlowNet showed that a convolutional network can learn to estimate optical flow directly from data. The FlowNetS variant stacks the two frames along the channel axis and processes them through an encoder-decoder structure:

```python
import torch.nn as nn

# conv, deconv and predict_flow are small layer-factory helpers assumed to be
# defined elsewhere; this excerpt shows only part of the constructor.


class FlowNetS(nn.Module):
    def __init__(self, batchNorm=True):
        super().__init__()

        # Encoder (the layers leading down to the 1024-channel bottleneck are omitted)
        self.conv1 = conv(batchNorm, 6, 64, kernel_size=7, stride=2)
        self.conv2 = conv(batchNorm, 64, 128, kernel_size=5, stride=2)
        self.conv3 = conv(batchNorm, 128, 256, kernel_size=5, stride=2)

        # Decoder with skip connections
        self.deconv5 = deconv(1024, 512)
        self.deconv4 = deconv(1026, 256)
        self.deconv3 = deconv(770, 128)

        # Flow prediction
        self.predict_flow6 = predict_flow(1024)
        self.predict_flow5 = predict_flow(1026)
        self.predict_flow4 = predict_flow(770)
```

The architecture consists of:

1. **Encoder Path**:
   - Takes a 6-channel input (two concatenated RGB frames)
   - Progressively downsamples while increasing the number of feature channels
   - Large initial kernels (7x7, then 5x5) give the early layers a wide receptive field for substantial motions
   - Optional batch normalization stabilizes training

2. **Decoder Path**:
   - Upsamples through deconvolution (transposed convolution) layers
   - Skip connections from the encoder preserve fine details
   - Input channel counts include the upsampled 2-channel flow prediction (e.g., 1026 = 1024 feature channels + 2 flow channels)

3. **Multi-scale Prediction**:
   - Flow is predicted at multiple resolutions
   - Coarse predictions handle large motions
   - Fine predictions refine details
   - The loss is computed at all scales
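
The decoder channel counts follow from simple bookkeeping: each decoder stage receives the previous deconvolution output, the encoder feature map at the same resolution, and the upsampled 2-channel flow. The sketch below reproduces the numbers in the constructor. The skip-connection widths (512 channels for both encoder maps) come from the standard FlowNetS layout and are an assumption here, since the excerpt does not show those encoder layers; `decoder_in_channels` is a hypothetical helper.

```python
FLOW_CHANNELS = 2  # (u, v)


def decoder_in_channels(deconv_out, skip_channels):
    """Channels after concatenating deconv output, encoder skip and upsampled flow."""
    return deconv_out + skip_channels + FLOW_CHANNELS


# deconv5 outputs 512 channels; the matching encoder map is assumed to have 512
concat5 = decoder_in_channels(deconv_out=512, skip_channels=512)
# deconv4 outputs 256 channels; the matching encoder map is assumed to have 512
concat4 = decoder_in_channels(deconv_out=256, skip_channels=512)

print(concat5, concat4)  # 1026 770
```

These are the input widths of `deconv4`/`predict_flow5` and `deconv3`/`predict_flow4` respectively.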
142+
143+
### RAFT Architecture
144+
145+
RAFT (Recurrent All-Pairs Field Transforms) represents the current state-of-the-art through iterative refinement:
146+
147+
```python
148+
class RAFTFeatureExtractor(nn.Module):
149+
def __init__(self):
150+
super().__init__()
151+
self.backbone = ResNet18()
152+
self.conv1 = nn.Conv2d(256, 128, 1)
153+
self.conv2 = nn.Conv2d(256, 256, 1)
154+
155+
def forward(self, x):
156+
# Extract features at 1/8 resolution
157+
x = self.backbone(x)
158+
# Split into feature and context networks
159+
feat = self.conv1(x)
160+
ctx = self.conv2(x)
161+
return feat, ctx
162+
```

RAFT innovates through:

1. **Feature Extraction**:
   - A backbone with shared weights processes both frames (a ResNet18-style network in the sketch above)
   - Separate feature and context pathways
   - Features are optimized for correlation computation
   - Context features supply additional information to the update step

2. **All-Pairs Correlation**:
```python
import torch
import torch.nn.functional as F


def compute_correlation_volume(feat1, feat2, num_levels=4):
    """Compute a 4D correlation volume and its pyramid.

    feat1, feat2: BxCxHxW feature maps. Returns a list of num_levels tensors;
    level k has shape (B*H*W, 1, H/2^k, W/2^k).
    """
    b, c, h, w = feat1.shape
    feat1 = feat1.reshape(b, c, h * w)
    feat2 = feat2.reshape(b, c, h * w)

    # Compute correlation for all pairs, scaled by sqrt(C) as in RAFT
    corr = torch.matmul(feat1.transpose(1, 2), feat2) / c ** 0.5
    corr = corr.reshape(b * h * w, 1, h, w)

    # Create correlation pyramid: level 0 is the full volume, and each further
    # level halves the last two (second-frame) dimensions by average pooling
    corr_pyramid = [corr]
    for _ in range(num_levels - 1):
        corr = F.avg_pool2d(corr, kernel_size=2, stride=2)
        corr_pyramid.append(corr)

    return corr_pyramid
```

This creates a 4D correlation volume that:
- Stores a matching score for every pair of pixels across the two frames
- Enables large displacement handling
- Provides multi-scale correlation information through the pooled pyramid levels
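
As a small NumPy illustration of what "all pairs" means (the array sizes and the helper name `all_pairs_correlation` are made up for this example), the sketch below builds the H x W x H x W volume with `einsum` and looks up the best match for one pixel when the second feature map is a shifted copy of the first.

```python
import numpy as np


def all_pairs_correlation(feat1, feat2):
    """feat1, feat2: CxHxW arrays. Returns an HxWxHxW volume of dot products."""
    c = feat1.shape[0]
    # corr[i, j, k, l] = <feat1[:, i, j], feat2[:, k, l]> / sqrt(C)
    return np.einsum("cij,ckl->ijkl", feat1, feat2) / np.sqrt(c)


rng = np.random.default_rng(0)
feat1 = rng.standard_normal((128, 6, 8))
# Second frame: same features moved 2 pixels to the right (wrapping at the border)
feat2 = np.roll(feat1, shift=2, axis=2)

corr = all_pairs_correlation(feat1, feat2)
print(corr.shape)  # (6, 8, 6, 8)

# Best match in frame 2 for the frame-1 pixel at row 3, column 1
row, col = np.unravel_index(np.argmax(corr[3, 1]), corr[3, 1].shape)
print(row, col)  # 3 3, i.e. two columns to the right
```

The slice `corr[3, 1]` is a full H x W score map for a single source pixel, so the match can be found no matter how far the pixel moved. RAFT looks up small neighborhoods of these maps around the current flow estimate instead of taking a hard argmax.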

3. **Iterative Updates**:
```python
import torch.nn as nn

# ConvGRU and FlowHead are assumed to be defined elsewhere.


class RAFTUpdater(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = ConvGRU(hidden_dim=128)
        self.flow_head = FlowHead(hidden_dim=128)

    def forward(self, net, inp, corr, flow):
        # Update hidden state using correlation and context
        net = self.gru(net, inp, corr)
        # Predict flow update
        delta_flow = self.flow_head(net)
        return net, flow + delta_flow
```
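
The updater:
- Carries a hidden state across iterations alongside the current flow estimate
- Refines the estimate through multiple iterations
- Uses a convolutional GRU to pass information from one refinement iteration to the next
- Predicts incremental updates that are added to the current flow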
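
The control flow around the updater is a plain loop: start from a zero flow field, ask the update operator for a correction, add it, and repeat. The toy sketch below shows only that loop. The `refine` helper, the iteration count, and the stand-in update rule (move halfway toward a known target) are assumptions for illustration; in RAFT the correction comes from the learned GRU and flow head.

```python
import numpy as np


def refine(flow, predict_delta, num_iters=12):
    """Apply an update operator repeatedly, keeping every intermediate estimate."""
    estimates = []
    for _ in range(num_iters):
        flow = flow + predict_delta(flow)  # incremental update, as in the updater
        estimates.append(flow)
    return estimates


# Toy update operator: move 50% of the way toward a known target flow
target = np.array([4.0, -2.0])
estimates = refine(np.zeros(2), lambda f: 0.5 * (target - f))

errors = [float(np.linalg.norm(e - target)) for e in estimates]
print(errors[0], errors[-1])
```

Keeping every intermediate estimate matters during training, because RAFT supervises the whole sequence of predictions rather than only the last one.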
222+
223+
## Training and Evaluation
224+
225+
### Loss Functions
226+
227+
The standard metric for optical flow is the EndPoint Error (EPE):
228+
229+
```python
230+
def endpoint_error(pred_flow, gt_flow):
231+
"""
232+
Calculate average end-point error
233+
pred_flow, gt_flow: Bx2xHxW tensors
234+
"""
235+
# Compute per-pixel euclidean distance
236+
epe = torch.norm(pred_flow - gt_flow, p=2, dim=1)
237+
# Return mean error
238+
return epe.mean()
239+
```
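
A quick worked example makes the metric concrete. The NumPy version below mirrors the function above (`endpoint_error_np` is a name chosen for this sketch) and evaluates a "no motion" prediction against a constant ground-truth flow of (3, 4), where every pixel is off by a 3-4-5 triangle's hypotenuse.

```python
import numpy as np


def endpoint_error_np(pred_flow, gt_flow):
    """Mean Euclidean distance between flow vectors; inputs are Bx2xHxW arrays."""
    return np.linalg.norm(pred_flow - gt_flow, axis=1).mean()


gt = np.zeros((1, 2, 4, 4))
gt[:, 0] = 3.0  # u = 3 everywhere
gt[:, 1] = 4.0  # v = 4 everywhere
pred = np.zeros_like(gt)  # predicts "no motion"

print(endpoint_error_np(pred, gt))  # 5.0
```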

For multi-scale training, we use a weighted combination of the EPE at each scale:

```python
import torch.nn.functional as F


def multiscale_loss(flow_preds, flow_gt, weights):
    """
    Compute weighted loss across multiple scales
    flow_preds: list of Bx2xhxw predictions, flow_gt: Bx2xHxW ground truth
    """
    loss = 0
    for flow, weight in zip(flow_preds, weights):
        # Downsample ground truth to match prediction
        # (flow values are left in full-resolution pixel units)
        scaled_gt = F.interpolate(
            flow_gt,
            size=flow.shape[-2:],
            mode='bilinear',
            align_corners=False
        )
        # Compute EPE at this scale (endpoint_error is defined above)
        loss += weight * endpoint_error(flow, scaled_gt)
    return loss
```
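
To make the weighting concrete, here is a NumPy-only sketch of the same idea. It downsamples the ground truth by block averaging rather than bilinear interpolation, and the weights are arbitrary values chosen for the example, not ones prescribed above; `block_mean` and `multiscale_epe` are hypothetical helper names.

```python
import numpy as np


def block_mean(flow, factor):
    """Downsample a Bx2xHxW array by averaging factor x factor blocks."""
    b, c, h, w = flow.shape
    blocks = flow.reshape(b, c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(3, 5))


def multiscale_epe(flow_preds, flow_gt, weights):
    """Weighted sum of the mean endpoint error at each prediction scale."""
    loss = 0.0
    for pred, weight in zip(flow_preds, weights):
        factor = flow_gt.shape[-1] // pred.shape[-1]
        gt = block_mean(flow_gt, factor)
        loss += weight * np.linalg.norm(pred - gt, axis=1).mean()
    return loss


gt = np.zeros((1, 2, 8, 8))
gt[:, 0] = 3.0
gt[:, 1] = 4.0
# Two all-zero predictions at 1/4 and 1/2 resolution: the EPE is 5.0 at both scales
preds = [np.zeros((1, 2, 2, 2)), np.zeros((1, 2, 4, 4))]

print(multiscale_epe(preds, gt, weights=[0.5, 0.25]))  # 3.75
```

The result is 0.5 * 5.0 + 0.25 * 5.0. In practice the weights decide how much the coarse predictions influence training relative to the fine ones.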

## Conclusion

The evolution of optical flow algorithms shows a clear progression:
1. Classical methods built on mathematical principles and explicit assumptions
2. Early deep learning replaced hand-crafted features with learned ones
3. Modern architectures like RAFT combine learned features with structured components such as all-pairs correlation and iterative refinement

Each approach offers different trade-offs between:
- Accuracy vs. computational cost
- Large vs. small motion handling
- Training data requirements
- Real-time performance capabilities

Choose your method based on your specific requirements for these factors.
