FaceDetecting/FaceDetectionModelNotesAndTimeline.txt at main · RJTHEGOAT13/FaceDetecting · GitHub

Face Detection Model Tutorial Outline
Goal:
Build a face detection model from scratch using Python and TensorFlow, based on a deep object detection architecture.

The same approach can be adapted to detect other object types, not just faces

Includes collecting images, annotating them, and building a deep learning model

Part 1 – Collect Images & Annotate
Collect data and label it for face detection using bounding boxes

Use cases: facial sentiment analysis, facial verification

Breakdown Board
Train on images with annotated bounding boxes

Use data augmentation (albumentations) to expand the dataset

Model consists of a classification head and a regression head

Losses: binary cross-entropy for classification; localization loss for regression

Setting Up & Getting Data
Install dependencies: labelme, tensorflow, opencv, matplotlib, albumentations

Use os, time, uuid, and opencv to organize files, handle timing, create unique IDs, and capture images

Folder structure:

data/images/ for raw images

data/labels/ for JSON annotations
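
A minimal sketch of the folder setup and unique file naming described above, using only os and uuid. The helper names make_data_dirs and unique_image_path are hypothetical, and the choice of uuid1 and a .jpg extension is an assumption.

```python
import os
import uuid


def make_data_dirs(root="data"):
    """Create <root>/images and <root>/labels; do nothing if they already exist."""
    images_dir = os.path.join(root, "images")
    labels_dir = os.path.join(root, "labels")
    os.makedirs(images_dir, exist_ok=True)
    os.makedirs(labels_dir, exist_ok=True)
    return images_dir, labels_dir


def unique_image_path(images_dir):
    """Return a fresh .jpg path; uuid1 yields a different ID on every call."""
    return os.path.join(images_dir, f"{uuid.uuid1()}.jpg")
```

Each captured frame could then be written to the returned path, for example with cv2.imwrite(path, frame).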

Collecting and Labeling Images
Capture images in batches (e.g., 30 at a time), pausing between captures so pose and position can vary

Label with labelme:

Draw bounding boxes around faces

Assign the label “face”

Save annotations as JSON files

Review and edit as needed
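
As a rough illustration of what the saved annotations contain, the sketch below reads one labelme-style record and returns a normalized box. It assumes a single rectangle shape per image and the usual labelme fields (shapes, points, imageWidth, imageHeight); the function name labelme_to_bbox is hypothetical.

```python
import json


def labelme_to_bbox(annotation):
    """Return (label, [x_min, y_min, x_max, y_max]) with coordinates scaled to [0, 1].

    Assumes the first shape is a rectangle given by two corner points.
    """
    width = annotation["imageWidth"]
    height = annotation["imageHeight"]
    shape = annotation["shapes"][0]
    (xa, ya), (xb, yb) = shape["points"]
    # The rectangle may have been drawn from any corner, so sort the coordinates.
    x_min, x_max = sorted((xa, xb))
    y_min, y_max = sorted((ya, yb))
    bbox = [x_min / width, y_min / height, x_max / width, y_max / height]
    return shape["label"], bbox


# Made-up annotation for illustration only.
sample = json.loads("""
{
  "imageWidth": 640,
  "imageHeight": 480,
  "shapes": [
    {"label": "face", "shape_type": "rectangle",
     "points": [[480, 120], [160, 360]]}
  ]
}
""")
label, bbox = labelme_to_bbox(sample)
```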

Part 2 – Partition & Augment Data
Split data into training, testing, and validation sets

Apply augmentation (random crop, flips, brightness/contrast/gamma adjustments) using albumentations

Review Dataset & Build Loading Function
Import: tensorflow, json, numpy, matplotlib

Enable GPU memory growth so TensorFlow does not reserve all GPU memory up front; check GPU availability

Use tf.data.Dataset.list_files and map to load images and labels

Visualize samples with matplotlib
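
The GPU step in this list might look like the sketch below. The wrapper name enable_memory_growth is hypothetical; the two tf.config calls are standard TensorFlow. It should run before any other TensorFlow work, because memory growth cannot be changed once a GPU has been initialized.

```python
import tensorflow as tf


def enable_memory_growth():
    """Turn on memory growth for every visible GPU and return the GPU list."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate GPU memory on demand instead of reserving it all at start-up.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


gpus = enable_memory_growth()
print(f"GPUs available: {len(gpus)}")
```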

Partition Unaugmented Data
Manually move images/labels into train/, test/, and val/ folders

Typical split: 63 images train, 14 test, 13 validation
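
The notes describe moving files by hand. As an alternative, a scripted split could look like the sketch below, which only decides which file stems go where. The function name split_names and the fixed seed are assumptions; the counts mirror the 63/14/13 split above.

```python
import random


def split_names(names, n_test, n_val, seed=0):
    """Shuffle file stems reproducibly and cut them into train/test/val lists."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return {"train": train, "test": test, "val": val}


# 90 hypothetical image stems, split 63 / 14 / 13.
names = [f"img_{i:03d}" for i in range(90)]
parts = split_names(names, n_test=14, n_val=13)
```

Each stem's image and its matching JSON label would then be moved together into the same partition folder, for example with shutil.move.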

Apply Image Augmentation
Define augmentation pipeline with six transformations

Specify bounding-box format (Pascal VOC, COCO, or YOLO)

Generate ~60 augmented images per original image

Save augmented data into an org_data/ directory structure
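
Box-aware augmentation means the bounding box has to move with the pixels. albumentations handles that internally; the sketch below is not its API, only a plain NumPy illustration of the idea for one transform (a horizontal flip), assuming boxes are stored as normalized [x_min, y_min, x_max, y_max].

```python
import numpy as np


def hflip_with_bbox(image, bbox):
    """Mirror an H x W x C image left-to-right and mirror its normalized box too."""
    flipped = image[:, ::-1]
    x_min, y_min, x_max, y_max = bbox
    # After mirroring, the old right edge becomes the new left edge.
    return flipped, [1.0 - x_max, y_min, 1.0 - x_min, y_max]


image = np.zeros((4, 8, 3), dtype=np.uint8)
image[0, 0] = 255  # mark the top-left pixel
flipped, new_bbox = hflip_with_bbox(image, [0.0, 0.25, 0.25, 0.75])
```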

Load Augmented Data into TensorFlow
Create tf.data.Dataset pipelines for train, test, and validation

Resize images to 120×120 and scale pixel values to [0,1]

Shuffle, batch, and prefetch for performance
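
A sketch of one such image pipeline, assuming JPEG files matched by a glob pattern. The function names, shuffle buffer, and batch size are assumptions. Files are listed with shuffle=False so that images stay aligned with a label dataset built the same way; shuffling is left to the final step, after images and labels have been combined.

```python
import tensorflow as tf

IMG_SIZE = (120, 120)


def load_image(path):
    """Read a JPEG file from disk and decode it to an H x W x 3 uint8 tensor."""
    return tf.io.decode_jpeg(tf.io.read_file(path), channels=3)


def make_image_dataset(pattern):
    """List files in a fixed order, then load, resize, and scale each image."""
    ds = tf.data.Dataset.list_files(pattern, shuffle=False)
    ds = ds.map(load_image)
    ds = ds.map(lambda x: tf.image.resize(x, IMG_SIZE))  # returns float32
    ds = ds.map(lambda x: x / 255.0)                      # scale to [0, 1]
    return ds


def finalize(ds, shuffle_buffer=5000, batch_size=8):
    """Shuffle, batch, and prefetch a dataset of complete samples."""
    return ds.shuffle(shuffle_buffer).batch(batch_size).prefetch(tf.data.AUTOTUNE)
```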

Prepare & Load Labels
Write a load_labels function to parse JSON files into class and coordinate arrays

Map label-loading function over label files in the dataset
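
A possible shape for load_labels, assuming each augmented label file is a JSON object with a "class" field (0 or 1) and a "bbox" field holding four normalized coordinates. Those key names and the dtypes are assumptions, not confirmed by these notes.

```python
import json

import numpy as np


def load_labels(label_path):
    """Parse one label file into a class array and a coordinate array."""
    with open(label_path, "r", encoding="utf-8") as f:
        label = json.load(f)
    cls = np.array([label["class"]], dtype=np.uint8)
    bbox = np.array(label["bbox"], dtype=np.float32)
    return cls, bbox
```

Inside a tf.data pipeline this kind of Python function is typically wrapped with tf.py_function, which passes the path in as a tensor, so it would need converting (for example with .numpy()) before open() is called.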

Combine Images & Labels
Zip image and label datasets to form complete samples

Confirm dataset sizes: e.g., 3720 training, 840 testing, 720 validation samples

Part 3 – Build & Train the Model
Model Architecture
Base: VGG16 (pretrained, trimmed of its top layers)

Add two heads:

Classification output (sigmoid activation)

Regression output for bounding boxes (sigmoid activation)

Total parameters: ~16.8 million
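
One way to assemble such a two-headed model with the Keras functional API is sketched below. The global max pooling and the 2048-unit hidden layers are assumptions; with those choices the parameter count comes to 16,826,181, in line with the ~16.8 million noted above.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalMaxPooling2D, Input
from tensorflow.keras.models import Model


def build_model(weights="imagenet"):
    """VGG16 backbone without its top layers, plus classification and regression heads."""
    input_layer = Input(shape=(120, 120, 3))
    features = VGG16(include_top=False, weights=weights)(input_layer)

    # Classification head: one sigmoid unit, face vs. no face.
    f1 = GlobalMaxPooling2D()(features)
    class1 = Dense(2048, activation="relu")(f1)
    class2 = Dense(1, activation="sigmoid")(class1)

    # Regression head: four sigmoid units, box coordinates in [0, 1].
    f2 = GlobalMaxPooling2D()(features)
    regress1 = Dense(2048, activation="relu")(f2)
    regress2 = Dense(4, activation="sigmoid")(regress1)

    return Model(inputs=input_layer, outputs=[class2, regress2])
```

Passing weights=None builds the same architecture without downloading the pretrained weights, which is handy for a quick shape check.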

Custom Losses & Optimizer
Classification loss: binary cross-entropy

Localization loss: sum of squared differences for coordinates and sizes

Test loss functions on sample data to verify correctness

Optimizer: Adam with learning-rate decay based on number of batches
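
The sketch below illustrates both ideas in plain NumPy and Python so the arithmetic can be checked by hand. It assumes boxes are [x_min, y_min, x_max, y_max], that the coordinate term uses the top-left corner, and that the decay is chosen so the learning rate falls to 75% of its initial value after one epoch. The 75% target, the 1e-4 starting rate, and the function names are illustrative assumptions.

```python
import numpy as np


def localization_loss(y_true, y_pred):
    """Sum of squared errors on the top-left corner plus on box width and height."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    delta_coord = np.sum(np.square(y_true[:, :2] - y_pred[:, :2]))
    w_true = y_true[:, 2] - y_true[:, 0]
    h_true = y_true[:, 3] - y_true[:, 1]
    w_pred = y_pred[:, 2] - y_pred[:, 0]
    h_pred = y_pred[:, 3] - y_pred[:, 1]
    delta_size = np.sum(np.square(w_true - w_pred) + np.square(h_true - h_pred))
    return float(delta_coord + delta_size)


def decay_rate_for(batches_per_epoch, keep_fraction=0.75):
    """Per-batch decay so that lr == keep_fraction * lr0 after one epoch."""
    return (1.0 / keep_fraction - 1.0) / batches_per_epoch


def decayed_lr(initial_lr, decay, step):
    """Inverse-time decay: lr0 / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


# Quick check on sample data, in the spirit of "test the loss functions".
loss = localization_loss([[0.1, 0.2, 0.5, 0.6]], [[0.2, 0.2, 0.5, 0.7]])
decay = decay_rate_for(batches_per_epoch=465)
lr_after_epoch = decayed_lr(1e-4, decay, step=465)
```

A TensorFlow version of the loss would swap np.sum and np.square for tf.reduce_sum and tf.square. The same decay formula is available in Keras as tf.keras.optimizers.schedules.InverseTimeDecay.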

Custom Training Loop
Create a FaceTracker Keras model subclass with:

__init__, compile, train_step, test_step, and call methods

In train_step, compute both losses and apply gradients

Compile & Fit
Compile model with optimizer and both losses

Train with model.fit, supplying training and validation datasets

Monitor training via TensorBoard

Part 4 – Test & Real‑Time Detection
Prepare a test data iterator

Run predictions and visualize results; debug any issues with architecture or data

Retrain with adjusted parameters if needed

Save the final model

Real‑Time Face Detection
Capture video frames, resize and preprocess for the model

Make predictions, draw bounding boxes, and render on the frame

Adjust rectangle size and styling as needed
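
Because the regression head outputs coordinates in [0, 1], they have to be scaled back to pixel units before drawing. A small sketch of that conversion, with a hypothetical helper name and an assumed [x_min, y_min, x_max, y_max] ordering:

```python
def to_pixel_box(coords, frame_width, frame_height):
    """Scale normalized [x_min, y_min, x_max, y_max] to integer pixel corners."""
    x_min, y_min, x_max, y_max = coords
    top_left = (int(round(x_min * frame_width)), int(round(y_min * frame_height)))
    bottom_right = (int(round(x_max * frame_width)), int(round(y_max * frame_height)))
    return top_left, bottom_right


top_left, bottom_right = to_pixel_box([0.25, 0.5, 0.75, 1.0], 400, 300)
```

The two corner tuples can be passed directly to cv2.rectangle(frame, top_left, bottom_right, color, thickness), typically only when the classification score exceeds a chosen threshold.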

Final Results
Successfully built a custom TensorFlow object detector for faces

Works under varied lighting and backgrounds (e.g., green screen)

Performance can improve further with more data

Final validation regression loss: 0.025; total loss: 0.065

Code & Resources:
Final project code and details are available on GitHub.