Face Recognition: From Scratch to Hatch (presentation)

Contents

Slide 2

Face Recognition in Cloud@Mail.ru

Users upload photos to Cloud
Backend identifies persons in photos, tags them and shows clusters

Slide 3

Social networks

Slide 5

edges

object parts (combination of edges)

object models

Slide 7

Face detection

Slide 8

Auxiliary task: facial landmarks

Face alignment: rotation
Goal: make it easier for Face Recognition

Slide 9

Train Datasets

WIDER FACE
32k images
494k faces

CelebA
200k images, 10k persons
Landmarks, 40 binary attributes

Slide 10

Test Dataset: FDDB

Face Detection Data Set and Benchmark
2845 images
5171 faces

Slide 11

Old school: Viola-Jones

Haar Feature-based Cascade Classifiers

Slide 12

Viola-Jones algorithm: training

Face or Not

Slide 13

Viola-Jones algorithm: inference

Stages
Stage 1 -> Yes -> Stage 2 -> Yes -> ... -> Stage N -> Face

Optimization
Features are grouped into stages
If a patch fails any stage => discard
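
The early-rejection idea can be sketched as follows. This is a minimal illustration, not the OpenCV implementation: the two toy stages and the `mean_brightness` helper are hypothetical stand-ins for real Haar-feature stages.

```python
from typing import Callable, Sequence

Patch = Sequence[Sequence[float]]
Stage = Callable[[Patch], bool]


def cascade_is_face(patch: Patch, stages: Sequence[Stage]) -> bool:
    """Return True only if the patch passes every stage.

    Evaluation stops at the first failing stage, so most non-face
    patches are discarded after a few cheap checks.
    """
    for stage in stages:
        if not stage(patch):
            return False  # discard the patch, skip the remaining stages
    return True


def mean_brightness(patch: Patch) -> float:
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Hypothetical toy stages (real stages are groups of Haar features).
toy_stages = [
    lambda p: mean_brightness(p) > 0.2,                         # not too dark
    lambda p: mean_brightness(p[:1]) < mean_brightness(p[1:]),  # top row darker
]

print(cascade_is_face([[0.1, 0.1], [0.9, 0.9]], toy_stages))
```

Ordering cheap stages first is what makes the cascade fast: the expensive stages only run on the few patches that survive.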

Slide 14

Viola-Jones results

OpenCV implementation
Fast: ~100ms on CPU

Not accurate

Slide 15

New school: Region-based Convolutional Networks
Faster R-CNN, algorithm

Pre-trained network: extracting features

Region proposal network

Face?

RoI-pooling: extract corresponding tensor

Classifier: classes and the bounding box

Slide 16

Comparison: Viola-Jones vs R-FCN

Results
92% accuracy (R-FCN)
40ms on GPU (slow)

FDDB results

Slide 17

Face detection: how fast

We need a faster solution at the same accuracy!
Target: < 10 ms

Slide 18

Alternative: MTCNN

Cascade of 3 CNNs

Resize to different scales

Proposal -> candidates + b-boxes

Refine -> calibration

Output -> b-boxes + landmarks

Slide 19

Comparison: MTCNN vs R-FCN

MTCNN
+ Faster
+ Landmarks
- Less accurate
- No batch processing

Slide 21

What is TensorRT

NVIDIA TensorRT is a high-performance deep learning inference optimizer
Features
Improves performance for complex networks
FP16 & INT8 support
Effective at small batch sizes

Slide 22

TensorRT: layer optimizations

Horizontal fusion

Concat elision

Vertical layer fusion

Slide 23

TensorRT: downsides

Caffe + TensorFlow supported
Fixed input/batch size
Basic layers support

Slide 24

Batch processing

Problem
Image size is fixed, but
MTCNN works at different scales

Solution
Pyramid on a single image
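
One way to picture the single-image pyramid is to compute the pyramid scales and stack the scaled copies on one fixed canvas. The sketch below rests on assumptions: the 12-pixel window, the 0.709 scale factor and the 20-pixel minimum face follow common MTCNN defaults, and the vertical stacking layout is hypothetical, since the slides do not describe the actual packing.

```python
def pyramid_scales(width, height, min_face=20, cell=12, factor=0.709):
    """Scales at which a cell x cell window covers faces of at least min_face pixels."""
    scale = cell / min_face
    min_side = min(width, height) * scale
    scales = []
    while min_side >= cell:  # stop once the image is smaller than the window
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


def pack_vertically(width, height, scales):
    """Stack scaled copies top to bottom; return canvas size and (x, y, w, h) slots."""
    placements = []
    y = 0
    canvas_w = 0
    for s in scales:
        w = max(1, round(width * s))
        h = max(1, round(height * s))
        placements.append((0, y, w, h))
        y += h
        canvas_w = max(canvas_w, w)
    return (canvas_w, y), placements


scales = pyramid_scales(100, 100)
canvas, slots = pack_vertically(100, 100, scales)
print(len(scales), canvas)
```

With every scale living on one canvas, the first network runs once per image, and canvases of equal size can be grouped into a batch.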

Slide 25

Batch processing
Results
Single run
Enables batch processing

Slide 26

TensorRT: layers
Problem
No PReLU layer => default pre-trained model can’t be used
Retrained with ReLU from scratch

-20%

Slide 27

Face detection: inference
Target: < 10 ms
Result: 8.8 ms

Ingredients
MTCNN
Batch processing
TensorRT

Slide 29

Face recognition task

Goal: compare faces

How? Learn a metric

This enables zero-shot learning

Slide 30

Training set: MSCeleb

Top 100k celebrities
10 million images, 100 per person
Noisy: constructed by leveraging public search engines

Slide 31

Small test dataset: LFW

Labeled Faces in the Wild
13k images from the web
1680 persons have >= 2 photos

Slide 32

Large test dataset: Megaface

Identification under up to 1 million “distractors”
530 people to find

Slide 33

Megaface leaderboard

~83%

~98% (cleaned)

Slide 34

Metric Learning

Slide 35

Classification

Train CNN to predict classes

Pray for good latent space

Slide 36

Softmax

Learned features are only separable, not discriminative

The resulting features are not sufficiently effective

Slide 37

We need metric learning

Tightness of the cluster

Discriminative features

Slide 38

Triplet loss

Features
Identity -> single point
Enforces a margin between persons

positive + α < negative
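
The margin condition above corresponds to the usual hinge form of the triplet loss. A minimal sketch, assuming squared Euclidean distances between embeddings and a margin `alpha` of 0.2 (both are assumptions, the slides do not state them):

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Zero when positive + alpha < negative, positive otherwise."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + alpha)


anchor = [0.0, 0.0]
same_person = [0.1, 0.0]
print(triplet_loss(anchor, same_person, [1.0, 0.0]))  # margin satisfied
print(triplet_loss(anchor, same_person, [0.2, 0.0]))  # margin violated
```

Triplets that already satisfy the margin contribute no gradient, which is why the choice of triplets matters so much.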

Slide 39

Choosing triplets

Crucial problem
How to choose triplets? Useful triplets = hardest errors

Solution
Hard-mining within a large mini-batch (>1000)

Slide 40

Choosing triplets: trap

Slide 41

Choosing triplets: trap

positive ~ negative

Slide 42

Choosing triplets: trap

Instead

Slide 43

Choosing triplets: trap

Selecting the hardest negative may lead to collapse early in training

Slide 44

Choosing triplets: semi-hard

positive < negative < positive + α
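
The semi-hard condition can be applied as a filter over the negatives in a mini-batch. A minimal sketch, assuming distances are already computed; picking the closest semi-hard negative is one possible policy chosen here for illustration, not something the slides specify.

```python
def semi_hard_negatives(d_pos, d_negs, alpha=0.2):
    """Indices of negatives with d_pos < d_neg < d_pos + alpha."""
    return [i for i, d in enumerate(d_negs) if d_pos < d < d_pos + alpha]


def pick_negative(d_pos, d_negs, alpha=0.2):
    """Closest semi-hard negative, or None if the batch has none."""
    candidates = semi_hard_negatives(d_pos, d_negs, alpha)
    if not candidates:
        return None
    return min(candidates, key=lambda i: d_negs[i])


# Anchor-positive distance 0.5; four candidate negatives.
print(semi_hard_negatives(0.5, [0.3, 0.55, 0.65, 0.9]))
print(pick_negative(0.5, [0.3, 0.55, 0.65, 0.9]))
```

The negative at distance 0.3 is harder than the positive and is skipped, which avoids the collapse trap; the one at 0.9 already satisfies the margin and would give zero loss.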

Slide 45

Triplet loss: summary

Overview
Requires large batches, margin tuning
Slow convergence
Open-source code
OpenFace (Torch): suboptimal implementation
FaceNet (TensorFlow): not the original

Slide 46

Center loss

Idea: pull points to class centroids

Slide 47

Center loss: structure

Collapses without the classification loss

Softmax Loss

Center Loss

Final loss = Softmax loss + λ · Center loss
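
The combined objective can be written out for a single sample. A minimal sketch, assuming the center term is half the squared distance between the feature and its class centroid; how the centroids are updated during training is left out.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax over logits for the true label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared distance from a feature to its class centroid."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def total_loss(logits, label, feature, centers, lam):
    """Final loss = softmax loss + lam * center loss."""
    return softmax_cross_entropy(logits, label) + lam * center_loss(feature, centers[label])


centers = [[0.0, 0.0], [5.0, 5.0]]  # hypothetical class centroids
print(total_loss([0.0, 0.0], 0, [1.0, 0.0], centers, lam=0.1))
```

With `lam` set to zero this reduces to plain softmax training; the softmax term is what keeps all centroids from collapsing to a single point.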

Slide 48

Center Loss: different lambdas

λ = 10^-7

Slide 49

Center Loss: different lambdas

λ = 10^-6

Slide 50

Center Loss: different lambdas

λ = 10^-5

Slide 51

Center loss: summary

Overview
Intra-class compactness and inter-class separability
Good performance at several other tasks
Open-source code
Caffe (original, Megaface - 65%)

Slide 52

Tricks: augmentation

Test time augmentation
Flip image

Compute 2 embeddings

Average embeddings
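
The flip-and-average trick is short enough to sketch directly. Here `embed` stands for any function that maps an image to an embedding; `toy_embed` is a hypothetical stand-in used only to make the example runnable, and images are plain lists of pixel rows.

```python
def flip_horizontal(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def tta_embedding(image, embed):
    """Average the embeddings of the image and its mirrored copy."""
    original = embed(image)
    mirrored = embed(flip_horizontal(image))
    return [(a + b) / 2 for a, b in zip(original, mirrored)]


def toy_embed(image):
    """Hypothetical embedding: sums of the first and last pixel columns."""
    return [sum(row[0] for row in image), sum(row[-1] for row in image)]


print(tta_embedding([[1, 2], [3, 4]], toy_embed))
```

The result is the same for an image and its mirror, so the embedding no longer depends on which way the face is turned.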

Slide 53

Tricks: alignment

Rotation
Kabsch algorithm: finds the optimal rotation matrix that minimizes the RMSD
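
For 2D landmarks the optimal rotation has a closed form, so no SVD is needed. The sketch below is an assumption about how the idea applies to face alignment: it centers detected landmarks and template landmarks, then returns the angle that minimizes the RMSD between them. The general Kabsch algorithm handles 3D point sets via SVD.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def optimal_rotation_angle(source, target):
    """Angle in radians that rotates centered source onto centered target
    with the smallest RMSD (2D case of the Kabsch problem)."""
    cs, ct = centroid(source), centroid(target)
    cos_sum = 0.0
    sin_sum = 0.0
    for (sx, sy), (tx, ty) in zip(source, target):
        ax, ay = sx - cs[0], sy - cs[1]  # centered source point
        bx, by = tx - ct[0], ty - ct[1]  # centered target point
        cos_sum += ax * bx + ay * by     # dot products
        sin_sum += ax * by - ay * bx     # 2D cross products
    return math.atan2(sin_sum, cos_sum)


# Hypothetical landmarks: the target is the source rotated by 90 degrees.
source = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
target = [(0.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
print(math.degrees(optimal_rotation_angle(source, target)))
```

Centering both point sets first removes translation, so the angle depends only on how the face is tilted.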

Slide 54

Angular Softmax

On sphere
Angle discriminates

Slide 55

Angular Softmax

Slide 56

Angular Softmax: different «m»

Slide 57

Angular softmax: summary

Overview
Works only on small datasets

Slight modification of the loss yields 74.2%

Various modifications of the loss function

Slide 58

Metric learning: summary

Softmax < Triplet < Center < A-Softmax
A-Softmax
With bells and whistles, better than center loss

Overall
Rule of thumb: use Center loss
Metric learning may improve classification performance

Slide 59

Fighting errors

Slide 60

Errors after MSCeleb: children
Problem
Children all look alike
Consequence
Average embedding ~ single point in the space

Slide 61

Errors after MSCeleb: Asians
Problem
Face recognition performs poorly on Asian faces
Reason
Dataset doesn’t contain enough photos of these categories

Slide 62

How to fix these errors?

It’s all about data: we need a diverse dataset!
Natural choice: avatars from social networks

Slide 63

A way to construct dataset

Cleaning algorithm
Face detection

Face recognition -> embeddings

Hierarchical clustering algorithm

Pick the largest cluster as a person

Iterate after each model improvement
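
The clustering step can be sketched for the photos of one account. This is a simplified illustration, not the production pipeline: it uses single-linkage clustering cut at a distance `threshold` (equivalent to connected components of the "closer than threshold" graph), and both the linkage and the threshold value are assumptions. The input is assumed to be non-empty.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_labels(embeddings, threshold):
    """Cluster label per embedding; points closer than threshold are merged."""
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if euclidean(embeddings[i], embeddings[j]) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(embeddings))]


def largest_cluster(embeddings, threshold):
    """Indices of the photos in the biggest cluster, taken as the account owner."""
    labels = single_linkage_labels(embeddings, threshold)
    best_label, _ = Counter(labels).most_common(1)[0]
    return [i for i, label in enumerate(labels) if label == best_label]


# Hypothetical 2D embeddings: three photos of the owner, two of a friend.
photos = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(largest_cluster(photos, threshold=0.3))
```

Because the embeddings come from the current recognition model, rerunning this step after each model improvement yields a cleaner dataset, which matches the "iterate" step above.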

Slide 64

MSCeleb dataset’s errors

MSCeleb is constructed by leveraging search engines

The public confrontation between Joe Eszterhas and Mel Gibson leads to the error: Joe Eszterhas = Mel Gibson

Slide 65

MSCeleb dataset’s errors

Female + Male

Slide 66

MSCeleb dataset’s errors

Asia
Mix

Slide 67

MSCeleb dataset’s errors

Dataset has been shrunk from 100k to 46k celebrities

Random
search engine

Slide 68

Results on new datasets

Datasets
Train
MSCeleb (46k)
VK-train (200k)

Test
MegaVK
Sets for children and Asians

Slide 69

How to handle a big dataset

It seems we can add more data infinitely, but no.
Problems
Memory consumption (Softmax)
Computational costs
A lot of noise in gradients

Slide 70

Softmax Approximation

Algorithm
Perform K-Means clustering using current FR model

Slide 71

Softmax Approximation

Algorithm
Perform K-Means clustering using current FR model

Two Softmax heads:

Predicts the cluster label

Predicts the class within the true cluster
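
A per-sample loss for the two heads might look like the sketch below. Summing the two cross-entropies is an assumption, since the slides do not say how the heads are combined; the point is that the second softmax only spans the classes of the true cluster rather than all identities.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def two_head_loss(cluster_logits, true_cluster, class_logits_in_cluster, true_class):
    """Cross-entropy of the cluster head plus cross-entropy of the
    within-cluster head (evaluated only over the true cluster's classes)."""
    cluster_term = -log_softmax(cluster_logits)[true_cluster]
    class_term = -log_softmax(class_logits_in_cluster)[true_class]
    return cluster_term + class_term


# Hypothetical setup: 2 clusters, 4 identities inside the true cluster.
print(two_head_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0, 0.0], 2))
```

The within-cluster head compares a face only against identities that the current model already places nearby, which is where the hard negatives are.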

Slide 72

Softmax Approximation

Pros

Prevents fusing of the clusters

Does hard-negative mining

Clusters can be specified
Children
Asian

Results
Doesn’t improve accuracy
Decreases memory consumption (K times)

Slide 73

Fighting errors on production

Slide 74

Errors: blur
Problem
Detector yields blurry photos
Recognition forms «blurry clusters»
Solution
Laplacian: 2nd order derivative of the image
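
A common way to turn the Laplacian into a blur score is to take the variance of its response: sharp images have strong edges and a high variance, blurry ones a low variance. A minimal sketch, assuming a grayscale image given as a list of rows (at least 3x3) and the standard 4-neighbour Laplacian kernel; the cut-off between "blurry" and "sharp" would have to be tuned on real data.

```python
def laplacian(image):
    """4-neighbour Laplacian response at every interior pixel."""
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    return responses


def laplacian_variance(image):
    """Variance of the Laplacian response; low values suggest blur."""
    values = laplacian(image)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


flat = [[0.5] * 4 for _ in range(4)]                                  # no edges
checkerboard = [[(x + y) % 2 for x in range(4)] for y in range(4)]    # many edges
print(laplacian_variance(flat), laplacian_variance(checkerboard))
```

Faces whose score falls below a chosen threshold can be dropped before recognition, so they never form «blurry clusters».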

Slide 75

Laplacian in action

Low variance

High variance

Slide 76

Errors: body parts

Detection mistakes form clusters

Slide 77

Errors: diagrams & mushrooms

Slide 78

Fixing trash clusters

“No-face” images are similar to each other!

Slide 79

Workaround

Algorithm

Construct «trash» dataset

Compute average embedding

Every point inside the sphere – trash
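
The three steps above can be sketched as a centroid plus a radius test. The `radius` value is a hypothetical threshold that would be tuned on labeled data, and plain Euclidean distance is an assumption.

```python
import math


def mean_embedding(embeddings):
    """Component-wise average of a non-empty list of embeddings."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def is_trash(embedding, center, radius):
    """True if the embedding lies inside the sphere around the trash centroid."""
    distance = math.sqrt(sum((e - c) ** 2 for e, c in zip(embedding, center)))
    return distance < radius


# Hypothetical embeddings of false detections (body parts, diagrams, ...).
trash_set = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]]
center = mean_embedding(trash_set)
print(is_trash([0.9, 0.1], center, radius=0.5))   # near the trash centroid
print(is_trash([-1.0, 0.0], center, radius=0.5))  # far away, kept as a face
```

Sweeping `radius` trades missed trash against discarded real faces, which is the trade-off a ROC curve summarizes.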

Results
ROC AUC: 97%

Slide 80

Spectacular results

Slide 81

Fun: new governors

Recently appointed governors are almost twins, but FR distinguishes them

Slide 82

Over years

Face recognition algorithm captures similarity across years
Although we didn’t focus on the problem

Slide 83

Over years

Slide 84

Summary
Use TensorRT to speed up inference
Metric learning: use Center loss by default
Clean your data thoroughly
Understanding CNNs helps to fight errors

Slide 86

Auxiliary

Slide 87

Best avatar

Problem
How to pick an avatar for a person?

Solution
Train a model to predict the awesomeness of a photo

Slide 88

Predicting awesomeness: how to approach

Social networks – not only photos, but likes too

Slide 89

Predicting awesomeness: dataset

Awesomeness (A) = likes/audience

A=18%

A=27%

A=75%

Slide 90

Predicting awesomeness: summary

Results
Mean Average Precision @5: 25%
Data and metric are noisy => human evaluation
File name: Face-Recognition:-From-Scratch-to-Hatch.pptx