Face Recognition: From Scratch to Hatch (presentation)

October 2, 2021

Slide 2

Face Recognition in Cloud@Mail.ru
Users upload photos to Cloud
Backend identifies persons in photos, tags them and shows clusters

Slide 3

Social networks

Slide 4

Slide 5

edges
object parts (combination of edges)
object models

Slide 6

Slide 7

Face detection

Slide 8

Auxiliary task: facial landmarks
Face alignment: rotation
Goal: make it easier for Face Recognition

Slide 9

Train Datasets
Wider
32k images
494k faces
CelebA
200k images, 10k persons
Landmarks, 40 binary attributes

Slide 10

Test Dataset: FDDB
Face Detection Data Set and Benchmark
2845 images
5171 faces

Slide 11

Old school: Viola-Jones
Haar Feature-based Cascade Classifiers

Slide 12

Viola-Jones algorithm: training
Face or Not

Slide 13

Viola-Jones algorithm: inference
Stages
Stage 1 -> Yes -> Stage 2 -> Yes -> ... -> Stage N -> Face
Optimization
Features are grouped into stages
If a patch fails any stage => discard
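
The early-rejection idea on this slide can be sketched as follows. This is a minimal illustration, not the OpenCV implementation: the two toy stages (`bright_enough`, `has_contrast`) and their thresholds are hypothetical stand-ins for real Haar-feature stages.

```python
from typing import Callable, Sequence

Patch = Sequence[Sequence[float]]
Stage = Callable[[Patch], bool]


def cascade_accepts(patch: Patch, stages: Sequence[Stage]) -> bool:
    """Return True only if the patch passes every stage."""
    for stage in stages:
        if not stage(patch):
            return False  # discard early: later stages never run for this patch
    return True


def _pixels(patch: Patch) -> list:
    return [value for row in patch for value in row]


# Hypothetical toy stages (assumptions, not Haar features).
def bright_enough(patch: Patch) -> bool:
    values = _pixels(patch)
    return sum(values) / len(values) > 0.2


def has_contrast(patch: Patch) -> bool:
    values = _pixels(patch)
    return max(values) - min(values) > 0.3


stages = [bright_enough, has_contrast]
face_like = [[0.1, 0.9], [0.5, 0.6]]
dark = [[0.0, 0.1], [0.1, 0.0]]
print(cascade_accepts(face_like, stages), cascade_accepts(dark, stages))
```

Because most patches in an image are not faces, putting the cheapest stages first lets the cascade reject them after very little work.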

Slide 14

Viola-Jones results
OpenCV implementation
Fast: ~100ms on CPU
Not accurate

Slide 15

New school: Region-based Convolutional Networks
Faster RCNN, algorithm
Pre-trained network: extracting features
Region proposal network
RoI-pooling: extract corresponding tensor
Classifier: classes and the bounding box
Face ?

Slide 16

Comparison: Viola-Jones vs R-FCN
Results
92% accuracy (R-FCN)
FDDB results
40ms on GPU (slow)

Slide 17

Face detection: how fast
We need a faster solution at the same accuracy!
Target: < 10ms

Slide 18

Alternative: MTCNN
Cascade of 3 CNN
Resize to different scales
Proposal -> candidates + b-boxes
Refine -> calibration
Output -> b-boxes + landmarks
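
A rough sketch of the "resize to different scales" step: the proposal network has a fixed input window, so the image is shrunk repeatedly until it no longer fits that window. The function name and the default values (`min_face=20`, `net_input=12`, `factor=0.709`) are assumptions borrowed from values often seen in public MTCNN implementations; the slides do not state them.

```python
def pyramid_scales(height, width, min_face=20, net_input=12, factor=0.709):
    """Return the resize factors at which the proposal network would be run.

    All default values are assumptions, not taken from the slides.
    """
    scale = net_input / min_face            # map the smallest face onto the input window
    min_side = min(height, width) * scale
    scales = []
    while min_side >= net_input:            # stop once the image is smaller than the window
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


# Toy call with round numbers so the result is easy to follow.
print(pyramid_scales(24, 24, min_face=12, net_input=12, factor=0.5))
```

Each returned scale corresponds to one forward pass of the proposal network, which is why the number of scales matters for latency.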

Slide 19

Comparison: MTCNN vs R-FCN
MTCNN
+ Faster
+ Landmarks
- Less accurate
- No batch processing

Slide 20

Slide 21

What is TensorRT
NVIDIA TensorRT is a high-performance deep learning inference optimizer
Features
Improves performance for complex networks
FP16 & INT8 support
Effective at small batch-sizes

Slide 22

TensorRT: layer optimizations
Horizontal fusion
Concat elision
Vertical layer fusion

Slide 23

TensorRT: downsides
Only Caffe + TensorFlow supported
Fixed input/batch size
Only basic layers supported

Slide 24

Batch processing
Problem
Image size is fixed, but
MTCNN works at different scales
Solution
Pyramid on a single image

Slide 25

Batch processing
Results
Single run
Enables batch processing

Slide 26

TensorRT: layers
Problem
No PReLU layer => default pre-trained model can’t be used
Retrained with ReLU from scratch
-20%

Slide 27

Face detection: inference
Target: < 10 ms
Result: 8.8 ms
Ingredients
MTCNN
Batch processing
TensorRT

Slide 28

Slide 29

Face recognition task
Goal: to compare faces
How? To learn metric
To enable Zero-shot learning

Slide 30

Training set: MSCeleb
Top 100k celebrities
10 Million images, 100 per person
Noisy: constructed by leveraging public search engines

Slide 31

Small test dataset: LFW
Labeled Faces in the Wild
13k images from the web
1680 persons have >= 2 photos

Slide 32

Large test dataset: Megaface
Identification under up to 1 million “distractors”
530 people to find

Slide 33

Megaface leaderboard
~83%
~98%
cleaned

Slide 34

Metric Learning

Slide 35

Classification
Train CNN to predict classes
Pray for good latent space

Slide 36

Softmax
Learned features are only separable, not discriminative
The resulting features are not sufficiently effective

Slide 37

We need metric learning
Tightness of the cluster
Discriminative features

Slide 38

Triplet loss
Features
Identity -> single point
Enforces a margin between persons
positive + α < negative
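
The inequality above can be read as a hinge loss over distances from an anchor embedding. A minimal sketch, assuming squared Euclidean distance and a margin of 0.2 (neither is stated on the slide); `triplet_loss` is an illustrative name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when d(anchor, positive) + margin <= d(anchor, negative)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
same_person = [0.1, 0.0]
far_stranger = [1.0, 0.0]     # margin already satisfied -> no loss
close_stranger = [0.2, 0.0]   # margin violated -> positive loss

print(triplet_loss(anchor, same_person, far_stranger))
print(triplet_loss(anchor, same_person, close_stranger))
```

Triplets that already satisfy the margin contribute nothing to the gradient, which is why triplet selection matters so much.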

Slide 39

Choosing triplets
Crucial problem
How to choose triplets? Useful triplets = hardest errors
Solution
Hard-mining within a large mini-batch (>1000)

Slide 40

Choosing triplets: trap

Slide 41

Choosing triplets: trap
positive ~ negative

Slide 42

Choosing triplets: trap
Instead

Slide 43

Choosing triplets: trap
Selecting the hardest negative may lead to collapse early in training

Slide 44

Choosing triplets: semi-hard
positive < negative < positive + α
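
The semi-hard condition can be sketched as a filter over candidate negatives for one anchor-positive pair. The function name, the distances and the margin value below are illustrative assumptions.

```python
def semi_hard_negatives(d_pos, negative_distances, margin=0.2):
    """Indices of negatives with d_pos < d_neg < d_pos + margin.

    d_pos: distance anchor -> positive.
    negative_distances: distances anchor -> each candidate negative.
    """
    return [
        index
        for index, d_neg in enumerate(negative_distances)
        if d_pos < d_neg < d_pos + margin
    ]


# 0.3 is a "hardest" negative (closer than the positive), 0.9 is already easy.
candidates = [0.3, 0.6, 0.69, 0.9]
print(semi_hard_negatives(0.5, candidates))
```

Negatives closer than the positive are skipped on purpose: per the previous slides, training on the hardest ones early can collapse the embedding.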

Slide 45

Triplet loss: summary
Overview
Requires large batches, margin tuning
Slow convergence
Open-source code
Openface (Torch): suboptimal implementation
Facenet (TensorFlow): not the original

Slide 46

Center loss
Idea: pull points to class centroids

Slide 47

Center loss: structure
Without classification loss it collapses
Softmax Loss
Center Loss
Final loss = Softmax loss + λ Center loss
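
A sketch of the combined objective, assuming the center term is half the squared distance to the class centroid, averaged over the batch. That normalization, and all function names, are assumptions; in real training the centroids are also updated, which this sketch omits.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with the log-sum-exp trick."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def center_loss(features, labels, centers):
    """Mean of 0.5 * ||x - c_y||^2 over the batch (assumed normalization)."""
    total = 0.0
    for feature, label in zip(features, labels):
        total += 0.5 * sum((x - c) ** 2 for x, c in zip(feature, centers[label]))
    return total / len(features)


def total_loss(logits_batch, features, labels, centers, lam):
    """Final loss = softmax loss + lam * center loss."""
    softmax_term = sum(
        softmax_cross_entropy(logits, label)
        for logits, label in zip(logits_batch, labels)
    ) / len(labels)
    return softmax_term + lam * center_loss(features, labels, centers)


centers = [[0.0, 0.0], [5.0, 5.0]]   # one centroid per class (toy values)
loss = total_loss([[0.0, 0.0]], [[1.0, 0.0]], [0], centers, lam=0.1)
print(loss)
```

With `lam=0` this reduces to plain softmax training; larger values pull each class tighter around its centroid.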

Slide 48

Center Loss: different lambdas
λ = 10^-7

Slide 49

Center Loss: different lambdas
λ = 10^-6

Slide 50

Center Loss: different lambdas
λ = 10^-5

Slide 51

Center loss: summary
Overview
Intra-class compactness and inter-class separability
Good performance at several other tasks
Open-source code
Caffe (original, Megaface - 65%)

Slide 52

Tricks: augmentation
Test time augmentation
Flip image
Compute 2 embeddings
Average embeddings
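
The flip-and-average trick can be sketched as below. `embed` stands for any function mapping an image to an embedding vector; the toy `corner_embed` is a hypothetical stand-in for the network.

```python
def horizontal_flip(image):
    """Mirror a row-major image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def tta_embedding(image, embed):
    """Average the embeddings of the image and its horizontal flip."""
    original = embed(image)
    flipped = embed(horizontal_flip(image))
    return [(a + b) / 2 for a, b in zip(original, flipped)]


# Hypothetical toy "network": first and last pixel of the first row.
def corner_embed(image):
    return [image[0][0], image[0][-1]]


print(tta_embedding([[1.0, 3.0]], corner_embed))
```

The averaged vector is the same for an image and its mirror, so the comparison becomes insensitive to left-right flips at the cost of a second forward pass.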

Slide 53

Tricks: alignment
Rotation
Kabsch algorithm - the optimal rotation matrix that minimizes the RMSD
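
For 2D landmarks the optimal rotation reduces to a single angle with a closed form, which is what the sketch below computes. The general Kabsch algorithm uses an SVD of the covariance matrix instead. Function names and the sample points are illustrative assumptions.

```python
import math


def centroid(points):
    count = len(points)
    return (sum(p[0] for p in points) / count, sum(p[1] for p in points) / count)


def optimal_rotation_angle(source, target):
    """Angle (radians) that rotates centered `source` onto centered `target`
    with minimal RMSD. 2D special case of the Kabsch solution."""
    cs, ct = centroid(source), centroid(target)
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        px, py = px - cs[0], py - cs[1]      # remove translation first
        qx, qy = qx - ct[0], qy - ct[1]
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    return math.atan2(cross, dot)


def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


# Toy landmarks: a template and a rotated, shifted copy of it.
template = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
detected = [(x + 5.0, y - 3.0) for x, y in rotate(template, math.pi / 6)]
angle = optimal_rotation_angle(template, detected)
print(math.degrees(angle))
```

Rotating the detected face by the negative of this angle brings its landmarks back in line with the template.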

Slide 54

Angular Softmax
On sphere
Angle discriminates

Slide 55

Angular Softmax

Slide 56

Angular Softmax: different «m»

Slide 57

Angular softmax: summary
Overview
Works only on small datasets
Slight modification of the loss yields 74.2%
Various modifications of the loss function

Slide 58

Metric learning: summary
Softmax < Triplet < Center < A-Softmax
A-Softmax
With bells and whistles, better than center loss
Overall
Rule of thumb: use Center loss
Metric learning may improve classification performance

Slide 59

Fighting errors

Slide 60

Errors after MSCeleb: children
Problem
Children all look alike
Consequence
Average embedding ~ single point in the space

Slide 61

Errors after MSCeleb: Asian faces
Problem
Face recognition is less accurate on Asian faces
Reason
Dataset doesn’t contain enough photos of these categories

Slide 62

How to fix these errors?
It’s all about data, we need a diverse dataset!
Natural choice: avatars from social networks

Slide 63

A way to construct dataset
Cleaning algorithm
Face detection
Face recognition -> embeddings
Hierarchical clustering algorithm
Pick the largest cluster as a person
Iterate after each model improvement
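
The clustering and "pick the largest cluster" steps can be sketched as follows. The slides do not say which linkage or cut-off was used; this sketch assumes single-linkage clustering cut at a distance threshold, which is equivalent to taking connected components of the "closer than threshold" graph. Embeddings and threshold are toy values.

```python
import math


def largest_cluster(embeddings, threshold):
    """Indices of the largest single-linkage cluster at the given cut-off.

    Linkage type and threshold are assumptions, not taken from the slides.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if math.dist(embeddings[i], embeddings[j]) <= threshold:
                parent[find(i)] = find(j)   # merge the two clusters

    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return max(clusters.values(), key=len)


# Photos 0-2 show one person; 3-4 are someone else mixed in by the search engine.
photos = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(largest_cluster(photos, threshold=0.15))
```

Only the photos in the returned cluster would be kept for that identity; rerunning with a better model yields cleaner clusters, which matches the "iterate" step.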

Slide 64

MSCeleb dataset’s errors
MSCeleb is constructed by leveraging search engines
The public confrontation between Joe Eszterhas and Mel Gibson leads to the error

Slide 65

MSCeleb dataset’s errors
Female
+
Male

Slide 66

MSCeleb dataset’s errors
Asia
Mix

Slide 67

MSCeleb dataset’s errors
Dataset has been shrunk from 100k to 46k celebrities
Random search engine

Slide 68

Results on new datasets
Datasets
Train:
MSCeleb (46k)
VK-train (200k)
Test
MegaVK
Sets for children and Asian faces

Slide 69

How to handle big dataset
It seems we can add more data infinitely, but no.
Problems
Memory consumption (Softmax)
Computational costs
A lot of noise in gradients

Slide 70

Softmax Approximation
Algorithm
Perform K-Means clustering using current FR model

Slide 71

Softmax Approximation
Algorithm
Perform K-Means clustering using current FR model
Two Softmax heads:
Predicts cluster label
Class within the true cluster

Slide 72

Softmax Approximation
Pros
Prevents fusing of the clusters
Does hard-negative mining
Clusters can be specified
Children
Asian
Results
Doesn’t improve accuracy
Decreases memory consumption (K times)

Slide 73

Fighting errors on production

Slide 74

Errors: blur
Problem
Detector yields blurry photos
Recognition forms «blurry clusters»
Solution
Laplacian: 2nd order derivative of the image
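
A common way to turn the Laplacian into a blur score is to take the variance of its response: sharp images have strong edges and hence high variance. The sketch below assumes the 4-neighbour discrete Laplacian on a grayscale image stored as a list of rows; the rejection threshold would be an extra assumption and is left out.

```python
def laplacian_variance(image):
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


flat = [[0.5] * 4 for _ in range(4)]                                    # no detail at all
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]   # sharp edges

print(laplacian_variance(flat), laplacian_variance(checker))
```

Face crops whose score falls below a chosen cut-off can be dropped before recognition so they never form a cluster.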

Slide 75

Laplacian in action
Low variance
High variance

Slide 76

Errors: body parts
Detection mistakes form clusters

Slide 77

Errors: diagrams & mushrooms

Slide 78

Fixing trash clusters
There is similarity between “no faces”!

Slide 79

Workaround
Algorithm
Construct «trash» dataset
Compute average embedding
Every point inside the sphere – trash
Results
ROC AUC 97%
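
The workaround amounts to a distance test against the mean of the trash embeddings. A minimal sketch, assuming Euclidean distance; the radius and the sample embeddings are made-up values, since the slides do not give them.

```python
import math


def mean_embedding(embeddings):
    """Component-wise average of a list of equal-length vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def is_trash(embedding, center, radius):
    """True if the embedding lies inside the sphere around the trash center."""
    return math.dist(embedding, center) <= radius


# Hypothetical embeddings of non-face detections (diagrams, body parts, ...).
trash_set = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
center = mean_embedding(trash_set)

print(center, is_trash([0.6, 0.5], center, 0.3), is_trash([3.0, 3.0], center, 0.3))
```

In practice the radius would be tuned on labelled data, for example by sweeping it to trace the ROC curve.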

Slide 80

Spectacular results

Slide 81

Fun: new governors
Recently appointed governors are almost twins, but FR distinguishes them

Slide 82

Over years
Face recognition algorithm captures similarity across years
Although we didn’t focus on the problem

Slide 83

Over years

Slide 84

Summary
Use TensorRT to speed up inference
Metric learning: use Center loss by default
Clean your data thoroughly
Understanding CNN helps to fight errors

Slide 85

Slide 86

Auxiliary

Slide 87

Best avatar
Problem
How to pick an avatar for a person?
Solution
Train model to predict awesomeness of photo

Slide 88

Predicting awesomeness: how to approach
Social networks: not only photos, but likes too

Slide 89

Predicting awesomeness: dataset
Awesomeness (A) = likes/audience
A=18%
A=27%
A=75%
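
The label definition is a plain ratio, sketched below together with choosing the top-scoring photo. The tuple layout and the like and audience counts are invented for illustration; in the deck a trained model predicts the score instead of reading it from likes.

```python
def awesomeness(likes, audience):
    """A = likes / audience, as defined on the slide."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return likes / audience


def best_avatar(photos):
    """photos: list of (photo_id, likes, audience); returns the top photo_id."""
    return max(photos, key=lambda photo: awesomeness(photo[1], photo[2]))[0]


# Made-up counts chosen to reproduce the three ratios shown on the slide.
photos = [("a", 18, 100), ("b", 27, 100), ("c", 75, 100)]
print(best_avatar(photos))
```

Dividing by audience rather than using raw likes keeps popular accounts from dominating the labels.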

Slide 90

Predicting awesomeness: summary
Results
Mean Average Precision @5: 25%
Data and metric are noisy => human evaluation
