We benchmark three facial analysis toolkits (AFFDEX 2.0, OpenFace 2.0, and LibreFace) on a large-scale, in-the-wild corpus of 7,805 videos (∼10.5M frames) spanning diverse demographic groups. Face-detection coverage is comparable for AFFDEX 2.0 and OpenFace 2.0 (both near 95% of frames), whereas LibreFace detects faces in fewer frames (83%).
Across the 13 Action Units (AUs) tested, AFFDEX 2.0 achieves higher average balanced accuracy (by approximately 8–13 percentage points) and higher average ROC-AUC (by approximately 19–23 percentage points) than the two open-source baselines, indicating stronger robustness under noisy, real-world conditions.
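
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how balanced accuracy and ROC-AUC can be computed for a single AU from frame-level binary labels and detector scores. It illustrates the standard definitions only and is not the evaluation code used in this study; the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """Probability that a random positive frame scores above a random
    negative frame, counting ties as one half (rank-based definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical frame-level data for one AU (1 = AU present).
labels = [1, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.1, 0.6, 0.3, 0.45, 0.05]

# Assumed threshold of 0.5 to turn scores into binary predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

ba = balanced_accuracy(labels, preds)
auc = roc_auc(labels, scores)
print(f"balanced accuracy = {ba:.3f}, ROC-AUC = {auc:.3f}")
```

Balanced accuracy weights the positive and negative classes equally, which matters for AUs because most frames contain no activation; ROC-AUC is threshold-free and therefore complements it.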
