Overview
A real-time computer vision application that monitors your posture and phone usage to help break the doom scrolling habit. It uses MediaPipe for face mesh (468 landmarks) and hand tracking (21 landmarks per hand), plus optional YOLOv8 for phone detection. Features include personal calibration, tracking smoothed with an exponential moving average, and escalating audio/visual warnings.
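To illustrate the smoothing step, here is a minimal sketch of an exponential moving average over a stream of readings such as the nose Y coordinate. The class name `EmaSmoother` and the default `alpha` of 0.3 are assumptions for illustration, not values taken from the project.

```python
from typing import Optional


class EmaSmoother:
    """Exponential moving average over a stream of scalar readings."""

    def __init__(self, alpha: float = 0.3) -> None:
        # alpha is a hypothetical default: higher values follow the raw
        # signal more closely, lower values smooth more but lag more.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None  # no reading seen yet

    def update(self, reading: float) -> float:
        """Blend the new reading into the running average and return it."""
        if self.value is None:
            # The first reading seeds the average directly.
            self.value = reading
        else:
            self.value = self.alpha * reading + (1.0 - self.alpha) * self.value
        return self.value
```

Each frame's raw landmark coordinate would be passed through `update` before any threshold check, so that a single noisy frame moves the smoothed value only by a fraction of the jump.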
The Problem
Doom scrolling is a modern problem: we mindlessly check our phones without realising how much time passes. Existing solutions rely on app timers, but those only work within individual apps. I wanted something that catches the physical behaviour regardless of which app you're using.
The Approach
Rather than complex 3D pose estimation (which is jittery), I track the nose's Y position relative to a calibrated neutral position. When your nose drops past a threshold below that neutral, you're looking down. Phone detection uses YOLO as the primary signal (high confidence, so it triggers on its own) and hand grip detection as the secondary signal (it only triggers if you are also looking down, which reduces false positives). Hysteresis with frame buffers prevents the detected state from flickering.
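The approach above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the names `LookDownDetector` and `phone_in_use`, the threshold of 0.05, and the frame counts are all assumptions. It also assumes normalized image coordinates where Y grows downward, so a dropped nose has a larger Y value than the calibrated neutral.

```python
from dataclasses import dataclass


@dataclass
class LookDownDetector:
    neutral_y: float          # calibrated nose Y (normalized, 0 = top of frame)
    threshold: float = 0.05   # hypothetical drop required to count as "down"
    frames_on: int = 5        # hypothetical frames needed to switch to "down"
    frames_off: int = 10      # hypothetical frames needed to switch back
    looking_down: bool = False
    _streak: int = 0          # consecutive frames disagreeing with the state

    def update(self, nose_y: float) -> bool:
        # Image Y grows downward, so a dropped nose has a larger Y.
        below = (nose_y - self.neutral_y) > self.threshold
        if below != self.looking_down:
            # Only flip state after enough consecutive disagreeing frames.
            self._streak += 1
            needed = self.frames_on if below else self.frames_off
            if self._streak >= needed:
                self.looking_down = below
                self._streak = 0
        else:
            # The frame agrees with the current state: reset the buffer.
            self._streak = 0
        return self.looking_down


def phone_in_use(yolo_sees_phone: bool, hand_grip: bool, looking_down: bool) -> bool:
    # The primary signal triggers alone; the secondary signal needs
    # the look-down state to agree with it.
    return yolo_sees_phone or (hand_grip and looking_down)
```

Requiring a run of consecutive frames before changing state means a one-frame blip resets the counter instead of toggling the output, which is what keeps the warning from flickering on and off.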
Outcome
A working detector that runs at ~30 FPS on modern hardware. Warnings escalate from a gentle chime at 10 seconds to a flashing red screen at 40 seconds. I learned practical lessons about sensor fusion, signal smoothing, and why simpler approaches often work better than complex ones.
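One way to express the escalation is a table of time thresholds. The sketch below includes only the two stages stated above, the chime at 10 seconds and the red screen at 40 seconds. Any intermediate stages are left out because they are not described here, and the names `WARNING_LEVELS` and `warning_for` are hypothetical.

```python
from typing import Optional

# (seconds of continuous phone use, warning name), in ascending order.
# Only the first and last stages come from the description above.
WARNING_LEVELS = [
    (10.0, "gentle chime"),
    (40.0, "flashing red screen"),
]


def warning_for(elapsed_seconds: float) -> Optional[str]:
    """Return the strongest warning reached so far, or None if none yet."""
    current = None
    for start, name in WARNING_LEVELS:
        if elapsed_seconds >= start:
            current = name
    return current
```

The main loop would reset the elapsed timer whenever phone use stops, so that the warning level falls back to `None`.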
