A performance validation of six commercial wrist-worn wearable sleep-tracking devices for sleep stage scoring compared to polysomnography

Apr 30, 2025 | Sleep Advances: A Journal of the Sleep Research Society

How six popular wrist-worn sleep trackers compare to standard sleep tests in measuring sleep stages

AI simplified

Abstract

All tested wearable sleep-tracking devices achieved over 90% sensitivity in detecting sleep epochs.

  • Most wearables showed significant differences from polysomnography in total sleep time, sleep efficiency, wake after sleep onset, and light sleep.
  • Specificity was much lower for all devices, ranging from 29.39% to 52.15%.
  • Kappa coefficients for the wearables indicated fair to moderate agreement with polysomnography, ranging from 0.21 to 0.53.
  • The Fitbit Sense, Fitbit Charge 5, and Apple Watch Series 8 showed higher agreement with polysomnography, with kappa values of 0.42, 0.41, and 0.53, respectively.
  • Improvements are needed for better multistate categorization in all devices.

Key numbers

  • >90%: sensitivity for sleep detection (percentage of correctly identified sleep epochs, across all devices).
  • 0.53: kappa for the Apple Watch Series 8 (the highest kappa value among the tested wearables).
  • -5.74 minutes: mean difference in total sleep time (TST) between the Fitbit Charge 5 and PSG.

Full Text

What this is

  • This study evaluates the accuracy of six commercial wrist-worn wearable sleep-tracking devices against polysomnography (PSG), the gold standard for sleep measurement.
  • Sixty-two adults participated, using multiple wearables during a single night of PSG monitoring.
  • Findings show that while wearables can accurately identify sleep epochs, they often misclassify sleep stages and exhibit variability in performance.

Essence

  • All six wrist-worn wearables demonstrate high sensitivity (>90%) for detecting sleep but vary considerably in how accurately they score specific sleep stages compared to PSG. The Fitbit Sense, Fitbit Charge 5, and Apple Watch Series 8 show moderate agreement with PSG (κ = 0.41 to 0.53), the highest among the tested devices, making them the more reliable options for tracking sleep patterns.

Key takeaways

  • Wearables achieved >90% sensitivity in detecting sleep epochs, indicating their effectiveness in identifying when a person is asleep.
  • Kappa coefficients ranged from 0.21 to 0.53, suggesting fair to moderate agreement with PSG; Fitbit Sense (κ = 0.42), Fitbit Charge 5 (κ = 0.41), and Apple Watch Series 8 (κ = 0.53) showed the highest agreement.
  • All wearables significantly overestimated total sleep time (TST) by 6.31–39.87 minutes, except for the Fitbit Charge 5, which underestimated TST by 5.74 minutes.
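
The metrics in these takeaways can be illustrated with a minimal sketch. It assumes epoch-by-epoch sleep/wake labels (1 = sleep, 0 = wake) and 30-second epochs; the function name, the label encoding, and the epoch length are illustrative assumptions, not details taken from the study.

```python
from typing import Dict, Sequence

EPOCH_MINUTES = 0.5  # assumption: each scored epoch lasts 30 seconds


def sleep_wake_metrics(psg: Sequence[int], device: Sequence[int]) -> Dict[str, float]:
    """Compare a device's sleep/wake labels against PSG, epoch by epoch.

    Labels are 1 for sleep and 0 for wake (hypothetical encoding).
    """
    if len(psg) != len(device):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(psg, device))
    tp = sum(1 for p, d in pairs if p == 1 and d == 1)  # sleep scored as sleep
    fn = sum(1 for p, d in pairs if p == 1 and d == 0)  # sleep scored as wake
    tn = sum(1 for p, d in pairs if p == 0 and d == 0)  # wake scored as wake
    fp = sum(1 for p, d in pairs if p == 0 and d == 1)  # wake scored as sleep

    return {
        # Share of PSG sleep epochs the device also labels as sleep.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of PSG wake epochs the device also labels as wake.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Device TST minus PSG TST; positive means the device overestimates.
        "tst_diff_minutes": (sum(device) - sum(psg)) * EPOCH_MINUTES,
    }


# Toy night of ten epochs (made-up labels, for illustration only).
psg_labels = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
device_labels = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(sleep_wake_metrics(psg_labels, device_labels))
```

In this toy example the device catches every sleep epoch but labels half of the wake epochs as sleep, which is the same pattern the study reports: high sensitivity, low specificity, and an overestimated TST.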

Caveats

  • The study's findings are limited to a single night of PSG, which may not reflect long-term accuracy in diverse sleep conditions.
  • Variability in performance across devices suggests that while some wearables perform well, others may not be reliable for detailed sleep analysis.
  • The sample was predominantly male and may not represent the full spectrum of sleep architecture across genders and ethnicities.

Definitions

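  • Cohen's kappa: A statistical measure of agreement between two raters on categorical items that corrects for agreement expected by chance. A value of 0 means agreement no better than chance, 1 means perfect agreement, and negative values mean agreement worse than chance.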
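
As a rough illustration of this definition, the sketch below computes Cohen's kappa from two equal-length sequences of stage labels using the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the stage codes (W, L, D, R) are assumptions made for the example, not the study's scoring scheme.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(rater_a)
    # Observed agreement: fraction of epochs where both labels match.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up stage labels: W = wake, L = light, D = deep, R = REM.
psg_stages = ["W", "L", "L", "D", "R", "L", "W", "L"]
device_stages = ["W", "L", "L", "L", "R", "L", "L", "L"]
print(round(cohens_kappa(psg_stages, device_stages), 3))
```

Here the two label sequences match on 6 of 8 epochs (75%), yet kappa comes out lower than that, because a device that labels most epochs as light sleep would agree with PSG fairly often by chance alone.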
