What this is
- This study examines the validity of the Somfit wearable device for assessing sleep in athletes, compared with polysomnography (PSG).
- Twenty-seven athletes participated, each spending a night in a sleep laboratory while wearing both Somfit and PSG equipment simultaneously.
- The study classifies time in bed into two states (sleep or wake) and five states (wake, N1, N2, N3, REM) and assesses the agreement between Somfit and PSG data.
Essence
- Somfit shows moderate to substantial agreement with PSG in classifying sleep stages among athletes. Five-state agreement ranged from 63% to 79% depending on data quality.
Key takeaways
- Somfit achieved 79% five-state agreement with PSG (κ = 0.70) for the excellent-capture subset, indicating substantial agreement in sleep stage classification.
- Somfit overestimated total sleep time by 10 minutes for the excellent-capture subset (absolute bias 14 minutes), which is within the 30-minute threshold for clinical acceptability.
- Data capture strongly influences the accuracy of Somfit, with the best performance observed in participants for whom more than 99.9% of epochs were captured and scored.
Caveats
- Data from one participant were excluded because no Somfit data were recorded; whether this was due to human error or to a failure of the device or its systems could not be determined.
- Data capture varied widely between participants, and Somfit's agreement with PSG was considerably lower when records with poor capture were included.
Definitions
- Polysomnography (PSG): A comprehensive recording of the biophysiological changes that occur during sleep, considered the gold standard for sleep assessment.
- Cohen's kappa: A statistical measure of inter-rater agreement for categorical items that corrects for agreement expected by chance; used here to assess the agreement between Somfit and PSG.
1. Introduction
Polysomnography (PSG) is the gold standard technique for the assessment of human sleep [1]. At a minimum, PSG requires electrodes to be attached to the scalp, face, and body to generate signals for electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG). A trained sleep technologist then manually scores each 30 s segment (epoch) of a sleep period as wake, light sleep (stage 1 non-REM sleep [N1] or stage 2 non-REM sleep [N2]), deep sleep (stage 3 non-REM sleep [N3]), or rapid eye movement (REM) sleep, based on characteristic patterns of brain activity in the EEG, eye movements in the EOG, and muscle tone in the EMG. Measures of the quantity and quality of sleep (total sleep time, amount of deep sleep, etc.) can then be derived from the scored PSG records.
Various wearable devices, i.e., sleep wearables, have been developed as alternatives to PSG for the assessment of sleep because the collection of PSG data can be complex, inconvenient, time-consuming, and costly. There are two main types of sleep wearables. Activity-based wearables are worn on the body, typically on the wrist or finger, and base sleep assessments on measures of physical activity alone (in the case of research-grade wearables), or a combination of physical activity and cardiac activity (in the case of commercial-grade wearables). Neurophysiological-based wearables are worn on the head, typically on the forehead, and base sleep assessments on measures of brain activity, eye movements, and muscle tone (similar to the measures used in PSG, but using a smaller number of electrodes that can be self-applied).
In preparation for a previous validation study [2], high-performance sport staff identified the top six sleep wearables that were being used or were likely to be used in the near future by elite athletes. The devices that were identified included five commercial-grade activity-based wearables, from Apple, Garmin, Polar, Oura and Whoop, and one neurophysiological-based wearable, from Compumedics. The validity of sleep wearables from these six providers was examined in numerous separate studies with non-athlete participants. The results of these studies indicate that the neurophysiological-based wearable, i.e., Somfit by Compumedics, has a higher level of agreement with PSG than the activity-based wearables, e.g., [3,4,5,6]. Similarly, when devices from all six providers were worn simultaneously by fifty physically active participants during a single night of sleep, Somfit had a higher level of agreement with PSG than the activity-based wearables [2]. This is perhaps unsurprising, given that the measures used by Somfit to assess sleep are similar to those used by PSG.
There is some evidence to indicate that athletes have poorer sleep quality than non-athletes [7,8], and it is well established that sleep wearables are less accurate in populations with poor-quality sleep than in other populations [9,10,11]. Therefore, when assessing the validity of sleep wearables that are likely to be used by athletes, it is important to include athletes as participants in the research. A limited number of studies have assessed the validity of activity-based wearables for assessing sleep in athletes, using research-grade devices [12,13] and commercial-grade devices [14,15], but the validity of neurophysiological-based sleep wearables has not been tested specifically in athletes. In response to this gap, the objective of the current study was to examine the validity of a neurophysiological-based wearable, i.e., Somfit, for assessing athletes' sleep by comparing its outputs with PSG.
2. Materials and Methods
2.1. Participants
Twenty-seven participants completed the study [14 female; 13 male; age = 22.3 ± 5.1 years (mean ± SD)]. To be included, participants had to be (a) well-trained athletes (n = 4), i.e., participating in at least 2 h of training per day, at least 3 days per week, for at least 3 years [16], or (b) highly trained athletes (n = 23), i.e., participating in at least 3 h of training per day, at least 5 days per week, for at least 5 years [16]. Participants were recruited from the sports of soccer (n = 10), beach volleyball (n = 6), athletics (n = 3), basketball (n = 3), diving (n = 2), orienteering (n = 1), track cycling (n = 1), and weightlifting (n = 1). Wearable devices may be used to assess athletes' sleep at any phase of the yearly competition cycle, so the sample in this study included participants in all three phases of the cycle, i.e., pre-season (n = 10), in-season (n = 14), and off-season (n = 3). Participants provided written informed consent and received a nominal honorarium for their involvement (in the form of a gift voucher). This study was approved by the CQUniversity Human Research Ethics Committee following the guidelines of the National Health and Medical Research Council (Australia).
2.2. Equipment
2.2.1. Polysomnography (Gold Standard)
The gold standard assessment of sleep was conducted using PSG. In this study, a standard montage of Grass gold-cup electrodes (AstroMed, West Warwick, RI, USA) was attached to each participant's scalp, face, and body. The montage included four channels of electroencephalography (EEG) to assess brain activity (C4–M1, C3–M2, F4–M1, O2–M1), two electrooculograms (EOG) to assess eye movements (left/right outer canthus), and three electromyograms (EMG) to assess muscle tone (submental). PSG data were transmitted via Ethernet and recorded to a Grael data acquisition, storage, and analysis system (Compumedics Ltd., Melbourne, Australia).
2.2.2. Somfit (Wearable Device)
Somfit (Compumedics Ltd., Melbourne, Australia) is a wearable device that attaches to the forehead through a self-adhesive, disposable patch. The device records various data, including two EEG signals, pulse rate, oxygen saturation, motion, ambient light, skin temperature, ambient temperature, etc. The data are transmitted via Bluetooth to a mobile phone paired to the Somfit device and then transmitted via Wi-Fi to a cloud-based data management and reporting system (Profusion Nexus360, version 2.0.0.34).
2.3. Procedure
The study was conducted at The Sleep Lab at CQUniversity's Appleton Institute for Behavioural Science in Wayville, South Australia. The Sleep Lab comprises two accommodation suites fitted out as serviced apartments. In total, the suites contain six bedrooms, six bathrooms, two kitchens, two dining rooms, a gymnasium, and laundry facilities. The accommodation suites are sound-attenuated, windowless, and temperature-controlled, such that, during time in bed, background noise was 30 dB, bedrooms were completely dark, and the target ambient temperature was 21–23 °C.
Each participant attended The Sleep Lab on a single night in groups of 4–6 people. Participants arrived after dinner in the evening (~20:00), were fitted with the PSG and Somfit equipment from 21:30 to 22:30, were given a 9 h sleep opportunity in bed with lights out from 23:00 to 08:00, and departed at ~08:30 after the sleep monitoring equipment was removed. To ensure the timing of the PSG and Somfit equipment was synchronised, all devices were set to the same time prior to each night of use.
2.4. Measures
2.4.1. Epoch Scoring: Polysomnography (Gold Standard)
PSG records, from lights off to lights on, were independently scored by three trained, experienced sleep technologists. The records were scored according to version 2.6 of the AASM Recommended Rules [17] using Profusion PSG 3 (version 3.4.404) or Profusion PSG 5 (version 5.1.656) software (Compumedics Ltd., Melbourne, Australia). A five-state hypnogram was generated for each 9 h PSG record by each scorer, with all 1080 × 30-s epochs classified as one of five states (i.e., wake, N1, N2, N3, REM). The five-state hypnograms from the three scorers were used to create a five-state concordance hypnogram. If two or three of the scorers assigned the same sleep stage to a particular 30-s epoch, the corresponding epoch in the five-state concordance hypnogram was classified as that stage. If the three scorers assigned three different sleep stages to a particular 30-s epoch, the corresponding epoch in the five-state concordance hypnogram was classified as 'discordant'. The five-state hypnograms from the three scorers were also converted into two-state hypnograms, with all 1080 × 30-s epochs classified as one of two states (i.e., wake or sleep). The two-state hypnograms from the three scorers were used to create a two-state concordance hypnogram (employing a similar process as for the five-state concordance hypnogram).
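The majority-vote rule for building a concordance hypnogram can be written in a few lines. The sketch below is a minimal Python illustration, not the Profusion software's implementation; the function names and the string labels ('wake', 'N1', 'N2', 'N3', 'REM', 'discordant') are hypothetical.

```python
from collections import Counter

# Hypothetical labels; the study scored wake, N1, N2, N3 and REM.
SLEEP_STAGES = {"N1", "N2", "N3", "REM"}
DISCORDANT = "discordant"


def concordance_hypnogram(a, b, c):
    """Combine three scorers' hypnograms epoch by epoch.

    An epoch takes the state assigned by two or three scorers; if all three
    scorers disagree, the epoch is labelled 'discordant'.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("hypnograms must cover the same epochs")
    result = []
    for votes in zip(a, b, c):
        state, count = Counter(votes).most_common(1)[0]
        result.append(state if count >= 2 else DISCORDANT)
    return result


def to_two_state(hypnogram):
    """Collapse one scorer's five-state hypnogram into 'sleep'/'wake'."""
    return ["sleep" if s in SLEEP_STAGES else "wake" for s in hypnogram]
```

With only two states and three scorers, at least two scorers always agree, so a two-state concordance hypnogram built this way never contains discordant epochs.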
2.4.2. Epoch Scoring: Somfit (Wearable Device)
Somfit records, from lights off to lights on, were automatically scored independently of the PSG records using the Profusion Nexus360 cloud-based data management and reporting system. A five-state hypnogram was generated for each 9 h Somfit record, with all 1080 × 30-s epochs classified as one of five states (i.e., wake, N1, N2, N3, or REM). The five-state hypnograms were also converted into two-state hypnograms, with all 1080 × 30-s epochs classified as one of two states (i.e., wake or sleep).
2.4.3. Summary Sleep Variables: Polysomnography and Somfit
For each participant's night of sleep, the PSG five-state concordance hypnogram and the Somfit five-state hypnogram were used to determine two separate sets of variables regarding the amount of time spent in any stage of sleep (total sleep time) and the amount of wake, N1, N2, N3, and REM for each 9 h period in bed. N1 and N2 are lighter stages of sleep, and N3, sometimes referred to as slow-wave sleep, is a deeper stage of sleep [1]. REM sleep is sometimes referred to as dreaming sleep, but dreaming also occurs in non-REM sleep, albeit about half as often as in REM sleep [18].
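Because every epoch lasts 30 s, each summary variable is the number of epochs in the relevant state multiplied by 0.5 min; for example, 840 sleep epochs correspond to a total sleep time of 420 min (7 h). A minimal Python sketch, assuming the hypnogram is a list of hypothetical string labels and that epochs carrying any other label (e.g., 'discordant' or unscored) are counted in no state, which the text does not specify:

```python
EPOCH_MIN = 0.5  # one 30-s epoch expressed in minutes
STATES = ("wake", "N1", "N2", "N3", "REM")
SLEEP_STAGES = ("N1", "N2", "N3", "REM")


def summary_variables(hypnogram):
    """Return minutes of wake, N1, N2, N3 and REM, plus total sleep time (TST)."""
    minutes = {state: hypnogram.count(state) * EPOCH_MIN for state in STATES}
    # TST is the time spent in any stage of sleep.
    minutes["TST"] = sum(minutes[stage] for stage in SLEEP_STAGES)
    return minutes
```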
2.5. Data Analysis
An analysis of the data was conducted using Microsoft Excel for Mac version 16.74 and IBM SPSS Statistics version 27.0.1.0. The analysis was consistent with a standardised procedure for assessing the performance of sleep wearables [19].
2.5.1. Epoch-by-Epoch Comparisons
Epoch-by-epoch comparisons of Somfit with PSG were performed for two-state categorisation of time in bed (as sleep or wake) and five-state categorisation of time in bed (as a particular sleep stage or wake). For both sets of comparisons, a 30-s epoch of time in bed was excluded from the analyses if (a) the PSG epoch was classified as 'discordant' in the respective concordance hypnogram (i.e., no agreement between the scorers), or (b) the Somfit epoch was not captured/scored.
For the two-state categorisation of sleep, each 30-s epoch was classified as one of four types based on the agreement (or not) of Somfit with PSG (Table 1), and the following variables were calculated:
- Sensitivity for sleep (%) = $\mathrm{TS}/(\mathrm{TS} + \mathrm{FW}) \times 100$, i.e., the percentage of PSG sleep epochs correctly scored as sleep by Somfit.
- Sensitivity for wake (%) = $\mathrm{TW}/(\mathrm{TW} + \mathrm{FS}) \times 100$, i.e., the percentage of PSG wake epochs correctly scored as wake by Somfit (sometimes referred to as specificity).
- Agreement (%) = $(\mathrm{TS} + \mathrm{TW})/(\mathrm{TS} + \mathrm{TW} + \mathrm{FS} + \mathrm{FW}) \times 100$, i.e., the percentage of all PSG epochs correctly scored as sleep or wake by Somfit.
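The four epoch types and the three two-state formulas can be computed directly from paired epoch sequences. A minimal Python sketch, assuming hypothetical labels ('sleep', 'wake', 'discordant', and None for an epoch Somfit did not capture or score) and assuming TS, TW, FS and FW denote PSG sleep/Somfit sleep, PSG wake/Somfit wake, PSG wake/Somfit sleep and PSG sleep/Somfit wake, as implied by the formulas; it applies the two exclusion rules from the preceding paragraph. It does not guard against a record with no PSG wake (or sleep) epochs, which would make a denominator zero.

```python
def two_state_metrics(psg, somfit):
    """Compare Somfit with the PSG two-state concordance hypnogram, epoch by epoch.

    psg:    'sleep', 'wake' or 'discordant' for each epoch.
    somfit: 'sleep', 'wake' or None (epoch not captured/scored).
    """
    ts = tw = fs = fw = 0
    for p, s in zip(psg, somfit):
        if p == "discordant" or s is None:
            continue  # excluded: scorers disagreed, or no Somfit data
        if p == "sleep" and s == "sleep":
            ts += 1  # true sleep
        elif p == "wake" and s == "wake":
            tw += 1  # true wake
        elif p == "wake" and s == "sleep":
            fs += 1  # false sleep: PSG wake scored as sleep
        elif p == "sleep" and s == "wake":
            fw += 1  # false wake: PSG sleep scored as wake
        else:
            raise ValueError(f"unexpected labels: {p!r}, {s!r}")
    return {
        "sensitivity_sleep": 100 * ts / (ts + fw),
        "sensitivity_wake": 100 * tw / (tw + fs),
        "agreement": 100 * (ts + tw) / (ts + tw + fs + fw),
        "counts": (ts, tw, fs, fw),
    }
```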
For the five-state categorisation of sleep, each 30-s epoch was classified as one of twenty-five types based on the agreement (or not) of Somfit with PSG (Table 2), and the following variables were calculated:
- Sensitivity for N1 (%) = $\mathrm{TN1}/(\mathrm{TN1} + \mathrm{FWN1} + \mathrm{FN2N1} + \mathrm{FN3N1} + \mathrm{FRN1}) \times 100$, i.e., the percentage of PSG N1 epochs correctly scored as N1 by Somfit.
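The twenty-five epoch types form a 5 × 5 error matrix (PSG state by Somfit state), and the N1 formula is the diagonal cell of the N1 row divided by that row's total. A minimal Python sketch with hypothetical names and labels; the five-state agreement function assumes agreement is defined, as in the two-state case, as correctly scored epochs over all included epochs, since that formula is not shown here.

```python
STAGES = ("wake", "N1", "N2", "N3", "REM")


def error_matrix(psg, somfit):
    """Count epochs by (PSG state, Somfit state); excluded epochs are skipped."""
    matrix = {p: {s: 0 for s in STAGES} for p in STAGES}
    for p, s in zip(psg, somfit):
        if p not in STAGES or s not in STAGES:
            continue  # 'discordant' PSG epoch or uncaptured/unscored Somfit epoch
        matrix[p][s] += 1
    return matrix


def stage_sensitivity(matrix, stage):
    """% of PSG epochs of `stage` that Somfit also scored as `stage`.

    For N1 the row total is TN1 + FWN1 + FN2N1 + FN3N1 + FRN1.
    """
    row = matrix[stage]
    return 100 * row[stage] / sum(row.values())


def five_state_agreement(matrix):
    """% of all included epochs that lie on the diagonal of the error matrix."""
    correct = sum(matrix[s][s] for s in STAGES)
    total = sum(sum(row.values()) for row in matrix.values())
    return 100 * correct / total
```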
The agreement statistic was also calculated as a measure of inter-rater agreement for the five-state hypnograms produced by the three trained sleep technologists (A, B, C), i.e., A v. B agreement, A v. C agreement, and B v. C agreement.
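Inter-rater agreement is the same percent-agreement calculation applied to each pair of scorers' hypnograms. A minimal Python sketch, assuming each scorer's five-state hypnogram is a list of labels keyed by a hypothetical scorer name ('A', 'B', 'C'):

```python
from itertools import combinations


def percent_agreement(x, y):
    """% of epochs given the same state in two hypnograms of equal length."""
    if len(x) != len(y):
        raise ValueError("hypnograms must cover the same epochs")
    return 100 * sum(a == b for a, b in zip(x, y)) / len(x)


def inter_rater_agreement(scorers):
    """Agreement for every pair of scorers, e.g. 'A v. B', 'A v. C', 'B v. C'."""
    return {
        f"{a} v. {b}": percent_agreement(scorers[a], scorers[b])
        for a, b in combinations(sorted(scorers), 2)
    }
```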
Sensitivity and agreement indicate the likelihood that a PSG epoch will be correctly identified by Somfit. Cohen's kappa (κ) was also calculated to evaluate the agreement values relative to the agreement that could be expected by chance [20]. The kappa statistic was interpreted using standard guidelines: 0 to 0.20 = slight agreement; >0.20 to 0.40 = fair agreement; >0.40 to 0.60 = moderate agreement; >0.60 to 0.80 = substantial agreement; >0.80 to <1.00 = almost perfect agreement; and 1.00 = perfect agreement.
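Kappa compares observed agreement $p_o$ with the agreement $p_e$ expected from each rater's marginal proportions, $\kappa = (p_o - p_e)/(1 - p_e)$. A minimal Python sketch of that calculation and of the interpretation bands above; the function names are hypothetical, and the handling of negative values is an assumption because the quoted scale starts at 0.

```python
from collections import Counter


def cohens_kappa(x, y):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    count_x, count_y = Counter(x), Counter(y)
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(count_x[k] * count_y[k] for k in count_x) / n**2
    return (observed - expected) / (1 - expected)


def interpret_kappa(k):
    """Map kappa onto the interpretation bands quoted in the text."""
    if k < 0:
        return "below the quoted scale"  # assumption: the scale starts at 0
    if k >= 1.0:
        return "perfect"
    if k > 0.80:
        return "almost perfect"
    if k > 0.60:
        return "substantial"
    if k > 0.40:
        return "moderate"
    if k > 0.20:
        return "fair"
    return "slight"
```

On these bands, the reported κ = 0.45 is moderate, and κ = 0.64 and 0.70 are substantial.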
Due to the vagaries associated with aligning the timing of different types of sleep-recording equipment, synchronisation may be imperfect, so there may be minor differences in clock time between some pairs of Somfit v. PSG sleep records. To minimise the effect of these differences, the five-state agreement was examined with offset adjustments of 1–10 × 30-s epochs in both directions at the individual level, and the offset with the highest agreement value was applied. For the 26 Somfit records, one required no offset and 25 required an offset; the maximum offset applied was 10 × 30-s epochs; the average absolute offset for the 25 affected records was 4.9 × 30-s epochs; and the mean multi-state agreement for all 26 records was 3.6% higher when offsets were applied.
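The offset search amounts to recomputing five-state agreement for each candidate shift and keeping the best one. A minimal Python sketch under stated assumptions: the labels and function names are hypothetical, epochs shifted past either end of the record are simply dropped, and ties are broken in favour of the smaller shift, none of which the text specifies.

```python
STAGES = {"wake", "N1", "N2", "N3", "REM"}


def agreement_at_offset(psg, somfit, offset):
    """Five-state % agreement when somfit[i] is compared with psg[i + offset].

    Epochs shifted past either end of the record, 'discordant' PSG epochs and
    uncaptured/unscored Somfit epochs (None) are skipped.
    """
    matched = total = 0
    for i, s in enumerate(somfit):
        j = i + offset
        if not 0 <= j < len(psg):
            continue
        p = psg[j]
        if p not in STAGES or s not in STAGES:
            continue
        total += 1
        matched += int(p == s)
    return 100 * matched / total if total else 0.0


def best_offset(psg, somfit, max_shift=10):
    """Try shifts of -max_shift..+max_shift epochs and keep the best one.

    Ties are broken in favour of the smaller absolute shift (an assumption).
    """
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda k: (agreement_at_offset(psg, somfit, k), -abs(k)),
    )
```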
2.5.2. Summary Sleep Variables (Bland–Altman Analyses, Means Comparisons)
Agreement between Somfit and PSG for total sleep time was examined using the limits of agreement method [21]. Modified Bland–Altman plots were produced to display (a) the pairwise differences between Somfit- and PSG-derived values for total sleep time, (b) the mean difference between the Somfit- and PSG-derived values for total sleep time (bias), and (c) the 95% limits of agreement, i.e., bias ± (1.96 × SD), where SD is the standard deviation of the pairwise differences. The plots were examined for proportional bias and heteroscedasticity using ordinary least squares regression and the Breusch–Pagan test, respectively. Where heteroscedasticity or proportional bias was present, this was noted, but the bias and 95% limits of agreement in the plots were not adjusted.
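The Bland–Altman quantities and the two diagnostic checks can be computed with the standard library alone. A minimal Python sketch with hypothetical names, assuming the x-axis is the mean of the two methods (the "modified" plots may use a different axis) and using Koenker's studentised form of the Breusch–Pagan statistic (n × R² from regressing squared residuals on the same predictor); the text does not say which variant or software routine was used.

```python
import math
from statistics import mean, stdev


def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


def bland_altman(somfit_tst, psg_tst):
    """Bias, 95% limits of agreement and simple checks on the differences."""
    diffs = [s - p for s, p in zip(somfit_tst, psg_tst)]
    means = [(s + p) / 2 for s, p in zip(somfit_tst, psg_tst)]  # assumed x-axis
    bias = mean(diffs)
    sd = stdev(diffs)
    # Proportional bias: does the difference change with the size of the measurement?
    a, slope, r2 = ols(means, diffs)
    # Koenker's studentised Breusch-Pagan statistic, ~chi-square with 1 df.
    sq_resid = [(d - a - slope * m) ** 2 for d, m in zip(diffs, means)]
    bp = max(0.0, len(diffs) * ols(means, sq_resid)[2])
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        "slope": slope,
        "r_squared": r2,
        "bp": bp,
        "bp_p": math.erfc(math.sqrt(bp / 2)),  # upper tail of chi-square(1)
    }
```

As a check on the chi-square tail used here, a statistic of 5.1 gives p ≈ .02, consistent with the value reported for the unfiltered subset in the Results.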
For all summary sleep variables, Somfit-derived values were compared with PSG-derived values using mean differences (bias), mean absolute differences (absolute bias), paired-sample t-tests, and effect sizes (based on Cohen's d). For each variable, the Somfit means were considered to differ from the PSG means if p < .05 for the t-test and/or the 95% confidence interval for Cohen's d did not include zero. Furthermore, for total sleep time, wearable-based estimates are typically considered to be clinically satisfactory if the absolute bias is <30 min [22].
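The means comparisons reduce to statistics on the paired differences. A minimal Python sketch with hypothetical names, assuming the paired-design effect size $d_z$ (mean difference divided by the SD of the differences), which is one common variant; the text does not say which form of Cohen's d was used. It returns the t statistic and its degrees of freedom but not the p-value or the confidence interval for d; a library routine such as `scipy.stats.ttest_rel` could supply the p-value.

```python
import math
from statistics import mean, stdev


def compare_means(somfit, psg, clinical_threshold_min=30.0):
    """Bias, absolute bias, paired t statistic and a paired Cohen's d."""
    diffs = [s - p for s, p in zip(somfit, psg)]
    n = len(diffs)
    bias = mean(diffs)
    sd = stdev(diffs)
    abs_bias = mean([abs(d) for d in diffs])
    return {
        "bias": bias,
        "absolute_bias": abs_bias,
        "t": bias / (sd / math.sqrt(n)),  # paired-sample t, df = n - 1
        "df": n - 1,
        "cohens_dz": bias / sd,  # one paired variant of d (assumption)
        # The <30 min criterion is stated for total sleep time only.
        "clinically_satisfactory": abs_bias < clinical_threshold_min,
    }
```

Applied to the reported results, the excellent-capture subset's absolute bias of 14 min meets the 30 min criterion, whereas the good-capture subset's 40 min does not.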
3. Results
3.1. Data Capture and Data Quality
Twenty-seven participants completed the study. All data from one participant were excluded because no Somfit data were recorded. It could not be determined whether this instance of complete data loss was due to human error (by researchers or the participant) or failure of Somfit hardware, software, firmware, or systems.
For the 26 participants for whom Somfit data were available, there were large differences between participants in (a) the number of 30-s epochs for which data were captured in the Somfit records and (b) the number of captured 30-s epochs for which data were of sufficient quality to be scored (as wake or a particular stage of sleep) by Somfit's auto-scoring algorithms. Preliminary analyses indicated these factors were likely to affect the quality of the Somfit outputs, so for all subsequent analyses, three subsets of data were created:
- Unfiltered subset (n = 26): contains Somfit records (and their matching PSG records) from all participants.
- Good-capture subset (n = 15): contains Somfit records (and their matching PSG records) from participants for whom >80% of the 30-s epochs were captured/scored by Compumedics' Profusion Nexus360 system.
- Excellent-capture subset (n = 7): contains Somfit records (and their matching PSG records) from participants for whom >99.9% of the 30-s epochs were captured/scored by Compumedics' Profusion Nexus360 system.
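Assuming the denominator is the full 1080-epoch record, the thresholds translate into epoch counts: >80% means more than 864 scored epochs, and >99.9% (99.9% of 1080 is 1078.92) means at least 1079, i.e., at most one missing epoch. A minimal Python sketch of the subset assignment, with a hypothetical record structure in which None marks an epoch that was not captured/scored:

```python
def capture_percentage(somfit_hypnogram):
    """% of epochs captured and scored; None marks an uncaptured/unscored epoch."""
    scored = sum(1 for s in somfit_hypnogram if s is not None)
    return 100 * scored / len(somfit_hypnogram)


def assign_subsets(records):
    """records: participant id -> Somfit hypnogram (hypothetical structure)."""
    pct = {pid: capture_percentage(h) for pid, h in records.items()}
    return {
        "unfiltered": sorted(pct),
        "good_capture": sorted(pid for pid, p in pct.items() if p > 80),
        "excellent_capture": sorted(pid for pid, p in pct.items() if p > 99.9),
    }
```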
3.2. Epoch-by-Epoch Comparisons
For the two-state categorisation of sleep/wake, the percent agreement and Cohen's kappa were 84% and 0.45 for the unfiltered subset, 89% and 0.51 for the good-capture subset, and 94% and 0.64 for the excellent-capture subset (Table 3). In comparison, agreement for the two-state categorisation between pairs of sleep technologists scoring the PSG records was 93–96% for the unfiltered subset, 94–97% for the good-capture subset, and 94–97% for the excellent-capture subset (Table 3).
For the five-state categorisation of sleep/wake, percent agreement and Cohen's kappa were 63% and 0.47 for the unfiltered subset, 66% and 0.52 for the good-capture subset, and 79% and 0.70 for the excellent-capture subset (Table 3). In comparison, agreement for five-state categorisation between pairs of sleep technologists scoring the PSG records was 77–87% for the unfiltered subset, 78–87% for the good-capture subset, and 78–87% for the excellent-capture subset (Table 3).
Error matrices for the five-state categorisation indicate that wake, N1, and REM were the most difficult states for Somfit to identify (Table 4, Table 5 and Table 6). When Somfit worked most effectively (see the excellent-capture subset, Table 6), its main sources of error were classifying wake as REM, classifying N1 as N2 or wake, classifying N2 as N3, classifying N3 as N2, and classifying REM as N2.
3.3. Summary Variables (Bland–Altman Analyses, Bias, Absolute Bias)
For the unfiltered subset, Somfit underestimated total sleep time (TST) by 144 min with an absolute bias of 152 min (Table 7, Figure 1A). For the good-capture subset, Somfit underestimated TST by 25 min with an absolute bias of 40 min (Table 8, Figure 1B). For the excellent-capture subset, Somfit overestimated TST by 10 min with an absolute bias of 14 min (Table 9, Figure 1C).
The modified Bland–Altman plots (Figure 1A–C) are presented in the standard fashion, with horizontal lines for the mean bias and the 95% limits of agreement. Regression analyses indicated that none of the three subsets had proportional bias (unfiltered subset: R² = .006, df = 24, p = .70; good-capture subset: R² = .046, df = 13, p = .44; excellent-capture subset: R² = .130, df = 5, p = .43). Breusch–Pagan tests indicated that the unfiltered subset had heteroscedasticity, but the good-capture and excellent-capture subsets did not (unfiltered subset: BP = 5.1, df = 1, p = .02; good-capture subset: BP = 0.6, df = 1, p = .45; excellent-capture subset: BP = 2.6, df = 1, p = .10).
4. Discussion
4.1. Comparison of Somfit for Assessing the Sleep of Athletes and Non-Athletes
The results obtained in this validation study of Somfit with athletes are similar to those obtained in a previous validation study with non-athletes [2]. In the previous study, records were only included if >80% of the 30-s epochs had scoreable Somfit data, which is equivalent to the threshold used to define the good-capture subset in the current study. In that study, Somfit correctly identified 65% of all PSG epochs for the five-state categorisation of sleep/wake, with a kappa value of 0.52, which indicates a moderate level of agreement. In comparison, for the good-capture subset in the current study, Somfit correctly identified 66% of all PSG epochs, with a kappa value of 0.52. Together, these two sets of results suggest that Somfit is as good at estimating sleep stages during a full night of sleep for athletes as it is for non-athletes.
4.2. Validity of Somfit for Assessing Athletes' Sleep
The degree to which Somfit can be considered valid for the assessment of sleep in athletes depends on which subset of the current data is given the greatest weighting. Consider the critical outcomes for each of the subsets of data. First, agreement between Somfit and PSG for the five-state categorisation of sleep/wake was 63% for the unfiltered subset, 66% for the good-capture subset, and 79% for the excellent-capture subset. Second, compared with PSG, Somfit had a mean bias in TST, such that TST was underestimated by 144 min for the unfiltered subset and 25 min for the good-capture subset, and overestimated by 10 min for the excellent-capture subset. Third, compared with PSG, Somfit had an absolute bias in TST of 152 min for the unfiltered subset, 40 min for the good-capture subset, and 14 min for the excellent-capture subset. Finally, based on means comparisons and effect size analyses, the Somfit-derived values for the summary sleep variables differed from the PSG-derived values for 6 of the 6 variables for the unfiltered subset, 1 of the 6 variables for the good-capture subset, and 0 of the 6 variables for the excellent-capture subset. If Somfit performed at a similar level with athletes outside the current study as it did for the unfiltered subset, it would not be considered a valid measure of sleep. In contrast, if Somfit performed at a similar level with athletes outside the current study as it did for the excellent-capture subset, then it would be considered a valid measure of sleep. Indeed, the five-state agreement between Somfit and PSG for the excellent-capture subset (79%) was similar to the inter-rater agreement between one pair of the trained sleep technologists scoring the gold standard PSG (78%). In addition, for the excellent-capture subset, the absolute bias in the estimate of TST by Somfit in comparison to PSG (14 min) was well under the 30 min threshold and therefore clinically satisfactory [22].
Furthermore, in a recent laboratory-based validation, albeit with a clinical sample rather than a sample of athletes, the five-state agreement between Somfit and PSG was 76% [5], which indicates that Somfit can perform at a similar level in other settings as it did for the excellent-capture subset in the current study.
4.3. Maximise the Likelihood of Capturing Excellent Somfit Data
If Somfit is used to assess sleep in athletes (and others), efforts should be made to ensure the quantity and quality of the data captured more closely resemble this study's excellent-capture subset, rather than the unfiltered subset. Based on the authors' experiences using Somfit in laboratory- and field-based settings, there are two major modifiable factors that influence the likelihood of maximising the quantity and quality of captured data. First, the quality of the signals received from the adhesive forehead patch that contains the Somfit electrodes seems to be highly dependent on the preparation of the forehead before the adhesive patch is applied. If users are given the instruction to 'scrub the forehead' with an alcohol wipe prior to applying the adhesive patch, rather than to 'clean the forehead' (as per the manufacturer's user guides), the connection between the Somfit electrode and the forehead is superior, and a greater amount of high-quality data are likely to be collected. Therefore, it is recommended that users are advised to 'scrub the forehead with an alcohol wipe as hard as possible without causing pain, then pause and repeat'. Scrubbing, rather than cleaning, more closely mimics the process that sleep technologists use when preparing skin prior to attaching PSG electrodes. Second, the quantity of the data captured by Somfit is affected by the degree to which the Bluetooth connection is maintained throughout a sleep period. Data are not captured during any time when the Somfit device adhered to the forehead is not connected to its paired mobile phone. To minimise loss of Bluetooth connection, and any associated loss of data, users should (a) position the mobile phone such that it has a direct line of sight with the Somfit device on the forehead, and (b) take the mobile phone with them if leaving the bedroom during a sleep period (e.g., for a bathroom visit).
4.4. Types of Use of Somfit to Assess Athletes' Sleep
There are two main circumstances in which it may be useful to assess an athlete's sleep. First, the daily monitoring of sleep/wake behaviour: this typically involves tracking basic constructs such as the timing of sleep, total sleep time, and sleep quality for weeks, months, or years to monitor for potential changes caused by training load, competition, travel, illness, injury, etc. Second, a one-off examination of sleep structure: this is typically undertaken if an athlete is feeling fatigued for no apparent reason, having difficulty obtaining a reasonable amount of good-quality sleep, or waking up tired after a full night of sleep. Given the anecdotal reports from some users regarding the potential discomfort associated with wearing Somfit's adhesive forehead patch, particularly if it is used for successive sleeps, it is possible that Somfit may not be tolerated by some athletes for the daily monitoring of sleep. However, in situations where it is necessary to obtain data regarding the structure of an athlete's sleep (i.e., amount of wake, light sleep, deep sleep, and REM within a sleep period) over a limited number of nights, Somfit could be used if PSG is impractical. If Somfit is used with athletes as an alternative to PSG, steps should be taken to maximise the quantity and quality of data that are captured (see Section 4.3). In future, if a forehead patch is developed that provides sufficient adhesion but has a lower likelihood of discomfort for the wearer, then Somfit may become a more viable option for the daily monitoring of athletes' sleep.
4.5. Use of Somfit to Capture Metrics Related to the Circadian System
Any wearable device, including Somfit, that can capture valid sleep data may also be used to derive important measures regarding the strength of the circadian system, i.e., the internal body clock. For example, two such measures are sleep consistency, i.e., the day-to-day variability in the start/end times of sleep, and social jet lag, i.e., the difference in the timing of sleep on work days and free days. In non-athletes, low sleep consistency is associated with poor mental health [23], impaired cognitive function [24], and increased risk of mortality [25]; and high social jet lag is associated with depression [26], obesity [27], and poor academic performance [28]. In future, it may be possible to use data obtained from Somfit, or other sleep wearables, to examine the potential effects of circadian disruption on important outcomes for athletes, such as mental well-being, physical performance, and risk of illness or injury.
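Both circadian measures can be derived once sleep onset and offset times are available for many nights. A minimal Python sketch, assuming sleep consistency is summarised as the sample standard deviation of onset and offset times, and social jet lag as the absolute difference in mean mid-sleep time between free days and work days; these are common operationalisations, not ones specified in this text, and the time convention and function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical convention that avoids wrapping at midnight: times are minutes
# after 18:00 on the evening the sleep period starts (23:00 -> 300,
# 07:00 the next morning -> 780).


def sleep_consistency(onsets, offsets):
    """Sample SD (minutes) of sleep start and end times across nights."""
    return stdev(onsets), stdev(offsets)


def social_jet_lag(onsets, offsets, is_free_day):
    """Absolute difference (minutes) in mean mid-sleep time, free vs work days."""
    mid = [(on + off) / 2 for on, off in zip(onsets, offsets)]
    free = [m for m, f in zip(mid, is_free_day) if f]
    work = [m for m, f in zip(mid, is_free_day) if not f]
    return abs(mean(free) - mean(work))
```

For example, sleeping 23:00 to 07:00 on work days and 00:00 to 09:00 on free days shifts mid-sleep from 03:00 to 04:30, a social jet lag of 90 min.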