What this is
- The 3-Dimensional Sleep Scale (3DSS) assesses sleep quality, quantity, and phase.
- This study aimed to validate the 3DSS using a larger sample and updated methods.
- A web-based survey included 2605 day-shift workers in Tokyo, Japan.
- The findings confirm the scale's reliability and validity for sleep assessment.
Essence
- The 3DSS demonstrates sufficient reliability and validity for assessing sleep phase, quality, and quantity in day-shift workers. Its exploratory factor analysis confirmed a three-factor structure, and the scale's performance was validated against existing indicators.
Key takeaways
- The 3DSS showed a three-factor structure for sleep phase, quality, and quantity. This structure was confirmed using exploratory factor analysis, enhancing the scale's applicability in sleep research.
- The area under the curve (AUC) for sleep phase, quality, and quantity was 0.747, 0.812, and 0.819, respectively. These values indicate the scale's effectiveness in distinguishing between different sleep conditions.
- Reliability coefficients for all subscales were above or close to 0.7 (ω = 0.690 to 0.786), indicating sufficient reliability. High reproducibility was also confirmed, with most items showing intraclass correlations exceeding 0.75.
Caveats
- The study's sample was limited to day-shift workers in Tokyo, which may affect generalizability. Future research should include diverse populations to confirm the findings.
- The study's reliance on self-reported data may introduce bias. Objective measures of sleep could strengthen the validation of the 3DSS.
Definitions
- 3-Dimensional Sleep Scale (3DSS): A scale designed to assess sleep quality, quantity, and phase through a 15-item questionnaire.
- Area Under the Curve (AUC): A statistical measure used to evaluate the performance of a diagnostic test, indicating its ability to distinguish between conditions.
Introduction
The various sleep scales used in sleep research have unique characteristics. For example, the Pittsburgh Sleep Quality Index (PSQI) covers sleep quality issues such as difficulty falling asleep, awakening during the night, and sleep efficiency, as well as quantity issues such as sleep duration and daytime sleepiness [1–3]. Because the PSQI includes questions on snoring, pain, and sleeping pills, it is often used in sleep studies targeting patients with certain illnesses and older individuals [6–8]. In contrast, the Athens Insomnia Scale (AIS) contains only simple questions and can be evaluated with a crude score [9, 10]. Nevertheless, neither scale can assess issues related to sleep phase, such as social jetlag. Social jetlag, which refers to the mismatch between social time and biological time [11], is associated with a variety of physical and mental health problems [12–16].
In 2014, we developed the 3-Dimensional Sleep Scale (3DSS) for Japanese day-shift workers [17, 18]. The 3DSS is a novel scale that consists of only 15 items and allows for simple evaluation not only of sleep quality and sleep quantity, as assessed by conventional sleep scales, but also of sleep phase. When developing this scale, input from sleep specialists, psychiatrists, industrial physicians, and clinical psychologists was used to establish its content validity and verify its reliability. However, the original study had limitations. First, the sample size was small. Second, although the principal factor method used for exploratory factor analysis is suitable for small sample sizes, it has limitations such as lower accuracy in estimating factor loadings and the inability to test model fit. Third, the concept of social jetlag had not yet become widely recognized at the time, and there were no appropriate indicators available to examine the convergent validity of the sleep phase score. Therefore, this study aimed to update the analytical methods and retest the reliability and validity of the 3DSS using a larger sample size.
Methods
Study design, participants, and ethical considerations
This study used an online questionnaire administered from October 2021 to December 2022. The target participants were 3609 day-shift workers employed by four companies in Tokyo that cooperated with the survey. Because the population of Tokyo is approximately 10 million, we considered a sample size of at least 1067 persons sufficient, assuming a 95% confidence level, a 3% margin of error, and a population proportion of 0.5. All workers at the four companies in this study worked a day shift.
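The source does not state which formula produced the figure of 1067. As a minimal sketch, one standard approach (Cochran's formula with a finite population correction) reproduces it under the stated assumptions. The function name and the choice of z = 1.96 for the 95% confidence level are illustrative assumptions, not taken from the original.

```python
import math

def required_sample_size(population, z=1.96, margin=0.03, p=0.5):
    """Cochran's sample size with finite population correction (assumed formula)."""
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Correct for the finite population size
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(required_sample_size(10_000_000))  # 1067
```

With a population of about 10 million the correction is negligible, so the result is driven almost entirely by the confidence level, the margin of error, and the assumed proportion.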
The purpose and procedures of the study were explained to the employees. Participation was voluntary, and we explained that no disadvantages would arise from not participating in the study and that the data obtained would be used exclusively for the study. First, a survey form was created online, and the staff in charge of each company sent its URL to all employees via e-mail with a request for responses. Consent for participation was obtained through a web-based interface and received electronically; those who disagreed were barred from accessing and responding to any research items. Identifying information was anonymized using a substitute employee ID number, and the participants were informed that their responses would be deleted if they chose to withdraw consent after answering the study.
Constitutive concepts and items of the 3DSS
The 3DSS was designed for use by Japanese day workers [17]. It comprises three categories (sleep phase, sleep quality, and sleep quantity), each with 5 items (for a total of 15 items). All items were discussed with sleep specialists, psychiatrists, occupational physicians, and clinical psychologists to ensure sufficient content validity. The constitutive concepts of each subscale and scoring methods are described in Supplementary material 1.
Response bias, construct validity, reliability, and reproducibility
The 15 questions of the 3DSS were tested for reliability and validity according to the COSMIN checklist, an international standard for scale development [19–21]. We analyzed the ceiling effect, floor effect, skewness, and kurtosis to check for response bias. For construct validity, we performed exploratory factor analysis using the maximum likelihood method and promax rotation. Reliability was tested by calculating the intraclass correlation coefficient and McDonald's ω reliability coefficient [22]. To verify reproducibility, participants were asked to respond again within 2 weeks to the 15 items of the 3DSS, and the intraclass correlation coefficient between the first and second responses was calculated.
Indicators for convergent and discriminant validity
We tested convergent and discriminant validity using a multitrait-multimethod matrix that correlated with existing indicators. The existing indicators used were holiday wake-up time, social jetlag, the AIS [4, 5], and weekday sleep duration.
Holiday wake-up time and social jetlag were used as phase indicators. The sleep phase subscale of the 3DSS includes constructs related to chronotype and sleep rhythm regularity. The holiday wake-up time is a good indicator of chronotype because it reflects an individual's true wake-up time, unaffected by social constraints. Social jetlag is obtained from the difference between the mid-sleep time on weekdays and the mid-sleep time on weekends; it is larger for those whose sleeping and wake-up times change between weekdays and holidays [23–25].
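The definition of social jetlag given above can be sketched as follows. This is an illustrative calculation only: the function names, the use of decimal clock hours, and taking the absolute difference are assumptions, and the example times are hypothetical rather than study data.

```python
def mid_sleep(onset_h, wake_h):
    """Midpoint of a sleep episode in clock hours (0-24), allowing sleep across midnight."""
    duration = (wake_h - onset_h) % 24
    return (onset_h + duration / 2) % 24

def social_jetlag(weekday, free_day):
    """Absolute difference in hours between free-day and weekday mid-sleep times."""
    diff = abs(mid_sleep(*free_day) - mid_sleep(*weekday))
    # Take the shorter way around the 24-hour clock
    return min(diff, 24 - diff)

# Hypothetical respondent: sleeps 23:30-06:30 on weekdays and 01:00-09:00 on days off
print(social_jetlag(weekday=(23.5, 6.5), free_day=(1.0, 9.0)))  # 2.0
```

In this example the mid-sleep time shifts from 03:00 to 05:00, giving 2 h of social jetlag, which would exceed the 1 h threshold used later as an outcome.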
The sleep quality subscale of the 3DSS includes constructs related to nocturnal symptoms associated with insomnia disorders, such as difficulty falling asleep, awakening in the middle of the night, and early morning awakening. Okajima et al. reported that the AIS has a two-factor structure with Q1–5 and Q6–8 [5]. Of these, Q1–5 of the AIS are good indicators of sleep quality, including difficulty falling asleep and awakening during the middle of the night. We therefore used the total AIS Q1–5 score to assess sleep quality.
The sleep quantity subscale of the 3DSS includes not only sleep duration but also daytime symptoms such as daytime sleepiness, fatigue, and dozing (unintentional sleep) caused by lack of sleep. Q6–8 of the AIS are good indicators of sleep quantity as they focus on daytime symptoms. Weekday sleep duration and the total score of AIS Q6–8 were therefore used as measures of quantity.
Hypothesis for convergent and discriminant validity
First, we hypothesized convergent validity; we expected the sleep phase score to be strongly correlated (−0.8 ≤ r ≤ −0.5) with the time of waking on holidays because the sleep phase score includes items directly related to waking time. In contrast, since the sleep phase score does not include items directly asking about social jetlag, we expected a moderate correlation (−0.5 ≤ r ≤ −0.3) between social jetlag and the sleep phase score. Quality 1–3 in the sleep quality score of the 3DSS is quite close to Q1–3 of the AIS because both are based on nocturnal insomnia symptoms. Therefore, we expected a strong correlation (−0.8 ≤ r ≤ −0.5). Quantity 3–5 included in the sleep quantity score assessed insomnia symptoms upon awakening, as did Q6–8 in the AIS. Therefore, we expected a strong correlation (−0.8 ≤ r ≤ −0.5). Since only one item, Quantity 1, contained a question about sleep duration on weekdays, we expected a moderate correlation (0.3 ≤ r ≤ 0.5).
Next, hypotheses were set up for discriminant validity: the sleep phase score was predicted to be largely uncorrelated (−0.3 ≤ r ≤ 0.3) with AIS Q1–5, Q6–8, and weekday sleep duration. The sleep quality score was predicted to show little correlation (−0.3 ≤ r ≤ 0.3) with holiday wake-up time, social jetlag, and weekday sleep duration. Q6–8 of the AIS contains items related to mood, which may have some correlation with nocturnal symptoms of insomnia. Therefore, we predicted that a moderate correlation (−0.5 ≤ r ≤ −0.3) between the sleep quality score of the 3DSS and Q6–8 of the AIS would be observed, which would be weaker than the correlation with Q1–5 of the AIS. The sleep quantity score was predicted to show minimal correlation (−0.3 ≤ r ≤ 0.3) with holiday waking time and social jetlag. For AIS Q1–5, Q4 contained items related to sleep duration. Therefore, we predicted that the sleep quantity score of the 3DSS would have a moderate correlation (−0.5 ≤ r ≤ −0.3) with Q1–5 of the AIS, weaker than its correlation with Q6–8 of the AIS.
Validation of cutoff values
We used social jetlag and the AIS, which were also used for convergent validation, as indicators to validate the cutoff values. Social jetlag of more than 1 h has been reported to carry a risk of health problems [12, 13]. Therefore, we used "social jetlag ≥ 1 h" as the outcome when validating the cutoff value for the phase score. An "AIS score of 6 points or more", the threshold for a suspected sleep disorder, was set as the outcome when validating the cutoff values of the 3DSS sleep quality score and sleep quantity score. The area under the curve (AUC) was determined from the receiver operating characteristic (ROC) curve, and the cutoff that maximized Youden's index was identified to confirm whether there were any major discrepancies with the current cutoff values.
Statistical analysis
We employed the maximum likelihood method with promax rotation for exploratory factor analysis. Factor loading cutoff values are often set between 0.3 and 0.4 [17, 26–28]. In the present study, we used a large sample size and employed the maximum likelihood method, which provides more precise estimates of factor loadings; therefore, we set the cutoff value at 0.3. Spearman's correlation coefficient was used for the correlation analysis. The scale's reliability was assessed using McDonald's ω coefficient [22]. For the reliability value, 0.65 or higher was considered acceptable, based on criteria frequently used in previous research [29–31]. All analyses were performed using IBM SPSS Version 28 for Windows. A two-sided p-value < 0.05 was considered statistically significant.
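The source does not spell out how the reliability coefficient was computed. For reference, a commonly used form of McDonald's ω for a subscale of $k$ items, assuming a single-factor model with standardized loadings $\lambda_i$ and unique variances $\psi_i$ (symbols introduced here for illustration), is:

$$
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}{\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \psi_i}
$$

Under this definition, ω is the share of total score variance attributable to the common factor, so values at or above the 0.65 criterion mean that at least about two thirds of the subscale variance is shared.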
Results
Participant characteristics
Responses were obtained from 2651 participants. Among these, 35 did not consent to participate in the study, and 11 had been on leave within the past month, leaving 2605 (1718 men and 887 women; mean age 42.2 years, standard deviation 11.7) individuals included in the analysis. The valid response rate was 72.2%. The basic characteristics of the analyzed participants are presented in Supplementary material 2.
Response polarization analysis
| | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Phase 5 | Quality 1 | Quality 2 | Quality 3 | Quality 4 | Quality 5 | Quantity 1 | Quantity 2 | Quantity 3 | Quantity 4 | Quantity 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 1.88 | 1.61 | 1.67 | 1.33 | 1.49 | 1.97 | 2.02 | 2.32 | 1.58 | 2.1 | 1.52 | 1.37 | 1.59 | 1.67 | 2.1 |
| SD | 0.89 | 0.96 | 1.09 | 1.01 | 1.14 | 0.94 | 0.97 | 0.78 | 0.9 | 0.88 | 0.96 | 0.94 | 0.87 | 0.86 | 0.78 |
| Ceiling effect (mean + SD) | 2.77 | 2.57 | 2.76 | 2.34 | 2.64 | 2.91 | 2.99 | 3.1 | 2.49 | 2.98 | 2.49 | 2.31 | 2.46 | 2.53 | 2.88 |
| Floor effect (mean − SD) | 0.99 | 0.65 | 0.59 | 0.32 | 0.35 | 1.03 | 1.05 | 1.54 | 0.68 | 1.22 | 0.56 | 0.43 | 0.72 | 0.82 | 1.31 |
| Skewness | −0.40 | −0.12 | −0.23 | 0.2 | 0.01 | −0.56 | −0.60 | −0.86 | −0.05 | −0.67 | −0.04 | 0.12 | −0.08 | −0.13 | −0.53 |
| Kurtosis | −0.59 | −0.93 | −1.24 | −1.05 | −1.41 | −0.63 | −0.75 | −0.10 | −0.79 | −0.39 | −0.95 | −0.88 | −0.67 | −0.64 | −0.26 |
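The ceiling and floor rows in the table match mean + SD and mean − SD for each item. A minimal sketch of that check follows, assuming each item is scored from 0 to 3; the scoring range is an assumption here, since the scoring method is described only in Supplementary material 1.

```python
# (mean, SD) per item, copied from the table above
items = {
    "Phase 1": (1.88, 0.89), "Phase 2": (1.61, 0.96), "Phase 3": (1.67, 1.09),
    "Phase 4": (1.33, 1.01), "Phase 5": (1.49, 1.14),
    "Quality 1": (1.97, 0.94), "Quality 2": (2.02, 0.97), "Quality 3": (2.32, 0.78),
    "Quality 4": (1.58, 0.90), "Quality 5": (2.10, 0.88),
    "Quantity 1": (1.52, 0.96), "Quantity 2": (1.37, 0.94), "Quantity 3": (1.59, 0.87),
    "Quantity 4": (1.67, 0.86), "Quantity 5": (2.10, 0.78),
}

ITEM_MIN, ITEM_MAX = 0, 3  # assumed item score range

# Ceiling effect: mean + SD exceeds the maximum possible score
ceiling = [name for name, (mean, sd) in items.items() if mean + sd > ITEM_MAX]
# Floor effect: mean - SD falls below the minimum possible score
floor = [name for name, (mean, sd) in items.items() if mean - sd < ITEM_MIN]

print("Ceiling effect:", ceiling)
print("Floor effect:", floor)
```

Under the assumed 0 to 3 range, only Quality 3 is flagged for a ceiling effect and no item shows a floor effect.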
Exploratory factor analysis

Scree plot of factors
| Items | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Quality 2 | 0.725 | 0.159 | −0.133 |
| Quality 5 | 0.678 | −0.077 | 0.085 |
| Quality 3 | 0.637 | 0.232 | −0.040 |
| Quality 1 | 0.63 | −0.202 | −0.090 |
| Quality 4 | 0.577 | 0.025 | 0.311 |
| Phase 2 | 0.095 | 0.737 | −0.049 |
| Phase 1 | 0.033 | 0.664 | −0.011 |
| Phase 5 | −0.056 | 0.587 | 0.231 |
| Phase 4 | −0.109 | 0.571 | −0.113 |
| Phase 3 | −0.074 | 0.358 | 0.035 |
| Quantity 1 | −0.273 | 0.19 | 0.806 |
| Quantity 2 | 0.09 | 0.045 | 0.771 |
| Quantity 3 | 0.277 | −0.198 | 0.414 |
| Quantity 4 | 0.119 | −0.150 | 0.407 |
| Quantity 5 | 0.077 | −0.036 | 0.311 |
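To illustrate how the 0.3 loading cutoff is applied to the table above, the following sketch assigns each item to the factor with its largest absolute loading and flags items that load at or above the cutoff on more than one factor. Variable names are illustrative, and comparing absolute loadings is an assumption.

```python
CUTOFF = 0.3  # factor loading cutoff stated in the Statistical analysis section

# Loadings on factors 1-3, copied from the table above
loadings = {
    "Quality 2": (0.725, 0.159, -0.133), "Quality 5": (0.678, -0.077, 0.085),
    "Quality 3": (0.637, 0.232, -0.040), "Quality 1": (0.630, -0.202, -0.090),
    "Quality 4": (0.577, 0.025, 0.311),
    "Phase 2": (0.095, 0.737, -0.049), "Phase 1": (0.033, 0.664, -0.011),
    "Phase 5": (-0.056, 0.587, 0.231), "Phase 4": (-0.109, 0.571, -0.113),
    "Phase 3": (-0.074, 0.358, 0.035),
    "Quantity 1": (-0.273, 0.190, 0.806), "Quantity 2": (0.090, 0.045, 0.771),
    "Quantity 3": (0.277, -0.198, 0.414), "Quantity 4": (0.119, -0.150, 0.407),
    "Quantity 5": (0.077, -0.036, 0.311),
}

# Primary factor: index (1-based) of the largest absolute loading
primary = {
    item: max(range(3), key=lambda f: abs(values[f])) + 1
    for item, values in loadings.items()
}
# Cross-loading: at least two factors with |loading| >= cutoff
cross_loading = [
    item for item, values in loadings.items()
    if sum(abs(v) >= CUTOFF for v in values) >= 2
]

print(primary)
print("Cross-loading items:", cross_loading)
```

Every Quality item lands on factor 1, every Phase item on factor 2, and every Quantity item on factor 3, with Quality 4 as the only cross-loading item.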
Reliability coefficients, item-total correlation analysis, and intraclass correlation coefficient
| Subscale | Reliability coefficient | Item | Item–total correlation | Reliability coefficient when the item is excluded | Intraclass correlation coefficient for reproducibility (n = 402) |
|---|---|---|---|---|---|
| Sleep phase | ω = 0.690 | Phase 1 | 0.485 | 0.658 | 0.686 |
| | | Phase 2 | 0.557 | 0.629 | 0.828 |
| | | Phase 3 | 0.317 | 0.691 | 0.887 |
| | | Phase 4 | 0.53 | 0.6 | 0.891 |
| | | Phase 5 | 0.45 | 0.666 | 0.908 |
| Sleep quality | ω = 0.786 | Quality 1 | 0.507 | 0.766 | 0.828 |
| | | Quality 2 | 0.567 | 0.758 | 0.836 |
| | | Quality 3 | 0.544 | 0.757 | 0.807 |
| | | Quality 4 | 0.597 | 0.73 | 0.816 |
| | | Quality 5 | 0.633 | 0.73 | 0.811 |
| Sleep quantity | ω = 0.710 | Quantity 1 | 0.401 | 0.706 | 0.862 |
| | | Quantity 2 | 0.591 | 0.641 | 0.78 |
| | | Quantity 3 | 0.507 | 0.628 | 0.704 |
| | | Quantity 4 | 0.546 | 0.679 | 0.752 |
| | | Quantity 5 | 0.398 | 0.708 | 0.758 |
Hypothesis verification for convergent validity and discriminant validity
| 3DSS subscale | Wake-up time on day off (sleep phase indicator) | Social jetlag (sleep phase indicator) | AIS quality score (sleep quality indicator) | AIS quantity score (sleep quantity indicator) | Sleep duration (sleep quantity indicator) |
|---|---|---|---|---|---|
| Sleep phase | −0.735* (−0.8 ∼ −0.5) | −0.484* (−0.5 ∼ −0.3) | −0.041* (−0.3 ∼ 0.3) | −0.174* (−0.3 ∼ 0.3) | 0.003 (−0.3 ∼ 0.3) |
| Sleep quality | 0.081* (−0.3 ∼ 0.3) | 0.04* (−0.3 ∼ 0.3) | −0.753* (−0.8 ∼ −0.5) | −0.340* (−0.5 ∼ −0.3) | −0.021 (−0.3 ∼ 0.3) |
| Sleep quantity | −0.147* (−0.3 ∼ 0.3) | −0.130* (−0.3 ∼ 0.3) | −0.472* (−0.5 ∼ −0.3) | −0.675* (−0.8 ∼ −0.5) | 0.336* (0.3 ∼ 0.5) |
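The hypothesis check in the table above amounts to testing whether each observed correlation falls inside its hypothesized range. A minimal sketch using the tabulated values follows; the dictionary keys are illustrative labels, not names from the study.

```python
# (observed r, (lower bound, upper bound)), copied from the table above
results = {
    ("phase", "wake_time_day_off"): (-0.735, (-0.8, -0.5)),
    ("phase", "social_jetlag"): (-0.484, (-0.5, -0.3)),
    ("phase", "ais_quality"): (-0.041, (-0.3, 0.3)),
    ("phase", "ais_quantity"): (-0.174, (-0.3, 0.3)),
    ("phase", "sleep_duration"): (0.003, (-0.3, 0.3)),
    ("quality", "wake_time_day_off"): (0.081, (-0.3, 0.3)),
    ("quality", "social_jetlag"): (0.040, (-0.3, 0.3)),
    ("quality", "ais_quality"): (-0.753, (-0.8, -0.5)),
    ("quality", "ais_quantity"): (-0.340, (-0.5, -0.3)),
    ("quality", "sleep_duration"): (-0.021, (-0.3, 0.3)),
    ("quantity", "wake_time_day_off"): (-0.147, (-0.3, 0.3)),
    ("quantity", "social_jetlag"): (-0.130, (-0.3, 0.3)),
    ("quantity", "ais_quality"): (-0.472, (-0.5, -0.3)),
    ("quantity", "ais_quantity"): (-0.675, (-0.8, -0.5)),
    ("quantity", "sleep_duration"): (0.336, (0.3, 0.5)),
}

def within(r, bounds):
    """True if the observed correlation lies inside the hypothesized range."""
    lower, upper = bounds
    return lower <= r <= upper

outside = [pair for pair, (r, bounds) in results.items() if not within(r, bounds)]
print("Correlations outside hypothesized range:", outside)
```

An empty list means every observed correlation falls within its hypothesized range.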
Confirmation of cutoff values

Receiver operating characteristic curves for each subscale score
| Cutoff | Sensitivity: sleep phase | Sensitivity: sleep quality | Sensitivity: sleep quantity | Specificity: sleep phase | Specificity: sleep quality | Specificity: sleep quantity | Sensitivity + specificity − 1: sleep phase | Sensitivity + specificity − 1: sleep quality | Sensitivity + specificity − 1: sleep quantity |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.012 | 0.003 | 0.013 | 0.998 | 0.999 | 0.999 | 0.01 | 0.002 | 0.013 |
| 1.5 | 0.04 | 0.018 | 0.032 | 0.995 | 0.999 | 0.999 | 0.035 | 0.016 | 0.031 |
| 2.5 | 0.099 | 0.057 | 0.082 | 0.98 | 0.998 | 0.997 | 0.079 | 0.055 | 0.079 |
| 3.5 | 0.178 | 0.097 | 0.142 | 0.967 | 0.998 | 0.99 | 0.145 | 0.094 | 0.132 |
| 4.5 | 0.295 | 0.157 | 0.252 | 0.931 | 0.993 | 0.972 | 0.226 | 0.15 | 0.224 |
| 5.5 | 0.415 | 0.244 | 0.389 | 0.885 | 0.983 | 0.935 | 0.3 | 0.228 | 0.324 |
| 6.5 | 0.518 | 0.358 | 0.552 | 0.823 | 0.968 | 0.876 | 0.341 | 0.326 | 0.428 |
| 7.5 | 0.628 | 0.472 | 0.714 | 0.74 | 0.925 | 0.788 | 0.368* | 0.398 | 0.503* |
| 8.5 | 0.728 | 0.607 | 0.821 | 0.627 | 0.855 | 0.639 | 0.355† | 0.462 | 0.46† |
| 9.5 | 0.821 | 0.723 | 0.901 | 0.512 | 0.763 | 0.486 | 0.333 | 0.485* | 0.387 |
| 10.5 | 0.886 | 0.807 | 0.971 | 0.382 | 0.625 | 0.34 | 0.268 | 0.432† | 0.311 |
| 11.5 | 0.94 | 0.879 | 0.985 | 0.278 | 0.495 | 0.228 | 0.218 | 0.375 | 0.213 |
| 12.5 | 0.973 | 0.934 | 0.995 | 0.181 | 0.366 | 0.143 | 0.154 | 0.3 | 0.138 |
| 13.5 | 0.987 | 0.965 | 0.998 | 0.106 | 0.251 | 0.081 | 0.093 | 0.216 | 0.079 |
| 14.5 | 0.994 | 0.981 | 1 | 0.049 | 0.132 | 0.031 | 0.043 | 0.114 | 0.031 |
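The search for the cutoff that maximizes Youden's index (sensitivity + specificity − 1) can be sketched from the tabulated values. Only the rows from 6.5 to 10.5 are included here to keep the example short; the variable and function names are illustrative.

```python
# cutoff -> subscale -> (sensitivity, specificity), a subset of the table above
roc_points = {
    6.5: {"phase": (0.518, 0.823), "quality": (0.358, 0.968), "quantity": (0.552, 0.876)},
    7.5: {"phase": (0.628, 0.740), "quality": (0.472, 0.925), "quantity": (0.714, 0.788)},
    8.5: {"phase": (0.728, 0.627), "quality": (0.607, 0.855), "quantity": (0.821, 0.639)},
    9.5: {"phase": (0.821, 0.512), "quality": (0.723, 0.763), "quantity": (0.901, 0.486)},
    10.5: {"phase": (0.886, 0.382), "quality": (0.807, 0.625), "quantity": (0.971, 0.340)},
}

def youden(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1

def best_cutoff(subscale):
    """Cutoff with the largest Youden's index for one subscale."""
    return max(roc_points, key=lambda c: youden(*roc_points[c][subscale]))

best = {s: best_cutoff(s) for s in ("phase", "quality", "quantity")}
print(best)
```

The index peaks at 7.5 for sleep phase and sleep quantity and at 9.5 for sleep quality, matching the rows marked with an asterisk. Values recomputed from the rounded sensitivities and specificities may differ from the tabulated index in the third decimal place.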
Discussion
In this study, updated methods (i.e., the maximum likelihood method, McDonald's ω reliability coefficient, and a reproducibility analysis) were applied to a larger sample than that in our previous study of the 3DSS, which allowed reliability and validity to be evaluated with higher accuracy. Furthermore, using social jetlag and sleep duration as indicators for testing convergent and discriminant validity enhances the generalizability of our results.
Response bias and construct validity
In the response bias analysis of the 15 items, only Quality 3 had a ceiling effect. However, because it was slight and neither the skewness nor the kurtosis of this item exceeded |1|, this result alone did not warrant exclusion. Regarding structural validity, exploratory factor analysis based on the maximum likelihood method indicated a three-factor structure with five items each, as expected. Although Quality 4 showed factor loadings exceeding 0.3 on both the first and third factors, it was not excluded because its loading on the first factor was clearly higher than that on the third factor. The decision not to exclude Quality 4 was also considered appropriate in light of the reliability coefficient when the item was excluded and the clinical importance of this item, which asks about sleep disturbance.
Reliability and reproducibility
Reliability coefficients above or close to 0.7 were observed for all subscales, indicating that the scales were sufficiently reliable. The intraclass correlation coefficients, which examined reproducibility, showed that most items had strong correlations exceeding 0.75, indicating high reproducibility. The coefficients for Phase 1 and Quantity 3, which were approximately 0.7, were not low enough to indicate a lack of reproducibility, and the results of the other analyses were also taken into consideration.
Convergent validity and discriminant validity
Regarding convergent validity, all subscales were correlated with relevant existing indicators to the assumed degree and direction, indicating their validity. Regarding discriminant validity, no subscale correlated more strongly than expected with existing indicators measuring different factors, again indicating validity. In particular, the sleep phase score showed almost no correlation with existing indicators of sleep quality and quantity, suggesting that it is clearly discriminative. It also correlated with social jetlag, providing novel insights into factors not measurable by other sleep scales, such as the PSQI and AIS. The AIS is usually evaluated using the total score of Q1–8 because it is used to evaluate insomnia disorders. The diagnostic criteria for insomnia disorders include daytime symptoms in addition to nighttime symptoms [32]. Q1–5 in the AIS are items related to nighttime symptoms, while Q6–8 are items related to daytime symptoms. Consequently, although they were extracted separately as factors, both are found in insomnia; therefore, some correlation between them is to be expected.
Cutoff values
The ROC curves, using social jetlag and the AIS as outcomes, showed that the AUC for all subscales exceeded 0.7. These results indicate that screening based on the 3DSS has accuracy comparable to screening based on social jetlag ≥ 1 h or an AIS score ≥ 6. Regarding the cutoff value, the score that maximized Youden's index was one point lower than the current cutoff value and had lower sensitivity than the current cutoff. Since sleep conditions in Japan are poorer than in other countries [33], the current setting of "a cutoff value that captures many individuals (i.e., a cutoff value with high sensitivity)" would be acceptable. However, in situations where there is a shortage of personnel that limits the ability to respond to positive cases, or when the target population is expected to have a high level of health and a low pre-test probability, it may be more practical to lower the cutoff score by one point.
Limitations and future prospects
This study had some limitations. First, because all participants were workers, its reliability and validity cannot be guaranteed when applied to those who are not working. Second, the number of participants in the reproducibility analysis (n = 402) was not large. However, this number may be considered sufficient given the practical challenge of conducting the same survey twice with busy workers. Third, there may be some degree of bias in the present study, as the sample included a somewhat higher proportion of male participants and was drawn exclusively from Tokyo. Last, the 3DSS was created in Japanese for Japanese participants. In the future, it will be necessary to increase the proportion of female participants and include workers from regional cities in order to conduct confirmatory factor analysis under more diverse conditions. Furthermore, since sleep phase issues, as represented by social jetlag, are gaining attention not only in Japan but also in other countries, it is necessary to develop an English version of the 3DSS that can be easily adapted into various languages. In particular, as Japan is known to have shorter sleep duration compared to other countries, it will be necessary to rigorously re-evaluate the reliability and validity of the 3DSS when applying it in different cultural or national contexts.
Conclusions
In this study, the reliability and validity of the 3DSS developed in 2014 were tested using a new sample population and methodology. The results demonstrated adequate reliability, structural validity, convergent validity, and discriminant validity for use with day shift workers. The 3DSS is the world's first scale to evaluate sleep phase, quality, and quantity on a single scale; the development of an English version is warranted.
Supplementary Information
Below is the link to the electronic supplementary material. Supplementary file1 (DOCX 21 kb)