What this is
- This research proposes a deep convolutional neural network (CNN) framework for predicting fetal acidemia using fetal heart rate (FHR) signals.
- The study aims to improve the accuracy of fetal monitoring by automating the interpretation of complex FHR data.
- By employing a data-driven approach, the model eliminates the need for manual feature extraction, enhancing diagnostic objectivity.
Essence
- The proposed model achieved an accuracy of 98.34% in classifying fetal states, significantly outperforming traditional methods. This system aims to provide obstetricians with a reliable tool for early detection of fetal distress.
Key takeaways
- The model demonstrated superior performance with an accuracy of 98.34%, sensitivity of 98.22%, and specificity of 94.87% across ten-fold cross-validation.
- Using the continuous wavelet transform (CWT) for signal preprocessing improved the model's ability to capture hidden characteristics of FHR signals, leading to better classification outcomes.
- The study indicates that the framework can be integrated into clinical practice to assist obstetricians in making timely medical decisions regarding fetal health.
Caveats
- The model requires a large and diverse dataset for effective training, which may not always be available in clinical settings.
- The computational intensity of the algorithm may pose challenges in real-time applications, necessitating further optimization.
Definitions
- Fetal acidemia: A condition characterized by an abnormal increase in acidity in the fetal blood, often due to hypoxia.
- Continuous wavelet transform (CWT): A mathematical technique used to analyze signals by decomposing them into time-frequency representations.
- Convolutional neural network (CNN): A type of deep learning model designed to automatically learn features from data, particularly effective in image processing.
Background
Fetal distress caused by hypoxia can lead to various abnormalities that can be divided into life-threatening and non-life-threatening events during the process of childbirth. Since the brain of a neonate is easily influenced by oxygen supply, a lack of oxygen can cause serious damage to the brain and even death [1]. Hence, to detect fetal acidemia early, we need a powerful technique that can monitor the fetal state in real time, and once an abnormal situation occurs, alert obstetricians to intervene in a timely manner before there is permanent damage to the fetus.
In clinical practice, cardiotocography (CTG) involves the continuous recording of the fetal heart rate (FHR) and uterine contraction (UC) signals and is routinely adopted by doctors to monitor and assess the fetal state during pregnancy and delivery [2, 3]. Unfortunately, due to the complexity of fetal physiological dynamics, which are regulated by neurological feedback loops, the visual analysis of FHR signals using common guidelines usually leads to high intra-observer and inter-observer disagreement among experts [4, 5]. In practice, obstetricians perform multiple subjective evaluations to minimize diagnostic error. However, this process cannot be quantified, and obstetricians make decisions based on their individual experience [6, 7]. Consequently, the incidence of unnecessary cesarean sections (CSs) caused by subjective error is increasing, which has become the main driving force in the search for a more objective analysis of the FHR signal [8].
In recent decades, to overcome the inherent defects of visual interpretation of FHR signals, many researchers have attempted to design reliable computer-aided diagnosis (CAD) systems consisting of automatic signal processing and evaluation [9]. Many advanced developments in the biomedical engineering field have been extensively used in FHR signals, such as frequency domain analysis [10], nonlinear features (entropy, complexity, etc.) arising from the domain of adult heart rate variability (HRV) analysis [11, 12], and others [13].
Furthermore, over the past several years, existing CAD systems have applied machine learning (ML) algorithms to automatically separate pathological fetal events from normal events. Table 7 summarizes the related state-of-the-art work with this aim. Notably, earlier efforts on FHR-based CAD systems employed conventional ML approaches and followed the same procedure: (i.) signal preprocessing (i.e., denoising), (ii.) feature extraction, (iii.) feature selection, and (iv.) final classification. These methods, based on predictive learning classifiers, mostly relied on complex hand-crafted features. For example, Czabanski et al. [14] designed an expert system to predict neonatal acidemia using a two-stage analysis based on weighted fuzzy scoring (WFS) and least square support vector machine (LS-SVM) and obtained an accuracy (Acc) and quality index (QI) of 92.0 and 88.0%, respectively. Fanelli et al. [15] introduced a new nonlinear parameter based on the phase-rectified signal average (PRSA) for the quantitative assessment of fetal well-being and achieved an area under the curve (AUC) of 75% using a univariate analysis method. Comert et al. [16] applied an artificial neural network (ANN) and performed a classification with an Acc, sensitivity (Se), and specificity (Sp) of 92.40, 95.89 and 74.75%, respectively. Clearly, these conventional methods are dominated by feature engineering, which involves the difficult and time-consuming process of informative feature extraction and optimal feature selection and may lose physiological information about the fetus along the way.
Traditional ML methods are often regarded as a "black box", in which even the designers cannot explain or justify why the artificial intelligence (AI) reaches a specific decision. Holzinger focused on explainable AI, which aims to make decisions more re-traceable, explainable and reliable [17].
In this study, we propose a deep convolutional neural network (CNN) framework aimed at FHR classification. Compared to the previously mentioned traditional methods, the CNN-based approach is completely data-driven and does not need the essential steps, namely feature extraction, feature selection and classification, to be defined explicitly [18]. Instead, these steps are all incorporated into the CNN model, which learns informative features from the input data by itself. CNNs have already yielded great achievements in image classification since they consider the spatial structure of the input data and avoid the curse of dimensionality [19]. Because of these advantages, CNNs are extensively utilized in the medical field to design screening tools that automatically assist clinicians. For example, Acharya et al. designed a CNN structure to diagnose coronary artery disease using an electrocardiogram (ECG) signal and achieved a high accuracy of 95.11% [20]. In addition, Li et al. applied a 1-dimensional (1D) CNN to classify FHR signals and obtained an Acc of 93.24% [21]. Comert et al. also proposed a novel approach to detect fetal hypoxia based on a deep CNN with transfer learning using the FHR signal and the short-time Fourier transform (STFT) [22].
In summary, automatic processing and further classification of FHR signals are indispensable components of CAD systems that satisfy the primary goal of this study, which is to facilitate the intense work of obstetricians and to assist them in making appropriate medical decisions to better protect the health of pregnant women and fetuses. The remainder of the paper is organized as follows: Section 2 introduces the database and gives a detailed description of the overall methodology of our proposed system; Section 3 depicts the corresponding experimental results and presents the discussion, including a comparative analysis with existing approaches; and Section 4 concludes the entire work and proposes directions for future work.

An overview of our proposed CAD system for intelligent prediction of fetal acidemia
Methods
Database description
In this study, the umbilical artery pH value measured after delivery, an objective biochemical marker, was selected as the gold standard to separate the fetal state into normal and pathological classes. The pH threshold was set to 7.15 after careful consideration [26]. A pH below 7.15 was classified as pathological and a pH greater than or equal to 7.15 was classified as normal; thus, the database contained 447 normal and 105 abnormal FHR recordings.
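The labeling rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `label_by_ph` and the sample pH values are hypothetical and are not taken from the database.

```python
PH_THRESHOLD = 7.15  # cut-off stated in the text


def label_by_ph(ph: float, threshold: float = PH_THRESHOLD) -> str:
    """Return 'pathological' for a pH below the threshold, otherwise 'normal'."""
    return "pathological" if ph < threshold else "normal"


# Hypothetical pH values, for illustration only (not real recordings).
example_ph = [7.23, 7.15, 7.14, 6.85]
labels = [label_by_ph(value) for value in example_ph]
```

Note that a pH exactly equal to 7.15 falls into the normal class, matching the "greater than or equal to" wording above.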
| Information | Mean | Min | Max |
|---|---|---|---|
| Maternal age (MA, year) | 29.6 | 18 | 46 |
| Gestational age (GA, week) | 40 | 37 | 43 |
| pH | 7.23 | 6.85 | 7.47 |
| Base deficit in extracellular fluid (BDecf, mmol/L) | 4.6 | −3.40 | 26.11 |
| pCO2 | 7.07 | 0.7 | 12.3 |
| Base excess (BE) | −6.38 | −26.80 | −0.20 |
| Apgar 1 min | 8.3 | 1 | 10 |
| Apgar 5 min | 9.1 | 4 | 10 |
| Gravidity | 1.4 | 1 | 11 |
| Parity | 0.4 | 0 | 7 |
| Diabetes | No = 515, Yes = 37 | | |
| Birth weight (BW, g) | 3401 | 1970 | 4750 |
| Infant sex | Male = 286, Female = 266 | | |
| Delivery type | Vaginal = 506, Cesarean section = 46 | | |
Signal preprocessing
Preprocessing is an indispensable step in most biomedical signal processing applications and affects not only the values of extracted features but also the final classification performance. In clinical practice, the FHR signal has two typical acquisition methods: the CTG signal recorded externally by a Doppler ultrasound (US) probe placed on the abdomen of the pregnant woman, and the fetal electrocardiogram (FECG) signal measured internally by an electrode attached to the fetal scalp [9]. In either case, the FHR signal might be "contaminated" by noise due to many factors, such as movement of the mother and fetus, displacement of the transducer and the external clinical environment.

Signal preprocessing of No.1001 FHR recording (internal database number)
Continuous wavelet transform
Wavelet transform (WT), a widely used tool in the advanced field of signal processing, represents an effective method for multi-resolution analysis consisting of both time and frequency orientations [27]. As a type of WT, the CWT was originally proposed as an improved approach to address the issue of resolution [28]. The CWT has several outstanding merits, such as the reliable and flexible capacity to extract general and fine-grained feature information from the input signal; hence, this transform has been extensively employed in biomedical engineering to analyze non-stationary and nonlinear signals over the last decades [29]. The CWT is defined as the integral of the signal multiplied by scaled and translated copies of the mother wavelet, which can be expressed mathematically by the following equations [23]:

$$ CWT(\tau, s) = \psi(\tau, s) = \int_{-\infty}^{\infty} f(t)\,\varphi_{\tau,s}^{\ast}(t)\,dt, \quad f(t) \in L^{2}(R) \tag{1} $$

$$ \varphi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\,\varphi\left(\frac{t-\tau}{s}\right), \quad \tau, s \in R,\ s \ne 0 \tag{2} $$

$$ \int_{-\infty}^{\infty} \varphi(t)\,dt = 0 \tag{3} $$

where f(t) is the input signal, φ(t) is the wavelet basis, and ψ(τ, s) is the wavelet coefficient, which is a function of two variables, τ and s, the translation and scaling factors, respectively. The scaling factor s determines the degree to which the wavelet is compressed or stretched, while the translation factor τ carries the temporal information and specifies how far the wavelet is shifted in time.
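To make the definition concrete, the following is a minimal pure-Python sketch of a discretized CWT. It is not the implementation used in this work: it assumes the real-valued Mexican hat wavelet (chosen here only because it has a simple closed form, whereas the figures in this paper use db2), a toy sinusoid instead of an FHR recording, and hypothetical function names.

```python
import math


def mexican_hat(x: float) -> float:
    """Unnormalised Mexican hat wavelet; it is real-valued and integrates to zero."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt(signal, scales, dt=1.0):
    """Discretised CWT integral: one row per scale s, one column per shift tau."""
    n = len(signal)
    coeffs = []
    for s in scales:
        norm = dt / math.sqrt(abs(s))  # 1/sqrt(|s|) factor times the integration step
        row = []
        for k in range(n):  # shift tau = k * dt
            acc = 0.0
            for i in range(n):  # time t = i * dt
                acc += signal[i] * mexican_hat((i - k) * dt / s)
            row.append(acc * norm)
        coeffs.append(row)
    return coeffs


# Toy input: a sinusoid with a 16-sample period (not real FHR data).
signal = [math.sin(2.0 * math.pi * i / 16.0) for i in range(128)]
scales = [1, 2, 4, 8, 16]
scalogram = cwt(signal, scales)

# Energy per scale over the central samples, away from the signal edges.
energy = [sum(c * c for c in row[40:88]) for row in scalogram]
best_scale = scales[energy.index(max(energy))]
```

Each row of `scalogram` corresponds to one scale and each column to one time shift, so the nested list is already a small time-scale image. For this toy sinusoid the energy concentrates at the scale whose stretched wavelet best matches the oscillation period.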
Instead of using traditional morphological analysis (baseline estimation, detection of acceleration and deceleration patterns, etc.), this work applies the CWT primarily because it provides a better way to observe and capture the local hidden characteristic information of the FHR signal in the time and frequency domains simultaneously. Although the heart rate values in a preprocessed FHR signal may no longer be directly recoverable in the time domain after the image transformation, Warmerdam et al. [30] demonstrated that the CWT allows clinicians to assess the reliability of spectral analysis of FHR recordings contaminated by artifacts, because it maps the signals into a time-scale space and localizes their frequency components more visibly.
In addition, Comert et al. proposed a prognostic model that used the CWT to obtain 2D time-frequency images and achieved better performance in classifying the fetal state than time domain analysis [31]. However, they employed the conventional ML pipeline: feature extraction (image-based time-frequency features, IBTF), feature selection (genetic algorithm, GA) and final classification (LS-SVM). This approach was much more complex and obtained unsatisfactory results, with an Se and Sp of 63.45 and 65.88%, respectively.

The FHR signals (left) and corresponding time-frequency images (right) of the normal (top) and pathological (bottom) classes using the CWT with the mother wavelet of db2 and a wavelet scale of 24
Convolutional neural network
As a typical deep learning (DL) model, a CNN is composed of more hidden layers and neurons than the traditional multilayer perceptron (MLP) or ANN. Furthermore, the CNN algorithm is a type of supervised learning that can self-learn and self-organize based on the input data and corresponding output labels [18]. It eliminates the dependency on hand-crafted features and directly learns useful features from data. CNNs have already been successfully applied in many areas over the last decades, including face recognition, object localization, and image classification [19, 32, 33]. Due to the effectiveness of this approach, CNNs are extensively utilized in the medical field to design screening tools to assist clinicians [20–22].
The primary difference compared to traditional ML approaches is that a CNN removes the need for separate feature extraction and selection techniques. Hence, for most physiological signals, using CNNs can avoid the loss of valuable information and reduce the computational burden of extracting and selecting the best features during training for accurate classification of pathological conditions. In addition, a CNN significantly reduces the number of parameters that the network needs to train by means of receptive fields and weight sharing. These advantages were the main reasons why we chose a CNN for objective prediction of fetal acidemia.
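The effect of weight sharing on the parameter count can be illustrated with a short calculation. The sketch below uses the kernel size (5 × 5), number of filters (15) and input resolution (64 × 64 × 3) reported later in this paper, and assumes a stride of 1 with no padding. The fully-connected comparison is a hypothetical layer that produces the same number of outputs without sharing; the helper names are illustrative only.

```python
def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus one bias per filter; the same weights are reused at every position."""
    return n_filters * (kernel_h * kernel_w * in_channels + 1)


def dense_params(n_inputs: int, n_outputs: int) -> int:
    """One weight per input-output pair plus one bias per output (no sharing)."""
    return n_inputs * n_outputs + n_outputs


in_h, in_w, in_c = 64, 64, 3   # input image size
k, n_filters = 5, 15           # kernel size and number of filters

# Output size of a 'valid' convolution with stride 1 and no padding (assumed).
out_h = in_h - k + 1
out_w = in_w - k + 1

shared = conv_params(k, k, in_c, n_filters)
unshared = dense_params(in_h * in_w * in_c, out_h * out_w * n_filters)
```

Under these assumptions the convolution layer needs 1140 trainable parameters, whereas the unshared layer with the same output size would need more than 600 million.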

The CNN architecture proposed in this work. Note: L = layer; FM = output feature map or number of neurons (width ×height ×depth)
Image input layer (layer 1)
In this paper, the continuous wavelet transform is used to convert the original 1D time series into a 2D image, which serves as the input layer of the CNN. To avoid overfitting, we also applied data augmentation in the input layer. A random crop method was employed for image transformation, which enriches the image dataset and improves the generalization ability of the model.
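A random crop can be sketched as follows. The crop size, the 8 × 8 toy image and the function name `random_crop` are assumptions made for illustration; the crop dimensions used in this work are not stated here.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window taken at a random position of a 2D image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


rng = random.Random(0)  # fixed seed so the example is repeatable
# Toy 8 x 8 "image" whose pixel value encodes its position (row * 8 + column).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
crop = random_crop(image, 6, 6, rng)
```

Because a different window is drawn each time an image is presented, the network sees slightly shifted versions of the same time-frequency image during training.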
Convolution layer (layer 2)
A CNN is a form of deep neural network (DNN) with special convolution structure, which can reduce the amount of memory occupied by the deep network and the number of parameters in the network. In the convolution layer, a feature map in which hidden layers are connected to each other is used to extract pixel-level abstracted image features via convolution operations of one or more convolution kernels (also referred to as a filter) [36]. Each convolution kernel applies a sliding window mechanism to traverse the entire feature map, and thereby gathers and fuses the information of each small area to complete the representation of a partial feature of the input image. In a CNN, the filter parameters used in each convolution layer are ordinarily consistent for two reasons: (i.) sharing allows the image content to be unaffected by location; and (ii.) this consistency can dramatically reduce the optimization parameters. The mechanism of parameter sharing is a very important and attractive property of the CNN algorithm.
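The sliding-window mechanism can be sketched in pure Python. As in most CNN libraries, the "convolution" below is computed as a cross-correlation (the kernel is not flipped). The 3 × 3 image and 2 × 2 kernel are toy values chosen for illustration, and `conv2d_valid` is a hypothetical helper name.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide one kernel over a 2D image without padding and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # The same kernel weights are applied at every window position.
            for i in range(k_h):
                for j in range(k_w):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)
```

Here every window yields the top-left pixel minus the bottom-right pixel, so all four entries of the 2 × 2 feature map equal -4.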
Activation layer (layer 3)
The result of the convolution layer is then mapped through an activation function (AF) to form the feature mapping relationship. The AF is generally used between the layers of a neural network [37]; it performs a mapping transformation of the input data and provides the nonlinear modeling capability of the network. During this process, the element-by-element calculations do not change the size of the original data. In this CNN model, the rectified linear unit (ReLU) is selected because of the following advantages over other activation functions: (i.) faster convergence speed; and (ii.) only one threshold is required to obtain the activation value, without complex computations.
Normalization layer (layer 4)
The batch normalization (BN) layer standardizes the input data of each layer during the training of the neural network, which keeps the gradients from vanishing and greatly accelerates training [18].
Pooling layer (layer 5)
In general, the CNN model inserts a pooling layer (also named a sub-sampling layer) periodically between consecutive convolution layers [18]. Since the image features that are useful in one region may be equally applicable in another area, the pooling layer merges semantically similar features. The pooling operation reduces the size of the feature vectors output by the convolution layer and the number of parameters, so pooling can lower the model complexity and speed up the computation while helping to prevent overfitting. Similar to the convolution layer, the pooling operation performs feature mapping for each sub-region of the input feature map, moving in steps of the stride. Max pooling, average pooling and randomized pooling are the most common pooling methods. Max pooling, which takes the maximum value of each image region as the pooled result, is used in this CNN model.
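Max pooling with a 2 × 2 window and a stride of 2 can be sketched as follows. The 4 × 4 feature map holds toy values and `max_pool2d` is a hypothetical helper name.

```python
def max_pool2d(feature_map, pool=2, stride=2):
    """Keep the maximum of each pool x pool window, moving in steps of `stride`."""
    out_h = (len(feature_map) - pool) // stride + 1
    out_w = (len(feature_map[0]) - pool) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(pool)
                for j in range(pool)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 4]]
pooled = max_pool2d(feature_map)
```

The 4 × 4 input is reduced to a 2 × 2 output, so each spatial dimension is halved while the strongest response in every window is kept.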
Fully-connected layer (layer 6)
The fully-connected layer is located at the end of the network structure and is a traditional MLP network [38]. The final output of this network layer is high-level features of the input images, which are then statistically calculated according to a classifier, and the probability of the corresponding class label for the input image is also computed. After several rounds of convolution and pooling processing, the input image information can be assumed to have been abstracted into more information-intensive features. The convolution layer and pooling layer can be considered the necessary approaches to automatic image feature extraction. And when the feature transformation is completed, the fully-connected layer is used to execute the final classification task.
Dropout layer (layer 7)
For classification, we usually try to avoid overfitting, where the trained model obtains high accuracy on the training data, yet the generalization error on the test data is relatively large. In other words, overfitting refers to a situation in which a model memorizes the random noise in the training data but is not able to learn the general trend of the training data. Many factors can lead to overfitting, and the following specific solutions are adopted in this work [39]:
(a.) Regularization: Regularization is a powerful approach to solve an ill-posed problem to prevent overfitting by introducing additional information. L2 regularization is applied to add a regularizer to the cost function for this work.
(b.) Dropout technique: The dropout layer is usually arranged after the fully-connected layer. During the training process, several neural units are temporarily dropped from the network with a certain probability.
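Both remedies can be sketched briefly. The dropout function below uses the common "inverted dropout" scaling, and the L2 penalty uses the convention of half the regularization factor times the sum of squared weights. Both conventions are assumptions for illustration, as are the function names and the toy activations and weights.

```python
import random


def dropout(activations, p_drop, rng, training=True):
    """Zero each unit with probability p_drop during training; rescale the survivors."""
    if not training:
        return list(activations)  # at test time every unit is kept unchanged
    keep = 1.0 - p_drop
    return [a / keep if rng.random() >= p_drop else 0.0 for a in activations]


def l2_penalty(weights, factor):
    """Regularizer added to the cost function: 0.5 * factor * sum of squared weights."""
    return 0.5 * factor * sum(w * w for w in weights)


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, p_drop=0.5, rng=rng)   # probability from the dropout layer
penalty = l2_penalty([1.0, 2.0, 3.0], factor=1e-4)    # toy weights
```

With a drop probability of 0.5, each surviving activation is doubled so that the expected sum of the layer output is unchanged between training and testing.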
Classification layer (layer 8)
Finally, the classification layer uses the softmax function to separate the outputs into two classes, namely, normal and pathological.
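The softmax function and the associated cross-entropy loss can be sketched as follows. The logits are toy values and the class order (index 0 = normal, index 1 = pathological) is an assumption for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


# Toy scores for the two classes (index 0 = normal, index 1 = pathological).
probs = softmax([2.0, -1.0])
loss = cross_entropy(probs, true_index=0)
uniform = softmax([0.0, 0.0])
```

The predicted class is the one with the larger probability, and the loss shrinks towards zero as the probability of the true class approaches one.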
| Layer | Type | Parameter/Method | Value/Approach |
|---|---|---|---|
| 1 | Image input layer | Data augmentation | Random crop |
| | | Data normalization | Zero center |
| 2 | Convolution layer | Stride | [1] |
| | | Padding | 0 |
| | | Learning rate of the weight | 1 |
| | | Learning rate of the bias | 1 |
| | | L2 regularization for the weight | 1 |
| | | L2 regularization for the bias | 1 |
| 3 | Activation layer | Method | ReLU |
| 4 | Normalization layer | Alpha | 1 × 10⁻³ |
| | | Beta | 0.75 |
| | | K | 2 |
| 5 | Pooling layer | Method | Max pooling |
| | | Pool size | 2 × 2 |
| | | Stride | [2] |
| | | Padding | 0 |
| 6 | Fully-connected layer | Learning rate of the weight | 1 |
| | | Learning rate of the bias | 1 |
| | | L2 regularization for the weight | 1 |
| | | L2 regularization for the bias | 1 |
| 7 | Dropout layer | Probability | 0.5 |
| 8 | Classification layer | Softmax | Cross-entropy |
Performance evaluation
To evaluate performance, we adopted the Se, Sp, and Acc indicators, which were calculated from the common binary confusion matrix. In addition, an imbalanced dataset (the ratio of positive to negative was approximately 4:1 in this work) can negatively affect the overall performance of any classifier; thus, a quality index (QI), defined as the geometric mean of the Se and Sp, and the area under the receiver operating characteristic (ROC) curve (AUC) were also applied to alleviate this issue.

$$ Acc = \frac{TP + TN}{TP + FP + FN + TN} \tag{4} $$

$$ Se = \frac{TP}{TP + FN} \tag{5} $$

$$ Sp = \frac{TN}{FP + TN} \tag{6} $$

$$ QI = \sqrt{Se \cdot Sp} \tag{7} $$

where TP, FP, FN and TN represent true positive, false positive, false negative, and true negative, respectively. In this work, the normal fetal state (N) is considered positive, and the pathological fetal state (P) is negative.
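These indicators can be computed directly from the four confusion-matrix counts. The counts in the sketch below are hypothetical and serve only to illustrate the arithmetic; they are not results of this work.

```python
import math


def evaluate(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute Acc, Se, Sp and QI from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    se = tp / (tp + fn)       # sensitivity: fraction of positives (normal) found
    sp = tn / (fp + tn)       # specificity: fraction of negatives (pathological) found
    qi = math.sqrt(se * sp)   # geometric mean of Se and Sp
    return {"Acc": acc, "Se": se, "Sp": sp, "QI": qi}


# Hypothetical counts for illustration only.
metrics = evaluate(tp=90, fp=10, fn=10, tn=40)
```

With these counts Se is 0.90 and Sp is 0.80, so QI is the square root of 0.72 (about 0.85). Because QI is a geometric mean, it drops sharply when either Se or Sp is low, which is why it is informative on imbalanced data.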
Results
Experimental setup
The current work designed a novel CAD system that combined the CWT and 2D CNN to assess fetal state. The proposed methodology was implemented in MATLAB 2017a (Natick, MA USA) software, and the CNN model was trained on a PC workstation with two Intel Core 3.70 GHz (i3–4710) processors and 4 GB of RAM.
In this study, ten-fold cross-validation was applied in the performance evaluation to obtain more reliable results. The total images were randomly separated into 10 segments and 90% (2414 N and 567 P) formed the training set while the remainder (10%, 268 N and 63 P) was used to test the performance of our proposed system. The process was repeated 10 times and the final results were averaged.
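The ten-fold procedure can be sketched as follows. The total of 3312 images follows from the training and test counts given above. The shuffling seed, the function name `k_fold_indices` and the absence of class stratification are assumptions of this sketch, since the exact splitting routine is not described.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Shuffle sample indices and return k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized segments
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


n_images = 2414 + 567 + 268 + 63  # training plus test images in one fold
splits = k_fold_indices(n_images, k=10)
fold_sizes = [len(test) for _, test in splits]
```

Every image is used for testing exactly once and for training nine times, and the reported indicators are the averages over the ten test segments.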
| Parameter | | Value/Approach |
|---|---|---|
| Backpropagation algorithm | | Stochastic gradient descent |
| Momentum | | 0.9 |
| Initial learning rate | | 0.01 |
| Learning rate drop | Factor | 0.1 |
| | Period | 10 epochs |
| L2 regularizer factor | | 1 × 10⁻⁴ |
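The training settings in the table above can be read as a piecewise-constant learning-rate schedule combined with a momentum update. The sketch below is an interpretation, not the toolbox implementation: zero-based epoch counting, the exact form of the momentum update and the way the L2 term enters the gradient are assumptions, and the function names are hypothetical.

```python
def learning_rate(epoch, initial=0.01, drop_factor=0.1, drop_period=10):
    """Multiply the initial rate by drop_factor once every drop_period epochs."""
    return initial * drop_factor ** (epoch // drop_period)


def sgdm_step(weight, grad, velocity, lr, momentum=0.9, l2=1e-4):
    """One stochastic-gradient-descent-with-momentum update for a single weight."""
    reg_grad = grad + l2 * weight              # gradient of the L2-regularized cost
    velocity = momentum * velocity - lr * reg_grad
    return weight + velocity, velocity


# Learning rate over 20 epochs (zero-based epoch index, an assumption).
schedule = [learning_rate(epoch) for epoch in range(20)]

# One update on a toy weight with a toy gradient.
new_weight, new_velocity = sgdm_step(weight=1.0, grad=0.5, velocity=0.0, lr=schedule[0])
```

Under these assumptions the first ten epochs run at a rate of 0.01 and the next ten at 0.001.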
Experiment one: optimization of the CNN parameters
In CNN training, tuning the parameters is an indispensable step and plays an important role in optimization. After comprehensive experimentation, we found that, apart from the parameters defined in Tables 2 and 3, the size of the convolution kernel (Para1), the number of filters (Para2), the maximum number of epochs (Para3), and the size of the mini-batch (Para4) could influence the classification performance. The relevant experimental results can be summarized in the following two aspects.
(a.) The relationship between six indicators and Para2 was generally positive, regardless of Para1, indicating that the performance improved with an increase in Para2 with a cost in computation time;
(b.) No clear relationship was found between the measurements and Para1, but we still discovered that Para1 = 5 × 5 performed better and the corresponding training time was relatively shorter than that for the other settings.
Hence, after careful observation, Para1 was set to 5 × 5, and Para2 was set to 15 (indicated by the black arrow in Fig. 6), which were selected for Layer 2.
(a.) The relationships between six indicators and Para4 were generally negative, regardless of Para3, signifying that the performance worsened and the training time was decreased with an increase in Para4;
(b.) The performance using Para3 = 20 was nearly similar to that with Para3 = 30 and better than that with Para3 = 10, but the training time for the former was much shorter than that for the latter.
Therefore, we determined that the optimum parameters (indicated by the black arrow in Fig. 7) were Para3 = 20 and Para4 = 50.

The training Acc (top) and loss (bottom) change with iteration during the CNN training process

Comparison of the averaged classification performances using different kernel sizes and numbers of filters across ten folds. From left top to right top: Acc, Se, and Sp; from left bottom to right bottom: QI, AUC, and time

Comparison of the averaged classification performances using different max epochs and mini-batch sizes across ten-folds. From left top to right top: Acc, Se, and Sp; from left bottom to right bottom: QI, AUC, and time
| Layers | Type | Acc (%) | Se (%) | Sp (%) | QI (%) | AUC (%) | Training time (s) |
|---|---|---|---|---|---|---|---|
| 5 | I – C – P – F – O | 92.13 | 93.45 | 91.22 | 92.33 | 92.34 | 140.5 |
| 6 | I – C – P – C – F - O | 91.88 | 92.55 | 89.74 | 91.13 | 91.15 | 162.3 |
| 7 | I – C – P – C – P – F - O | 91.21 | 92.13 | 89.25 | 90.68 | 90.69 | 178.8 |
| 8 | I – C – P – C – P – F – F – O | 90.76 | 91.71 | 88.67 | 90.18 | 90.19 | 201.3 |
| 9 | I – C – P – C – P – C – F – F - O | 91.34 | 92.34 | 89.56 | 90.94 | 90.95 | 225.4 |
| 10 | I – C – P – C – P – C – P – F – F - O | 90.82 | 91.88 | 89.11 | 90.48 | 90.5 | 248.2 |
Experiment two: test of the CNN model

ROC curve of the proposed algorithm using different image resolutions and same optimization method
| Dataset | Acc (%) | Se (%) | Sp (%) | QI (%) | AUC (%) | Time (s) |
|---|---|---|---|---|---|---|
| Set1 | 88.47 | 89.12 | 82.33 | 85.66 | 77.28 | 150 |
| Set2 | 94.22 | 96.92 | 86.11 | 91.36 | 92.03 | 317 |
| Set3 | 96.44 | 97.02 | 92.04 | 94.5 | 94.66 | 587 |
| Set4 | 98.34 | 98.22 | 94.87 | 96.53 | 97.82 | 1775 |
Discussion
In this work, a novel CAD system based on the CWT and 2D CNN was proposed to assist obstetricians in making objective decisions regarding fetal status. We experimentally obtained better performance by tuning several parameters of the CNN model. According to Figs. 6 and 7 and Table 4, the optimal parameters were clearly fixed after full consideration. Furthermore, Table 5 shows that the overall classification performance improves with higher image resolution using the same training settings, yet the greatest disadvantage is that longer training times are required.
(a.) Compared with [31], based on the same database (CTU-UHB) and image transformation method (CWT), our approach performs much better (Se = 98.22% vs. 63.45%, Sp = 94.87% vs. 65.88%), which further highlights the superiority of the CNN over conventional ML.
(b.) Compared with [21], although the test database is different, the 2D CNN model obtains higher accuracy than the 1D CNN (Acc = 98.34% vs. 93.24%).
(c.) Compared with [22], based on the same database and a 2D CNN model, the CWT reflects the characteristic information of the FHR signal in the time-frequency image better than the STFT (Se = 98.22% vs. 56.15%, Sp = 94.87% vs. 96.51%, QI = 96.53% vs. 73.61%).
(d.) To the best of our knowledge, this CNN algorithm achieved better classification performance in predicting fetal state using FHR signals compared with other related works, as presented in Table 7.
In summary, the proposed system has several attractive advantages: (i.) feature extraction and selection techniques are not required; (ii.) the CWT is used to obtain 2D time-frequency images, which is believed to reflect the hidden characteristics of the FHR signals in both the time and frequency domains; (iii.) an 8-layer deep 2D CNN is implemented and its parameters are tuned to obtain better performance; and (iv.) this approach performs best among the state-of-the-art methods.
Nevertheless, the proposed system has some drawbacks: (i.) the training of the CNN model requires a very large amount of diverse data; and (ii.) the algorithm is computationally intensive in learning useful features from the input images.
In fact, if this classification algorithm can accurately discriminate between normal and pathological classes, then the long training time will be secondary in medical fields. Once the CAD system designed by the proposed algorithm is successfully trained, the system can immediately distinguish an unknown fetal state. Fortunately, some solutions are available to overcome the drawbacks of our proposed system. We can enrich the dataset using image transformation, such as rotation, cropping and whitening, etc. Then, training CNN models integrated with a graphics processing unit (GPU) will help significantly decrease training time and power consumption since one of the important properties of the CNN algorithm is its concurrency.
| Scheme (validation performance) | Acc (%) | Se (%) | Sp (%) | QI (%) | AUC (%) |
|---|---|---|---|---|---|
| HHT | 79.5 | 79.71 | 79.29 | 79.52 | 79.63 |
| Gabor Transformation | 76.38 | 80.56 | 72.33 | 76.25 | 77.22 |
| STFT | 83.27 | 86.78 | 78.83 | 82.91 | 83.1 |
| CWT | 98.34 | 98.22 | 94.87 | 96.53 | 97.82 |
| Author | Database | Distribution (N/P) | Feature extraction | Feature selection | Classifier | Performance (%) |
|---|---|---|---|---|---|---|
| Krupa et al. 2011 [13] | Private | 30/60 | EMD | / | SVM | Acc: 87, Se: 95, Sp: 70 |
| Spilka et al. 2012 [12] | Private | 123/94 | 33 Set1, Set2, Set3 | PCA, IG | NB, SVM, DT | Se: 73.4, Sp: 76.3, Fm: 71.5 |
| Czabanski et al. 2012 [14] | Private | 146/43 | 7 Set1 | / | WFS + LS-SVM | Acc: 92.0, QI: 88.2 |
| Fanelli et al. 2013 [15] | Private | 61/61 | 2 Set3 | / | ST | AUC: 75 |
| Xu et al. 2014 [40] | Private | 255/255 | 64 Set1, Set2, Set3 | GA | SVM | Se: 83, Sp: 66, AUC: 74 |
| Dash et al. 2014 [41] | Private | 60/23 | 8 Set1 | / | GM, NB | Se: 61, Sp: 82 |
| Spilka et al. 2014 [42] | CTU-UHB | 175/377 | 33 Set1, Set2, Set3 | / | LCA + RF | Se: 72, Sp: 78 |
| Doret et al. 2015 [11] | Private | 30/15 | 12 Set2, Set3 | / | ST | AUC: 87 |
| Comert et al. 2016 [43] | CTU-UHB | 60/40 | 18 Set1, Set2 | / | ANN | Acc: 87.0, Se: 88.7, Sp: 85.1 |
| Stylios et al. 2016 [44] | CTU-UHB | 508/44 | 54 Set1, Set2, Set3 | AUC | LS-SVM | Se: 68.5, Sp: 77.7 |
| Comert et al. 2016 [16] | CTU-UHB | 272/280 | 11 Set2, Set3 | / | ANN | Acc: 92.40, Se: 95.89, Sp: 74.75 |
| Georgoulas et al. 2017 [45] | CTU-UHB | 508/44 | 33 Set1, Set2, Set3 | AUC | LS-SVM | Se: 72.12, Sp: 65.30 |
| Comert et al. 2018 [31] | CTU-UHB | 439/113 | IBTF | GA | LS-SVM | Se: 63.45, Sp: 65.88 |
| Li et al. 2018 [21] | Private | 3012/1461 | FHR + 1D CNN | | | Acc: 93.24 |
| Comert et al. 2018 [22] | CTU-UHB | 508/44 | STFT + 2D CNN | | | Se: 56.15, Sp: 96.51, QI: 73.61 |
| Current work | CTU-UHB | 447/105 | CWT + 2D CNN | | | Acc: 98.34, Se: 98.22, Sp: 94.87, QI: 96.53, AUC: 97.82 |
Conclusions
The accurate diagnosis of fetal acidemia caused by hypoxia can allow obstetricians to intervene in a timely manner and take appropriate action to prevent permanent damage to the fetus. In clinical practice, the FHR signal is a commonly used tool to monitor the fetal state during labor and delivery. However, a visual analysis of the FHR signal with the naked eye is a challenging task for obstetricians since this type of assessment is subjective and irreproducible. Visual interpretation easily leads to significant inter-observer and intra-observer variability. Therefore, implementing a CAD system in clinical settings will guarantee the rapid and accurate prediction of fetal distress more objectively.
In this study, our primary contribution is to propose a data-driven approach to automatically assess the fetal state using a deep CNN. After signal preprocessing, the input time-frequency images were obtained using the CWT with different types of mother wavelets and wavelet scales. After comprehensive experimentation focused on tuning the parameters and changing the image sizes, we achieved the best classification performance with the optimum configuration (8 layers, size of the convolution kernel = 5 × 5, number of filters = 15, maximum number of epochs = 20, size of the mini-batch = 50, and image resolution = 64 × 64 × 3), and the averaged Acc, Se, and Sp were 98.34, 98.22, and 94.87% across ten folds, respectively. To alleviate the influence of class imbalance, the QI and AUC indicators were also applied to measure the overall performance, with values of 96.53 and 97.82%, respectively. Because hand-crafted features are susceptible to bias in their extraction and selection and limit the ability of a classifier to learn fully from the data, the CNN-based framework removes the need for feature engineering (i.e., feature extraction and selection). Overall, the results demonstrated the effectiveness of our proposed CAD system, which can be introduced into clinical practice to assist obstetricians in making accurate medical decisions objectively.
The results are promising and provide a baseline for future research on strategies that need no feature extraction and selection and rely entirely on the neural network model for fetal state assessment. GPUs will be integrated into the workstation to speed up the training process. In addition, we will combine the FHR signal with other biomedical signals (e.g., UC) to improve the accuracy and provide a more reliable decision tool. Making the system more explainable for obstetricians and pregnant women also remains a major challenge.