A new method for hydraulic pump fault diagnosis improves model accuracy under imbalanced conditions by generating synthetic fault data.
In the world of rotating machinery, where hydraulic pumps often operate at high speeds and pressures, identifying faults before failure is critical. But fault data is hard to come by. Real-world datasets are heavily skewed toward normal operating conditions, leaving artificial intelligence models blind to the early signs of trouble. New research from Yanshan University offers a fresh approach to hydraulic pump fault diagnosis: one that fills in the blanks by generating realistic synthetic data using a deep learning model called DA-DCGAN.
The dual-attention deep convolutional generative adversarial network (DA-DCGAN) enhances fault detection by improving how minority fault samples are represented. The system uses continuous wavelet transform (CWT) to convert vibration data into detailed time-frequency images. These images serve as training input for the generative model, which can then simulate faults across a range of failure types and severities. This expands the dataset, helping classifiers detect real faults more accurately, especially in the early stages.
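As a rough illustration of the CWT step, the sketch below turns a one-dimensional vibration signal into a time-frequency magnitude image using a complex Morlet wavelet. The wavelet choice, the 1 kHz sampling rate, the 50 Hz test tone and the 20 to 200 Hz analysis band are all assumptions made for the demo; the article does not say which wavelet or settings the researchers used.

```python
import numpy as np

def morlet_scalogram(signal, freqs, fs, w0=6.0):
    """Magnitude of a complex-Morlet CWT, one row per analysis frequency.

    Assumes the signal is longer than the widest wavelet (true for the
    demo values below); w0 is the Morlet centre frequency in radians.
    """
    rows = []
    for f in freqs:
        s = w0 / (2 * np.pi * f)          # scale (seconds) whose centre frequency is f
        half = int(np.ceil(4 * s * fs))   # support: +/- 4 Gaussian widths
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        # Correlating with the wavelet equals convolving with its conjugated,
        # time-reversed copy; dividing by fs approximates the integral.
        coef = np.convolve(signal, np.conj(wavelet[::-1]), mode="same") / fs
        rows.append(np.abs(coef))
    return np.array(rows)

# Hypothetical test signal: a 50 Hz tone with a little noise, sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(fs)

freqs = np.linspace(20, 200, 37)          # analysis band in 5 Hz steps
scalogram = morlet_scalogram(signal, freqs, fs)
peak_freq = freqs[np.argmax(scalogram.mean(axis=1))]
```

Each row of `scalogram` is the signal's energy over time at one frequency, so the array can be saved or rendered as an image and passed to an image-based model. In this demo the row with the highest average energy is the one at the tone's frequency.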
Solving the imbalance problem in pump diagnostics
The real challenge in hydraulic pump fault diagnosis lies in data imbalance. As pumps spend most of their service life in a healthy state, engineers end up with thousands of normal samples and only a handful showing actual faults. Classifiers trained on such skewed data tend to favour the dominant class, missing rare but critical faults such as plunger wear or loose shoes.
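A small worked example shows why imbalance is so misleading. The class counts below are hypothetical, not taken from the study: a classifier that labels every sample as normal still scores high accuracy while detecting no faults at all.

```python
# Hypothetical class counts for illustration only.
labels = (["normal"] * 950 + ["slipper_wear"] * 20
          + ["loose_shoe"] * 15 + ["plunger_wear"] * 15)

# A classifier that has collapsed onto the majority class.
predictions = ["normal"] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall over the fault classes: the share of real faults that were caught.
fault_idx = [i for i, y in enumerate(labels) if y != "normal"]
fault_recall = sum(predictions[i] == labels[i] for i in fault_idx) / len(fault_idx)

print(f"accuracy={accuracy:.2f}, fault recall={fault_recall:.2f}")
# prints: accuracy=0.95, fault recall=0.00
```

Overall accuracy therefore says little on its own for skewed datasets; per-class recall shows whether the rare faults are actually being found.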
DA-DCGAN tackles this by generating new fault samples that mimic real ones in both structure and complexity. Unlike earlier GAN models, this system integrates two types of attention mechanisms, channel and spatial attention, to refine the quality of generated images. These mechanisms allow the model to prioritise key features in the time-frequency domain, capturing subtle differences between healthy and faulty states.
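The article does not describe the attention layers in detail, so the following is only a generic sketch of how channel and spatial attention can reweight a feature map, loosely in the style of CBAM-type modules. The function names, the random weights and the scalar mixing weights standing in for a learned convolution are all assumptions, not the DA-DCGAN architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Rescale each channel of feat (C, H, W) by a weight in (0, 1)."""
    def mlp(v):
        # Shared two-layer bottleneck with a ReLU in between.
        return w2 @ np.maximum(w1 @ v, 0.0)
    avg = feat.mean(axis=(1, 2))           # per-channel average descriptor, shape (C,)
    mx = feat.max(axis=(1, 2))             # per-channel max descriptor, shape (C,)
    weights = sigmoid(mlp(avg) + mlp(mx))  # one weight per channel
    return feat * weights[:, None, None]

def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Rescale each (time, frequency) position by a weight in (0, 1)."""
    avg_map = feat.mean(axis=0)            # (H, W) average across channels
    max_map = feat.max(axis=0)             # (H, W) maximum across channels
    # Scalar mix used here as a stand-in for a learned convolution.
    mask = sigmoid(w_avg * avg_map + w_max * max_map)
    return feat * mask[None, :, :]

# Hypothetical feature map: 8 channels over a 16 x 16 time-frequency grid.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((4, 8))           # reduce 8 channels to 4
w2 = rng.standard_normal((8, 4))           # expand back to 8
out = spatial_attention(channel_attention(feat, w1, w2))
```

Channel attention decides which feature maps matter, while spatial attention decides where in the time-frequency image to look. In a trained network the weights are learned, so informative regions end up scaled close to 1 and uninformative ones close to 0.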
To ensure realism, the generated samples were benchmarked against actual data using three key metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and Pearson correlation coefficient (PCC). The results showed strong alignment between synthetic and real samples, validating the method’s effectiveness.
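For readers unfamiliar with these metrics, here is a minimal sketch of how each can be computed for a pair of images scaled to the range 0 to 1. The SSIM shown is a simplified single-window version computed over the whole image, whereas standard implementations average it over local windows. The test images are random placeholders, not data from the study.

```python
import numpy as np

def psnr(real, fake, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer."""
    mse = np.mean((real - fake) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

def global_ssim(real, fake, data_range=1.0):
    """Simplified SSIM using whole-image statistics (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = real.mean(), fake.mean()
    var_x, var_y = real.var(), fake.var()
    cov = np.mean((real - mu_x) * (fake - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def pcc(real, fake):
    """Pearson correlation coefficient between flattened pixel values."""
    return np.corrcoef(real.ravel(), fake.ravel())[0, 1]

# Placeholder images: a random "real" image and a lightly perturbed copy.
rng = np.random.default_rng(0)
real = rng.random((64, 64))
fake = np.clip(real + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)

scores = {"psnr": psnr(real, fake),
          "ssim": global_ssim(real, fake),
          "pcc": pcc(real, fake)}
```

PSNR measures pixel-level error, SSIM compares brightness, contrast and structure, and PCC captures how closely the two images' pixel values move together. Identical images give infinite PSNR and a value of 1 for the other two.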
“This technique lets us overcome one of the biggest hurdles in fault diagnosis,” Yang Zhao said. “We can now train models on synthetic fault data that reflects the real operating conditions of hydraulic systems.”
Real-world test: axial piston pump performance
To prove the concept, the researchers tested DA-DCGAN on a swash plate axial piston pump, simulating three common fault types: slipper wear, loose shoe, and plunger wear. These faults were introduced manually through precision grinding or component assembly, and then monitored under controlled conditions. Sensor arrays captured vibration data across multiple axes and temperatures, producing a high-resolution dataset for analysis.
The expanded datasets were then used to train two types of classifiers: a traditional convolutional neural network (CNN) and a modified dual-attention CNN (DA-CNN). When trained on imbalanced data, the DA-CNN achieved just under 86 per cent accuracy. But when the same classifier was trained on the expanded dataset generated by DA-DCGAN, accuracy jumped to nearly 99 per cent. That performance nearly matched the benchmark of a fully balanced real dataset, confirming the synthetic data’s value.
“Most failure datasets in industry are small and unbalanced,” Jiang said. “This approach brings them to life, making it possible to train high-performing models without waiting for failures to occur.”
From the lab to the field: future directions
While the results are promising, the method is not without challenges. Training deep generative models requires significant computing power, and GAN stability remains an ongoing research area. To address the stability problem, the paper applies the two-timescale update rule (TTUR), which gives the generator and discriminator separate learning rates so that neither network outpaces the other. According to the researchers, this improved training convergence and output quality.
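The idea behind TTUR can be sketched with a toy one-dimensional GAN in which the two networks are updated with different step sizes. Everything here is an assumption made for illustration: the one-parameter generator, the logistic discriminator, the plain gradient steps and the 1:4 learning-rate ratio are not the paper's configuration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)

# Two timescales: the discriminator takes larger steps than the generator.
lr_g, lr_d = 0.01, 0.04

theta = 0.0        # generator: G(z) = z + theta, with z ~ N(0, 1)
w, b = 0.0, 0.0    # discriminator: D(x) = sigmoid(w * x + b)

for step in range(2000):
    x_real = rng.normal(3.0, 1.0, size=64)         # "real" data centred on 3
    x_fake = rng.normal(0.0, 1.0, size=64) + theta

    # Discriminator step on -log D(real) - log(1 - D(fake)), using lr_d.
    p_real = sigmoid(w * x_real + b)
    p_fake = sigmoid(w * x_fake + b)
    grad_w = np.mean((p_real - 1.0) * x_real) + np.mean(p_fake * x_fake)
    grad_b = np.mean(p_real - 1.0) + np.mean(p_fake)
    w -= lr_d * grad_w
    b -= lr_d * grad_b

    # Generator step on the non-saturating loss -log D(fake), using lr_g.
    p_fake = sigmoid(w * x_fake + b)
    grad_theta = np.mean((p_fake - 1.0) * w)
    theta -= lr_g * grad_theta
```

The only TTUR-specific part is that `lr_d` and `lr_g` differ. In a deep learning framework the same effect is usually achieved by giving each network its own optimiser with its own learning rate.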
Looking ahead, the team sees opportunities to combine DA-DCGAN with lightweight models or edge deployment strategies, bringing synthetic learning into real-time diagnostics. With enough refinement, such techniques could power intelligent fault detection systems in the field, where early diagnosis of pump failures can reduce downtime, maintenance costs, and safety risks.
If smarter diagnostics begin with smarter data, the next frontier may lie not in the sensor but in the simulation.



