Preprint Article, Version 1 (not peer-reviewed)

Parallel PSO for the Efficient Training of Neural Networks Using the GPGPU and Apache Spark in an Edge Computing Environment

Version 1: Received: 15 July 2024 / Approved: 16 July 2024 / Online: 16 July 2024 (08:29:08 CEST)

How to cite: Capel, M. I.; Salguero-Hidalgo, A.; Holgado-Terriza, J. A. Parallel PSO for the Efficient Training of Neural Networks Using the GPGPU and Apache Spark in an Edge Computing Environment. Preprints 2024, 2024071300. https://doi.org/10.20944/preprints202407.1300.v1

Abstract

Deep learning neural networks (DLNNs) require an immense amount of computation, especially during the training phase, when networks with multiple layers of intermediate neurons must be built. In this paper, we focus on the particle swarm optimization (PSO) algorithm with the aim of significantly accelerating the DLNN training phase by exploiting the GPGPU architecture and the Apache Spark analytics engine for large-scale data processing. PSO is a bio-inspired stochastic optimization method that iteratively improves a candidate solution to a (usually complex) problem by attempting to approximate a given objective. However, parallelizing PSO efficiently is not straightforward, owing to the complexity of the computations performed on the swarm of particles and to the iterative execution of the algorithm until a solution close to the objective, with minimal error, is reached. In the present work, two parallelizations of the PSO algorithm have been implemented, both designed for a distributed execution environment. The synchronous parallel PSO ensures consistency at the cost of potential idle time caused by global synchronization, while the asynchronous parallel PSO improves execution time by reducing the need for global synchronization, making it more suitable for large datasets and distributed environments such as Apache Spark. Both variants distribute the computational load of the algorithm (dominated by the costly fitness evaluation and the updating of particle positions) across the Spark cluster's executor nodes, effectively achieving coarse-grained parallelism and yielding a significant performance increase over current sequential variants of PSO.
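To make the coarse-grained scheme described above concrete, the following minimal PySpark sketch illustrates one synchronous parallel PSO iteration: the fitness of each particle is evaluated on a Spark executor, and collect() acts as the global synchronization point mentioned in the abstract. The names and values here (the placeholder evaluate_loss function, the hyperparameters W, C1, C2, and the encoding of a particle as a NumPy vector of network weights) are illustrative assumptions, not the paper's actual implementation.

    # Minimal synchronous parallel PSO sketch on Spark (assumptions noted above).
    import numpy as np
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parallel-pso-sketch").getOrCreate()
    sc = spark.sparkContext

    DIM = 100                  # number of network weights encoded per particle
    N_PARTICLES = 64
    W, C1, C2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)

    def evaluate_loss(position):
        # Placeholder fitness: in the paper this is the costly DLNN evaluation
        # (e.g., the training/validation loss of the network whose weights
        # are encoded by `position`). A simple quadratic stands in for it here.
        return float(np.sum(position ** 2))

    rng = np.random.default_rng(0)
    positions = rng.uniform(-1, 1, (N_PARTICLES, DIM))
    velocities = np.zeros((N_PARTICLES, DIM))
    pbest_pos = positions.copy()
    pbest_val = np.full(N_PARTICLES, np.inf)
    gbest_pos, gbest_val = None, np.inf

    for it in range(10):
        # Coarse-grained parallelism: one fitness evaluation per particle,
        # distributed across the executors; collect() is the global barrier.
        fitness = np.array(
            sc.parallelize(positions.tolist())
              .map(lambda p: evaluate_loss(np.asarray(p)))
              .collect()
        )

        # Update personal bests and the global best on the driver.
        improved = fitness < pbest_val
        pbest_pos[improved] = positions[improved]
        pbest_val[improved] = fitness[improved]
        if pbest_val.min() < gbest_val:
            gbest_val = pbest_val.min()
            gbest_pos = pbest_pos[pbest_val.argmin()].copy()

        # Standard PSO update:
        # v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v
        r1 = rng.random((N_PARTICLES, DIM))
        r2 = rng.random((N_PARTICLES, DIM))
        velocities = (W * velocities
                      + C1 * r1 * (pbest_pos - positions)
                      + C2 * r2 * (gbest_pos - positions))
        positions += velocities

    print("best fitness:", gbest_val)
    spark.stop()

In the asynchronous variant discussed in the abstract, the hard barrier introduced by collect() would be relaxed, so that particles can update against a possibly stale global best instead of waiting for every fitness evaluation in the swarm to finish.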

Keywords

Apache Spark; Classification recall; Deep Neural Networks; GPU Parallelism; Optimization research; Particle Swarm Optimization (PSO); Predictive Accuracy.

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning


