Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Beyond Network Switching: FPGA-based Switch Architecture for Fast and Accurate Ensemble Learning

Version 1 : Received: 4 September 2024 / Approved: 5 September 2024 / Online: 5 September 2024 (12:11:34 CEST)

How to cite: Meng, J.; Que, Z.; Guo, C.; Luk, W. Beyond Network Switching: FPGA-based Switch Architecture for Fast and Accurate Ensemble Learning. Preprints 2024, 2024090457. https://doi.org/10.20944/preprints202409.0457.v1

Abstract

Recent advancements in Artificial Intelligence (AI) and generative models have significantly accelerated the evolution of internet services. In data centers, where data is often distributed across multiple machines, it is common to harness these machines together to speed up the training of machine learning applications. However, this approach incurs substantial communication overhead. To mitigate it, data compression techniques such as quantization, sparsification, and dimension reduction are gaining attention, especially for iterative training algorithms that frequently exchange gradients. With the advancement of FPGA technology, FPGA-based bump-in-the-wire network interface controllers (SmartNICs) have demonstrated notable benefits for data centers, matching ASIC switches in port-to-port latency and capacity. However, FPGA-based switches often have spare hardware resources beyond those needed for switching, which are typically wasted. We propose a novel architecture that integrates FPGA-based switches with ensemble learning methods, utilizing these spare resources on FPGA-based network switches. The architecture employs random projection as a compression method to accelerate machine learning tasks while maintaining high accuracy. Our system accelerates the training of Multilayer Perceptron (MLP) models by 2.1-6.7 times across four high-dimensional datasets. In addition, it combines random projection with Hamming encoding to perform k-Nearest Neighbors (kNN) classification under the Hamming distance metric, offloading large-scale classification tasks from downstream servers. This achieves a worst-case speedup of 2.3-8.7 times and a memory reduction of 3.9-27.7 times compared to CPU-based kNN classifiers using cosine distance. Our architecture addresses the challenges of high-dimensional data processing and exemplifies the potential of AI-driven solutions in accelerating and optimizing internet services.
Moreover, the programmability of FPGAs allows the system to adopt various compression methods, extending its applicability. Offloading tasks to FPGAs can significantly reduce query response times, a crucial step towards greater levels of autonomic computing and the rapid evolution of internet systems.
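As an illustrative sketch (not the paper's FPGA implementation), the compression pipeline described in the abstract, random projection followed by Hamming encoding with classification by Hamming-distance kNN, can be prototyped in a few lines of NumPy. The dimensions, the dense Gaussian projection matrix, and the sign-based binarization are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 100 points in d = 512 dimensions, compressed to k = 32 bits.
d, k = 512, 32
X = rng.normal(size=(100, d))

# Random projection: a dense Gaussian matrix, scaled so that pairwise
# distances are approximately preserved (Johnson-Lindenstrauss style).
R = rng.normal(size=(d, k)) / np.sqrt(k)

# Hamming encoding: keep only the sign of each projected coordinate,
# turning every point into a k-bit binary code.
codes = ((X @ R) > 0).astype(np.uint8)

# A query that is a slightly perturbed copy of point 0; after projection
# and binarization its code should land very close to codes[0].
query = X[0] + 0.01 * rng.normal(size=d)
qcode = ((query @ R) > 0).astype(np.uint8)

# kNN under the Hamming distance metric: count differing bits against
# every stored code and take the three closest points.
dists = np.count_nonzero(codes != qcode, axis=1)
nearest = np.argsort(dists)[:3]
print(nearest[0])
```

On FPGA hardware the Hamming distance step reduces to XOR plus popcount, which is far cheaper than the multiply-accumulate chains needed for cosine distance; this is one intuition behind the memory and speedup figures reported in the abstract.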

Keywords

ensemble learning; FPGA-based switch; random projection; k-nearest neighbors; cloud computing

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
