MultiQueue-Based FPGA Routing: Relaxed A* Priority Ordering for Improved Parallelism

Alexandre Singer, Hang Yan, Guozheng Zhang, Mark C. Jeffrey, Mirjana Stojilović and Vaughn Betz.
In Proceedings of the 23rd IEEE International Conference on Field-Programmable Technology (FPT)
December 2024

Summary

MultiQueue-Based FPGA Routing: Relaxed A* Priority Ordering for Improved Parallelism. Alexandre Singer, Hang Yan, Guozheng Zhang, Mark C. Jeffrey, Mirjana Stojilović and Vaughn Betz. In Proceedings of the 23rd IEEE International Conference on Field-Programmable Technology (FPT). December 2024. FPT 2024 (Acceptance rate: 27.5%). (Best Paper Nominee)
[text][talk][slides][code]

Talk

Abstract

Routing is a critical part of the FPGA CAD flow. Its solution quality greatly impacts design speed and power, and its run time significantly contributes to overall compile time. As FPGA design size grows but per-core CPU performance stagnates, parallel FPGA routing algorithms that can leverage multiple compute cores become increasingly valuable. Most prior work in parallel FPGA routing uses coarse-grained approaches that route nonoverlapping nets in parallel; this work targets a complementary fine-grained form of parallelism in which the shortest-path algorithms that complete a single connection are multi-threaded. We speed up several related shortest path algorithms (Dijkstra’s, A*, and directed) by utilizing a concurrencyfriendly but weakly ordered data structure, the MultiQueue, and enhance the algorithms to compensate for its imperfect ordering of partial routings. Compared to the VTR 8+ router, these techniques achieve routing time reductions of 18.7× and 13.2× on average over the Titan benchmark suite when using Dijkstra’s and A* path search, respectively, on a 12-core CPU. These parallel algorithms achieve wirelength and critical path delay quality comparable to the serial router; they are also deterministic and serially equivalent. When applied to directed search routing, the parallel path search achieves a speed-up of 1.98× and a slightly higher quality than the serial VTR router, but is nondeterministic. Thanks to queue improvements, our router at 1-thread is 1.7× faster than VTR’s for Dijkstra’s and A* search, but comparable in run time for directed search.

More Info

I had the privilege of presenting this research paper at the International Conference on Field Programmable Technology (FPT) 2024 in the stunning city of Sydney, Australia. FPT 2024 received 69 paper submissions, and I’m honored that this work was one of the 19 papers accepted (27.5% acceptance rate)—where this paper was one of 3 papers nominated for the FPT 2024 Best Paper Award!

View All Publications