Presented at the Languages, Compilers, Tools and Theory of Embedded Systems (LCTES) 2023 conference.
Direct Memory Access (DMA) is often used within hardware accelerators to transfer coarse-grained data to and from the host. DMA transfers are often significantly faster than scalar loads and stores. This work adds an extension to the Affine Loop Tiling pass in MLIR that uses polyhedral analysis to determine a set of tile sizes ensuring that all data copies can be implemented as DMA operations.
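To make the idea of "DMA-compatible tile sizes" concrete, the following is a minimal sketch in Python. It is not the polyhedral analysis used in this work and does not touch MLIR; it simply brute-forces tile sizes for a row-major array under three assumed constraints that are illustrative only: each tile size divides its loop extent, the contiguous innermost run is a whole number of DMA bursts, and the tile fits in a local buffer. The function names and the parameters `burst_bytes` and `buffer_bytes` are hypothetical.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def volume(tile):
    """Number of elements covered by one tile."""
    v = 1
    for t in tile:
        v *= t
    return v


def dma_friendly_tile_sizes(extents, elem_bytes, burst_bytes, buffer_bytes):
    """Enumerate tile-size tuples for a row-major array.

    Assumed (hypothetical) DMA constraints, not taken from the source:
      * each tile size divides its loop extent, so every tile is a full box;
      * the innermost contiguous run of a tile is a whole number of bursts;
      * the whole tile fits in the accelerator's local buffer.
    """
    candidates = []
    for tile in product(*(divisors(e) for e in extents)):
        inner_run_bytes = tile[-1] * elem_bytes
        tile_bytes = volume(tile) * elem_bytes
        if inner_run_bytes % burst_bytes == 0 and tile_bytes <= buffer_bytes:
            candidates.append(tile)
    return candidates


def largest_tile(candidates):
    """Pick a candidate covering the most elements (fewest transfers)."""
    return max(candidates, key=volume)


if __name__ == "__main__":
    # 8x16 array of 4-byte elements, 16-byte bursts, 128-byte local buffer.
    tiles = dma_friendly_tile_sizes((8, 16), 4, 16, 128)
    print(tiles)
    print(largest_tile(tiles))
```

A real implementation would derive these constraints from the affine access relations of each memory reference rather than enumerating divisors, which is where the polyhedral analysis comes in.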
This work is currently under review as a patch to the upstream LLVM project. The merge request can be found here.