linux-kernel - Re: [LSF/MM/BPF TOPIC] Enhancements to Page Migration with Multi-threading and Batch Offloading to DMA

Message-ID: <02548b27-2442-4172-8f4f-a6fb7588d705@amd.com>
Date: Tue, 25 Mar 2025 10:50:39 +0530
From: Shivank Garg <shivankg@....com>
To: akpm@...ux-foundation.org, lsf-pc@...ts.linux-foundation.org,
 linux-mm@...ck.org, ziy@...dia.com
Cc: AneeshKumar.KizhakeVeetil@....com, baolin.wang@...ux.alibaba.com,
 bharata@....com, david@...hat.com, gregory.price@...verge.com,
 honggyu.kim@...com, jane.chu@...cle.com, jhubbard@...dia.com,
 jon.grimm@....com, k.shutemov@...il.com, leesuyeon0506@...il.com,
 leillc@...gle.com, liam.howlett@...cle.com, linux-kernel@...r.kernel.org,
 mel.gorman@...il.com, Michael.Day@....com,
 Raghavendra.KodsaraThimmappa@....com, riel@...riel.com, rientjes@...gle.com,
 santosh.shukla@....com, shy828301@...il.com, sj@...nel.org,
 wangkefeng.wang@...wei.com, weixugc@...gle.com, willy@...radead.org,
 ying.huang@...ux.alibaba.com, wei.huang2@....com,
 Jonathan.Cameron@...wei.com, byungchul@...com
Subject: Re: [LSF/MM/BPF TOPIC] Enhancements to Page Migration with
 Multi-threading and Batch Offloading to DMA



On 3/24/2025 11:31 AM, Shivank Garg wrote:
> 
> 
> On 1/23/2025 11:25 AM, Shivank Garg wrote:
>> Hi all,
>>
>> Zi Yan and I would like to propose the topic: Enhancements to Page
>> Migration with Multi-threading and Batch Offloading to DMA.
>>
>> Page migration is a critical operation in NUMA systems that can incur
>> significant overheads, affecting memory management performance across
>> various workloads. For example, copying folios between DRAM NUMA nodes
>> can take ~25% of the total migration cost for migrating 256MB of data.
>>
>> Modern systems are equipped with powerful DMA engines for bulk data
>> copying, GPUs, and high CPU core counts. Leveraging these hardware
>> capabilities becomes essential for systems where frequent page promotion
>> and demotion occur - from large-scale tiered-memory systems with CXL nodes
>> to CPU-GPU coherent systems with GPU memory exposed as NUMA nodes.
>>
>> Existing page migration performs sequential page copying, underutilizing
>> modern CPU architectures and high-bandwidth memory subsystems.
>>
>> We have proposed and posted RFCs to enhance page migration through three
>> key techniques:
>> 1. Batching migration operations for bulk data copying [1]
>> 2. Multi-threaded folio copying [2]
>> 3. DMA offloading to hardware accelerators [1]
>>
>> By employing batching and multi-threaded folio copying, we are able to
>> achieve significant improvements in page migration throughput for large
>> pages.
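To make the multi-threaded copying idea concrete, here is a minimal
user-space sketch, not the code from the RFCs. It splits one large buffer
into contiguous chunks and copies each chunk in its own POSIX thread. The
names parallel_copy(), copy_worker(), struct copy_job and MAX_COPY_THREADS
are hypothetical and exist only for this illustration; the in-kernel
implementation works on folios and kernel worker threads, which this sketch
does not attempt to model.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on copy threads, chosen for illustration. */
#define MAX_COPY_THREADS 16

/* One contiguous chunk of the overall copy (hypothetical helper type). */
struct copy_job {
	char *dst;
	const char *src;
	size_t len;
};

/* Thread entry point: copy a single chunk. */
static void *copy_worker(void *arg)
{
	struct copy_job *job = arg;

	memcpy(job->dst, job->src, job->len);
	return NULL;
}

/*
 * Copy len bytes from src to dst using up to nthreads threads.
 * The buffers must not overlap. Returns 0 on success, or -1 if
 * nthreads is 0 or larger than MAX_COPY_THREADS.
 */
int parallel_copy(void *dst, const void *src, size_t len,
		  unsigned int nthreads)
{
	pthread_t tids[MAX_COPY_THREADS];
	struct copy_job jobs[MAX_COPY_THREADS];
	int started[MAX_COPY_THREADS];
	size_t chunk, off = 0;
	unsigned int i;

	assert(dst != NULL && src != NULL);
	if (nthreads == 0 || nthreads > MAX_COPY_THREADS)
		return -1;

	chunk = len / nthreads;
	for (i = 0; i < nthreads; i++) {
		/* The last chunk also takes the remainder bytes. */
		size_t this_len = (i == nthreads - 1) ? len - off : chunk;

		jobs[i].dst = (char *)dst + off;
		jobs[i].src = (const char *)src + off;
		jobs[i].len = this_len;

		if (pthread_create(&tids[i], NULL, copy_worker, &jobs[i]) == 0) {
			started[i] = 1;
		} else {
			/* Could not start a thread: copy this chunk inline. */
			memcpy(jobs[i].dst, jobs[i].src, jobs[i].len);
			started[i] = 0;
		}
		off += this_len;
	}

	/* Wait for every thread that was actually started. */
	for (i = 0; i < nthreads; i++) {
		if (started[i])
			pthread_join(tids[i], NULL);
	}
	return 0;
}
```

A caller would invoke it as parallel_copy(dst, src, size, 4). Whether more
threads actually help depends on memory bandwidth and thread placement,
which is part of what the discussion points below cover.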
>>
>> Discussion points:
>> 1. Performance:
>>    a. Policy decision for DMA and CPU selection
>>    b. Platform-specific scheduling of folio-copy worker threads for better
>>       bandwidth utilization
>>    c. Using non-temporal instructions for CPU-based memcpy
>>    d. Upscaling/downscaling worker threads based on migration size, CPU
>>       availability (system load), bandwidth saturation, etc.
>> 2. Interface requirements with DMA hardware:
>>    a. Standardizing APIs for DMA drivers and support for different DMA
>>       drivers
>>    b. Enhancing DMA drivers for bulk copying (e.g., SDXI Engine)
>> 3. Resource accounting:
>>    a. CPU cgroups accounting and fairness [3]
>>    b. Who bears the migration cost? (Migration cost attribution)
>>
> 
> Hi all,
> 
> For reference, here is the link to the latest RFC v2:
> 
> https://lore.kernel.org/linux-mm/20250319192211.10092-1-shivankg@amd.com
> 
> This version combines the ideas discussed in [1] and [2] and includes details
> on performance improvements and experimental findings to provide more context
> for discussion.

Sharing the slides from today’s presentation:

Main Slide Deck: https://docs.google.com/presentation/d/1mjl5-jiz-TMVRK9bQcQ_IsSXrIP82CqWS8Q6em3mJi0/edit?usp=sharing
Multi-threading Slide Deck: https://docs.google.com/presentation/d/10czypcUbRMOUn6knp340Cwv4bf83Ha2gUX8TwNXUwCs/edit#slide=id.p6

Thanks,
Shivank

> 
>> References:
>> [1] https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
>> [2] https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com
>> [3] https://lore.kernel.org/all/CAHbLzkpoKP0fVZP5b10wdzAMDLWysDy7oH0qaUssiUXj80R6bw@mail.gmail.com
> 
> Looking forward to your feedback!
> 
> Thanks,
> Shivank
>