linux-kernel - Re: [RFC PATCH 0/5] Enhancements to Page Migration with Batch Offloading via DMA


Message-ID: <c024d035-dc94-4e89-a935-795ab2ce24e7@amd.com>
Date: Mon, 17 Jun 2024 17:10:18 +0530
From: "Garg, Shivank" <shivankg@....com>
To: Matthew Wilcox <willy@...radead.org>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, bharata@....com, raghavendra.kodsarathimmappa@....com,
 Michael.Day@....com, dmaengine@...r.kernel.org, vkoul@...nel.org
Subject: Re: [RFC PATCH 0/5] Enhancements to Page Migration with Batch
 Offloading via DMA

Hi Matthew,

On 6/15/2024 9:32 AM, Matthew Wilcox wrote:
> On Sat, Jun 15, 2024 at 03:45:20AM +0530, Shivank Garg wrote:

> 
> You haven't measured the important thing though -- what's the cost
> _to userspace_?  When the CPU does the copy, the data is now
> cache-hot in that CPU's cache.  When the DMA engine does the copy,
> it's not cache-hot in any CPU.
> 
> Now, this may not be a big problem.  I don't think we do anything to 
> ensure that the CPU that is going to access the folio in userspace
> is the one which does the copy.
> 
> But your methodology is wrong.

You're right about the importance of measuring the cost to userspace.
I initially focused on analyzing the folio_copy overheads within migrate_pages to identify potential optimization opportunities using DMA hardware accelerators.

To address this, I'm planning to extend my experiments to measure the cost to userspace, specifically the part related to cache-hotness. This will involve accessing the migrated pages after the migration process is complete and measuring the resulting read/write latency.
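
A minimal userspace sketch of such a measurement could look like the one below. It is only an illustration under stated assumptions, not the benchmark used for this series: the helper names (now_ns, time_first_touch_ns, migrate_buffer, measure_post_migration_ns) are hypothetical, the 64-byte cache line size is assumed, and migration is requested through the raw move_pages(2) syscall so that libnuma is not needed. The measured time also includes TLB misses, and on a single-node machine the pages will not actually move.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)	/* same value as in <linux/mempolicy.h> */
#endif

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Read one byte per (assumed 64-byte) cache line and return the elapsed time. */
static uint64_t time_first_touch_ns(const volatile unsigned char *buf, size_t len)
{
	unsigned long sum = 0;
	uint64_t start = now_ns();

	for (size_t off = 0; off < len; off += 64)
		sum += buf[off];	/* volatile read, cannot be optimized away */
	(void)sum;
	return now_ns() - start;
}

/*
 * Ask the kernel to migrate every page of buf to target_node.
 * Returns the raw move_pages(2) result: 0 if all pages were handled,
 * non-zero otherwise (for example -1 with errno set).
 */
static long migrate_buffer(void *buf, size_t len, int target_node)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t n = len / page;
	long rc = -1;

	assert(len % page == 0);

	void **pages = malloc(n * sizeof(*pages));
	int *nodes = malloc(n * sizeof(*nodes));
	int *status = malloc(n * sizeof(*status));

	if (pages && nodes && status) {
		for (size_t i = 0; i < n; i++) {
			pages[i] = (unsigned char *)buf + i * page;
			nodes[i] = target_node;
		}
		/* pid 0 means the calling process. */
		rc = syscall(SYS_move_pages, 0L, (unsigned long)n, pages,
			     nodes, status, (long)MPOL_MF_MOVE);
	}
	free(pages);
	free(nodes);
	free(status);
	return rc;
}

/*
 * Fault in a buffer, request migration, then time the first access.
 * Returns 0 if migration succeeded, 1 if it did not (the access is still
 * timed), and -1 if the buffer could not be allocated.
 */
static int measure_post_migration_ns(size_t len, int target_node, uint64_t *out_ns)
{
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xA5, len);		/* populate the pages before migrating */
	long rc = migrate_buffer(buf, len, target_node);
	*out_ns = time_first_touch_ns(buf, len);
	munmap(buf, len);
	return rc == 0 ? 0 : 1;
}
```

Calling measure_post_migration_ns() with a buffer much larger than the last-level cache and with one that fits in it, once with the CPU copy path and once with the DMA path, would give the two access-latency numbers to compare.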

DMA offloading could help in scenarios that involve bulk data copying where the workload size is much larger than the cache capacity (workload size >> cache capacity), or that incur a large shootdown overhead.

The userspace cost analysis will provide a more comprehensive picture of page migration using the CPU vs. DMA offloading.

I appreciate your feedback.

Shivank