Message-ID: <ZQnmVI0Q/Al5UKgQ@memverge.com>
Date: Tue, 19 Sep 2023 14:20:04 -0400
From: Gregory Price <gregory.price@...verge.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: Jonathan Corbet <corbet@....net>,
Gregory Price <gourry.memverge@...il.com>,
linux-mm@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-arch@...r.kernel.org, Linux API <linux-api@...r.kernel.org>,
linux-cxl@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Arnd Bergmann <arnd@...db.de>,
Andrew Morton <akpm@...ux-foundation.org>,
the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall
On Tue, Sep 19, 2023 at 10:59:33AM -0700, Andy Lutomirski wrote:
>
> I'm not complaining about the name. I'm objecting about the semantics.
>
> Apparently you have a system to collect usage statistics of physical
> addresses, but you have no idea what those pages map to (without
> crawling /proc or /sys, anyway). But that means you have no idea when
> the logical contents of those pages *change*. So you fundamentally
> have a nasty race: anything else that swaps or migrates those pages
> will mess up your statistics, and you'll start trying to migrate the
> wrong thing.
How does this change if I use virtual-address-based migration?

I could do sampling based on virtual addresses (page faults, IBS/PEBS,
whatever), and by the time I make a decision, the kernel could have
migrated the data or even my task from Node A to Node B. The sample I
took is now stale, and I could make a poor migration decision.
If I do move_pages(pid, some_virt_addr, some_node) and it migrates the
page from Node A to Node B, then the device-side collection is likewise
no longer valid. The problem is the same whether I use virtual or
physical addresses.
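
To make the staleness argument concrete, here is a toy model (not kernel
code). Everything in it is hypothetical and exists only for illustration:
the `ToyTask` class, the node names, and the frame numbers. It records one
sample, lets "the kernel" migrate the page, and then checks whether the
virtual view and the physical view of that sample still describe reality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    vaddr: int  # virtual address observed as hot
    node: str   # node backing it when the sample was taken
    pfn: int    # physical frame backing it when the sample was taken


class ToyTask:
    """Hypothetical stand-in for one task's page table."""

    def __init__(self):
        self.page_table = {}  # vaddr -> (node, pfn)

    def map(self, vaddr, node, pfn):
        # Models both the initial mapping and a later migration.
        self.page_table[vaddr] = (node, pfn)

    def take_sample(self, vaddr):
        node, pfn = self.page_table[vaddr]
        return Sample(vaddr, node, pfn)

    def virtual_view_stale(self, s):
        # The sample said "vaddr is hot on s.node"; is it still there?
        return self.page_table[s.vaddr][0] != s.node

    def physical_view_stale(self, s):
        # The sample said "s.pfn holds the hot data"; does it still?
        return self.page_table[s.vaddr][1] != s.pfn


task = ToyTask()
task.map(0x7F00, "B", 0x200)       # hot page starts on the far node
s = task.take_sample(0x7F00)       # sampling happens here
task.map(0x7F00, "A", 0x80)        # the page is migrated before we act
print(task.virtual_view_stale(s), task.physical_view_stale(s))
```

In this model both views go stale after the migration, which is the point
being argued: the race exists for either addressing scheme.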
But if I have a 512GB memory device, and I can see a wide swath of that
512GB is hot, while a good chunk of my local DRAM is not - then I
probably don't care *what* gets migrated up to DRAM, I just care that a
vast majority of that hot data does.
The goal here isn't 100% precision; you will never get there. The goal
here is broad-scope performance enhancement of the overall system
while minimizing the cost to compute the migration actions to be taken.
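
As a rough sketch of what "cheap, imprecise, content-agnostic" selection
could look like: the function below pairs the hottest device-memory frames
with the coldest DRAM frames using nothing but access counts. The function
name, the dictionaries, and the `budget` parameter are assumptions made up
for this illustration; this is not the interface of the proposed syscall,
and memcg and zoning checks are assumed to be enforced elsewhere.

```python
def plan_promotions(device_heat, dram_heat, budget):
    """Pair hot device-memory frames with cold DRAM frames.

    device_heat, dram_heat: dict mapping a page frame number to an
        access count gathered by some sampling mechanism (hypothetical).
    budget: maximum number of swaps to plan in one pass.
    Returns a list of (device_pfn, dram_pfn) pairs.
    """
    hottest = sorted(device_heat, key=device_heat.get, reverse=True)
    coldest = sorted(dram_heat, key=dram_heat.get)

    plan = []
    for hot_pfn, cold_pfn in zip(hottest[:budget], coldest[:budget]):
        # Stop once a swap would no longer put hotter data in DRAM.
        if device_heat[hot_pfn] <= dram_heat[cold_pfn]:
            break
        plan.append((hot_pfn, cold_pfn))
    return plan


device = {0x1000: 90, 0x1001: 5, 0x1002: 70}   # made-up counts
dram = {0x10: 80, 0x11: 2, 0x12: 60}           # made-up counts
print(plan_promotions(device, dram, budget=3))
```

Nothing here looks at what a frame contains or who owns it; the plan is
driven purely by relative hotness, and a stale count only costs one
suboptimal swap.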
I don't think the contents of the page are always relevant. The entire
concept here is to enable migration without caring about what programs
are using the memory for - just so long as memcgs and zoning are
respected.
~Gregory