Message-ID: <Zfpohg3EGxxOEcWg@memverge.com>
Date: Wed, 20 Mar 2024 00:39:34 -0400
From: Gregory Price <gregory.price@...verge.com>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: Gregory Price <gourry.memverge@...il.com>, linux-mm@...ck.org,
	linux-api@...r.kernel.org, linux-arch@...r.kernel.org,
	linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
	dan.j.williams@...el.com, honggyu.kim@...com, corbet@....net,
	arnd@...db.de, luto@...nel.org, akpm@...ux-foundation.org,
	shuah@...nel.org
Subject: Re: [RFC v3 0/3] move_phys_pages syscall - migrate page contents
 given

On Wed, Mar 20, 2024 at 10:48:44AM +0800, Huang, Ying wrote:
> Gregory Price <gourry.memverge@...il.com> writes:
> 
> > Doing this reverse-translation outside of the kernel requires considerable
> > space and compute, and it will have to be performed again by the existing
> > system calls.  Much of this work can be avoided if the pages can be
> > migrated directly with physical memory addressing.
> 
> One difficulty of the idea of the physical address is that we lack some
> user space specified policy information to make decisions.  For example,
> users may want to pin some pages in DRAM to improve latency, or pin some
> pages in CXL memory to do some best effort work.  To make the correct
> decision, we need PID and virtual address.
> 

I think of this as a second or third order problem.  The core problem
right now isn't the practicality of how userland would actually use this
interface - the core problem is whether the data generated by offloaded
monitoring is even worth collecting and operating on in the first place.  

So this is a quick hack to do some research about whether it's even
worth developing the whole abstraction described by Willy.

This is why it's labeled RFC.  I upped a v3 because I know of two groups
actively looking at using it for research, and because the folio updates
broke the old version.  It's also easier for me to engage through the
list than via private channels for this particular work.


Do I suggest we merge this interface as-is? No, too many concerns about
side channels.  However, it's a clean reuse of move_pages code to
bootstrap the investigation, and it at least gets the gears turning.

Example notes from a sidebar earlier today:

* An interesting proposal from Dan Williams would be to provide some
  sort of `/sys/.../memory_tiering/tierN/promote_hot` interface, with
  a callback mechanism into the relevant hardware drivers that allows
  for this to be abstracted.  This could be done on some interval and
  some threshold (# pages, hotness threshold, etc).


The code to execute promotions ends up looking like what I have now:

1) Validate that the page is eligible to be promoted by walking the vmas
2) Invoke the existing move_pages code

The above idea can be implemented trivially in userland without
having to plumb through a whole brand new callback system.
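
As a rough illustration of the userland side, here is a minimal sketch in
Python.  It only covers the policy part (interval-driven selection by a
hotness threshold and a page-count cap).  Everything named here is
hypothetical: `HotSample`, `select_for_promotion`, `promote_once`, the
4 KiB page size, and the `migrate` callable, which stands in for whatever
eventually performs the migration (for example a wrapper around the
proposed move_phys_pages call, whose calling convention is not assumed
here).  The vma walk and eligibility check stay in the kernel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


@dataclass(frozen=True)
class HotSample:
    """One record from a hypothetical offloaded hotness monitor."""
    phys_addr: int  # physical address observed by the monitor
    hotness: int    # access count over the sampling interval


def select_for_promotion(samples: Iterable[HotSample],
                         hotness_threshold: int,
                         max_pages: int) -> List[int]:
    """Return up to max_pages page-aligned addresses, hottest first."""
    hottest: Dict[int, int] = {}
    for s in samples:
        page = s.phys_addr & ~(PAGE_SIZE - 1)  # align down to page start
        hottest[page] = max(hottest.get(page, 0), s.hotness)
    hot = [(h, p) for p, h in hottest.items() if h >= hotness_threshold]
    # Sort by descending hotness, then ascending address for stable output.
    hot.sort(key=lambda hp: (-hp[0], hp[1]))
    return [p for _, p in hot[:max_pages]]


def promote_once(samples: Iterable[HotSample],
                 hotness_threshold: int,
                 max_pages: int,
                 migrate: Callable[[List[int]], List[int]]) -> List[int]:
    """Run one promotion pass; migrate() returns a per-page status list."""
    pages = select_for_promotion(samples, hotness_threshold, max_pages)
    if not pages:
        return []  # nothing hot enough, skip the migration call entirely
    return migrate(pages)
```

A daemon would call `promote_once` on some interval with fresh samples;
the threshold and page cap correspond to the knobs mentioned in the
sidebar notes above.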


Sometimes you have to post stupid ideas to get to the good ones :]

~Gregory
