Message-Id: <0FC3F99A-9F77-484A-899B-EDCBEFBFAC5D@gmail.com>
Date: Mon, 27 Sep 2021 12:12:46 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Michal Hocko <mhocko@...e.com>
Cc: David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Peter Xu <peterx@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Minchan Kim <minchan@...nel.org>,
Colin Cross <ccross@...gle.com>,
Suren Baghdasarya <surenb@...gle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 0/8] mm/madvise: support process_madvise(MADV_DONTNEED)

> On Sep 27, 2021, at 5:16 AM, Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 27-09-21 05:00:11, Nadav Amit wrote:
> [...]
>> The manager is notified of memory regions that it should monitor
>> (through PTRACE/LD_PRELOAD/explicit-API). It then monitors these regions
>> using the remote-userfaultfd that you saw on the second thread. When it wants
>> to reclaim (anonymous) memory, it:
>>
>> 1. Uses UFFD-WP to protect that memory (for this purpose I have a vectored
>> UFFD-WP to do so efficiently, a patch which I have not sent yet).
>> 2. Calls process_vm_readv() to read that memory from the process.
>> 3. Writes it back to “swap”.
>> 4. Calls process_madvise(MADV_DONTNEED) to zap it.
>
> Why cannot you use MADV_PAGEOUT/MADV_COLD for this usecase?

Providing hints to the kernel only takes you so far. The kernel does not
want (for good reason) to be completely configurable when it comes to
reclaim and prefetch policies. Implementing reclaim and prefetch in
userspace lets you make these policies fully configurable.

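Steps 2 to 4 of the reclaim flow quoted above can be sketched, from the manager's side, roughly as follows. This is a minimal illustration under several assumptions, not the actual manager code: the helper names remote_read(), remote_zap() and reclaim_range() are hypothetical, the “swap” backend is simply a file descriptor written with pwrite(), step 1 (UFFD-WP protection through the remote userfaultfd) is omitted, and raw syscall() is used because older C libraries have no wrappers for pidfd_open() and process_madvise(). On kernels without this RFC series, process_madvise() does not accept MADV_DONTNEED, so remote_zap() is expected to fail with EINVAL there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Common syscall numbers used by most architectures; only needed if the
 * system headers predate these syscalls. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Step 2 (hypothetical helper): copy len bytes at remote_addr in process
 * pid into local_buf. Returns the number of bytes read, or -1 on error. */
ssize_t remote_read(pid_t pid, void *remote_addr, void *local_buf, size_t len)
{
    struct iovec local = { .iov_base = local_buf, .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

/* Step 4 (hypothetical helper): zap the remote range. This only succeeds on
 * a kernel that accepts MADV_DONTNEED in process_madvise(), as this RFC
 * proposes. Returns 0 on success, -1 with errno set on failure. */
int remote_zap(int pidfd, void *remote_addr, size_t len)
{
    struct iovec range = { .iov_base = remote_addr, .iov_len = len };
    ssize_t ret;

    /* madvise-style calls require a page-aligned start address. */
    assert(((uintptr_t)remote_addr % (uintptr_t)sysconf(_SC_PAGESIZE)) == 0);

    ret = syscall(SYS_process_madvise, pidfd, &range, 1UL, MADV_DONTNEED, 0U);
    return ret < 0 ? -1 : 0;
}

/* Steps 2-4 for one range. bounce is a local buffer of at least len bytes;
 * swap_fd/swap_off stand in for wherever the manager keeps "swap". */
int reclaim_range(pid_t pid, void *remote_addr, size_t len,
                  void *bounce, int swap_fd, off_t swap_off)
{
    int pidfd, ret;

    if (remote_read(pid, remote_addr, bounce, len) != (ssize_t)len)
        return -1;                                  /* step 2 */
    if (pwrite(swap_fd, bounce, len, swap_off) != (ssize_t)len)
        return -1;                                  /* step 3 */

    pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
    if (pidfd < 0)
        return -1;
    ret = remote_zap(pidfd, remote_addr, len);      /* step 4 */
    close(pidfd);
    return ret;
}
```

In the real flow, step 1 has to happen before step 2: without write protection, a write by the target between the copy and the zap would be silently discarded.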
> MADV_DONTNEED on a remote process has been proposed in the past several
> times and it has always been rejected because it is a free ticket to all
> sorts of hard to debug problems as it is just a free ticket for a remote
> memory corruption. An additional capability requirement might reduce the
> risk to some degree but I still do not think this is a good idea.

I would argue that there is nothing harmful a remote MADV_DONTNEED can do
that process_vm_writev() cannot already do (putting ptrace aside).
process_vm_writev() checks:

	mm = mm_access(task, PTRACE_MODE_ATTACH_REALCREDS);

Wouldn't requiring the same check for process_madvise(MADV_DONTNEED)
suffice?
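To make the comparison concrete, a process that passes the process_vm_writev() check can already overwrite a target's memory with zeros. The following is a minimal sketch, assuming Linux and a hypothetical helper name remote_zero(). For private anonymous mappings, the target then reads the same zero-filled data it would see after MADV_DONTNEED, although, unlike MADV_DONTNEED, the pages stay allocated rather than being freed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: overwrite len bytes at remote_addr in process pid
 * with zeros using process_vm_writev(), which performs the
 * PTRACE_MODE_ATTACH_REALCREDS access check quoted above.
 * Returns 0 on success, -1 on error or if no progress is made. */
int remote_zero(pid_t pid, void *remote_addr, size_t len)
{
    static const char zeros[4096];      /* static storage: all zero bytes */
    char *addr = remote_addr;

    while (len > 0) {
        size_t chunk = len < sizeof zeros ? len : sizeof zeros;
        /* process_vm_writev() only reads from the local buffer. */
        struct iovec local = { .iov_base = (void *)zeros, .iov_len = chunk };
        struct iovec remote = { .iov_base = addr, .iov_len = chunk };
        ssize_t n = process_vm_writev(pid, &local, 1, &remote, 1, 0);

        if (n <= 0)
            return -1;
        /* Handle a short write defensively by continuing where it stopped. */
        addr += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Writing in fixed 4 KiB chunks keeps the zero source buffer small regardless of the size of the target range.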