[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190521103223.GD219653@google.com>
Date: Tue, 21 May 2019 19:32:23 +0900
From: Minchan Kim <minchan@...nel.org>
To: Michal Hocko <mhocko@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Johannes Weiner <hannes@...xchg.org>,
Tim Murray <timmurray@...gle.com>,
Joel Fernandes <joel@...lfernandes.org>,
Suren Baghdasaryan <surenb@...gle.com>,
Daniel Colascione <dancol@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Sonny Rao <sonnyrao@...gle.com>,
Brian Geffon <bgeffon@...gle.com>, linux-api@...r.kernel.org
Subject: Re: [RFC 5/7] mm: introduce external memory hinting API
On Tue, May 21, 2019 at 08:17:43AM +0200, Michal Hocko wrote:
> On Tue 21-05-19 11:41:07, Minchan Kim wrote:
> > On Mon, May 20, 2019 at 11:18:29AM +0200, Michal Hocko wrote:
> > > [Cc linux-api]
> > >
> > > On Mon 20-05-19 12:52:52, Minchan Kim wrote:
> > > > There is some usecase that centralized userspace daemon want to give
> > > > a memory hint like MADV_[COOL|COLD] to other process. Android's
> > > > ActivityManagerService is one of them.
> > > >
> > > > It's similar in spirit to madvise(MADV_WONTNEED), but the information
> > > > required to make the reclaim decision is not known to the app. Instead,
> > > > it is known to the centralized userspace daemon(ActivityManagerService),
> > > > and that daemon must be able to initiate reclaim on its own without
> > > > any app involvement.
> > >
> > > Could you expand some more about how this all works? How does the
> > > centralized daemon track respective ranges? How does it synchronize
> > > against parallel modification of the address space etc.
> >
> > Currently, we don't track each address ranges because we have two
> > policies at this moment:
> >
> > deactive file pages and reclaim anonymous pages of the app.
> >
> > Since the daemon has a ability to let background apps resume(IOW, process
> > will be run by the daemon) and both hints are non-disruptive stabilty point
> > of view, we are okay for the race.
>
> Fair enough but the API should consider future usecases where this might
> be a problem. So we should really think about those potential scenarios
> now. If we are ok with that, fine, but then we should be explicit and
> document it that way. Essentially say that any sort of synchronization
> is supposed to be done by monitor. This will make the API less usable
> but maybe that is enough.
Okay, I will add more about that in the description.
>
> > > > To solve the issue, this patch introduces new syscall process_madvise(2)
> > > > which works based on pidfd so it could give a hint to the exeternal
> > > > process.
> > > >
> > > > int process_madvise(int pidfd, void *addr, size_t length, int advise);
> > >
> > > OK, this makes some sense from the API point of view. When we have
> > > discussed that at LSFMM I was contemplating about something like that
> > > except the fd would be a VMA fd rather than the process. We could extend
> > > and reuse /proc/<pid>/map_files interface which doesn't support the
> > > anonymous memory right now.
> > >
> > > I am not saying this would be a better interface but I wanted to mention
> > > it here for a further discussion. One slight advantage would be that
> > > you know the exact object that you are operating on because you have a
> > > fd for the VMA and we would have a more straightforward way to reject
> > > operation if the underlying object has changed (e.g. unmapped and reused
> > > for a different mapping).
> >
> > I agree your point. If I didn't miss something, such kinds of vma level
> > modify notification doesn't work even file mapped vma at this moment.
> > For anonymous vma, I think we could use userfaultfd, pontentially.
> > It would be great if someone want to do with disruptive hints like
> > MADV_DONTNEED.
> >
> > I'd like to see it further enhancement after landing address range based
> > operation via limiting hints process_madvise supports to non-disruptive
> > only(e.g., MADV_[COOL|COLD]) so that we could catch up the usercase/workload
> > when someone want to extend the API.
>
> So do you think we want both interfaces (process_madvise and madvisefd)?
What I have in mind is to extend process_madvise later like this
struct pr_madvise_param {
int size; /* the size of this structure */
union {
const struct iovec __user *vec; /* address range array */
int fd; /* supported from 6.0 */
}
}
with introducing new hint Or-able PR_MADV_RANGE_FD, so that process_madvise
can go with fd instead of address range.
Powered by blists - more mailing lists