[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHGf_=r4sqQKELPh48z=KPyuyAM3uz5Az9RpssUwnK4QRoamHQ@mail.gmail.com>
Date: Sat, 25 May 2013 21:11:21 -0400
From: KOSAKI Motohiro <kosaki.motohiro@...il.com>
To: Christoph Lameter <cl@...ux.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Al Viro <viro@...iv.linux.org.uk>,
Vince Weaver <vincent.weaver@...ne.edu>,
LKML <linux-kernel@...r.kernel.org>,
Paul Mackerras <paulus@...ba.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
trinity@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Roland Dreier <roland@...nel.org>, infinipath@...gic.com,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
linux-rdma@...r.kernel.org, Or Gerlitz <or.gerlitz@...il.com>
Subject: Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
On Fri, May 24, 2013 at 11:40 AM, Christoph Lameter <cl@...ux.com> wrote:
> On Fri, 24 May 2013, Peter Zijlstra wrote:
>
>> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
>> broke RLIMIT_MEMLOCK.
>
> Nope the patch fixed a problem with double accounting.
>
> The problem that we seem to have is to define what mlocked and pinned mean
> and how this relates to RLIMIT_MEMLOCK.
>
> mlocked pages are pages that are movable (not pinned!!!) and that are
> marked in some way by user space actions as mlocked (POSIX semantics).
> They are marked with a special page flag (PG_mlocked).
>
> Pinned pages are pages that have an elevated refcount because the hardware
> needs to use these pages for I/O. The elevated refcount may be temporary
> (then we dont care about this) or for a longer time (such as the memory
> registration of the IB subsystem). That is when we account the memory as
> pinned. The elevated refcount stops page migration and other things from
> trying to move that memory.
>
> Pages can be both pinned and mlocked. Before my patch some pages those two
> issues were conflated since the same counter was used and therefore these
>
pages were counted twice. If an RDMA application was running using
> mlockall() and was performing large scale I/O then the counters could show
> extraordinary large numbers and the VM would start to behave erratically.
>
> It is important for the VM to know which pages cannot be evicted but that
> involves many more pages due to dirty pages etc etc.
>
> So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
> that userspace has mlocked.
>
> You want the counter to mean something different it seems. What is it?
>
> I think we need to be first clear on what we want to accomplish and what
> these counters actually should count before changing things.
Hm.
If pinned and mlocked are totally difference intentionally, why IB uses
RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up
number of pinned pages and other gup users don't.
I can't guess IB folk's intent.
And now ever IB code has duplicated RLIMIT_MEMLOCK
check and at least __ipath_get_user_pages() forget to check
capable(CAP_IPC_LOCK).
That's bad.
> Certainly would appreciate improvements in this area but resurrecting the
> conflation between mlocked and pinned pages is not the way to go.
>
>> This patch proposes to properly fix the problem by introducing
>> VM_PINNED. This also provides the groundwork for a possible mpin()
>> syscall or MADV_PIN -- although these are not included.
>
> Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
> seems to assume.
Generically, you are right. But if VM_PINNED is really only for IB,
this is acceptable
limitation. They can split vma for their own purpose.
Anyway, I agree we should clearly understand the semantics of IB pinning and
the userland usage and assumption.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists