Date:	Sat, 25 May 2013 21:11:21 -0400
From:	KOSAKI Motohiro <kosaki.motohiro@...il.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Vince Weaver <vincent.weaver@...ne.edu>,
	LKML <linux-kernel@...r.kernel.org>,
	Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	trinity@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Roland Dreier <roland@...nel.org>, infinipath@...gic.com,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	linux-rdma@...r.kernel.org, Or Gerlitz <or.gerlitz@...il.com>
Subject: Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK

On Fri, May 24, 2013 at 11:40 AM, Christoph Lameter <cl@...ux.com> wrote:
> On Fri, 24 May 2013, Peter Zijlstra wrote:
>
>> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
>> broke RLIMIT_MEMLOCK.
>
> Nope the patch fixed a problem with double accounting.
>
> The problem that we seem to have is to define what mlocked and pinned mean
> and how this relates to RLIMIT_MEMLOCK.
>
> mlocked pages are pages that are movable (not pinned!!!) and that are
> marked in some way by user space actions as mlocked (POSIX semantics).
> They are marked with a special page flag (PG_mlocked).
>
> Pinned pages are pages that have an elevated refcount because the hardware
> needs to use these pages for I/O. The elevated refcount may be temporary
> (then we dont care about this) or for a longer time (such as the memory
> registration of the IB subsystem). That is when we account the memory as
> pinned. The elevated refcount stops page migration and other things from
> trying to move that memory.
>
> Pages can be both pinned and mlocked. Before my patch some pages those two
> issues were conflated since the same counter was used and therefore these
> pages were counted twice. If an RDMA application was running using
> mlockall() and was performing large scale I/O then the counters could show
> extraordinary large numbers and the VM would start to behave erratically.
>
> It is important for the VM to know which pages cannot be evicted but that
> involves many more pages due to dirty pages etc etc.
>
> So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
> that userspace has mlocked.
>
> You want the counter to mean something different it seems. What is it?
>
> I think we need to be first clear on what we want to accomplish and what
> these counters actually should count before changing things.

Hm.
If pinned and mlocked are intentionally totally different, why does IB use
RLIMIT_MEMLOCK? Why doesn't IB use an IB-specific limit, and why does only IB
raise the number of pinned pages while other gup users don't?
I can't guess the IB folks' intent.

And now every piece of IB code has a duplicated RLIMIT_MEMLOCK
check, and at least __ipath_get_user_pages() forgets to check
capable(CAP_IPC_LOCK).
That's bad.
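
To make the point concrete, the check that each caller repeats has roughly
the shape below. This is only a minimal userspace sketch, not the kernel
code: memlock_charge_allowed() and all of its parameters are hypothetical
names, and has_cap_ipc_lock merely stands in for the result of
capable(CAP_IPC_LOCK).

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of an RLIMIT_MEMLOCK-style check.
 * All names are illustrative; none are taken from the kernel source.
 *
 * locked_pages:     pages already charged against the limit
 * new_pages:        pages the caller wants to pin or lock now
 * limit_pages:      RLIMIT_MEMLOCK expressed in pages
 * has_cap_ipc_lock: stand-in for capable(CAP_IPC_LOCK)
 */
static bool memlock_charge_allowed(unsigned long locked_pages,
                                   unsigned long new_pages,
                                   unsigned long limit_pages,
                                   bool has_cap_ipc_lock)
{
    /* A privileged task is exempt from the limit. */
    if (has_cap_ipc_lock)
        return true;

    /* The request alone exceeds the limit (also avoids underflow below). */
    if (new_pages > limit_pages)
        return false;

    /* Same as locked_pages + new_pages <= limit_pages, without overflow. */
    return locked_pages <= limit_pages - new_pages;
}
```

A caller that drops the has_cap_ipc_lock branch would refuse a privileged
task that the other callers accept, which is the kind of divergence that
duplicated checks invite.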


> Certainly would appreciate improvements in this area but resurrecting the
> conflation between mlocked and pinned pages is not the way to go.
>
>> This patch proposes to properly fix the problem by introducing
>> VM_PINNED. This also provides the groundwork for a possible mpin()
>> syscall or MADV_PIN -- although these are not included.
>
> Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
> seems to assume.

Generically, you are right. But if VM_PINNED is really only for IB,
this is an acceptable limitation. They can split the vma for their
own purposes.

Anyway, I agree we should clearly understand the semantics of IB pinning and
the userland usage and assumptions.