Date:	Tue, 20 Mar 2007 10:48:04 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	David Howells <dhowells@...hat.com>
Cc:	Hugh Dickins <hugh@...itas.com>, bryan.wu@...log.com,
	Robin Holt <holt@....com>,
	"Kawai, Hidehiro" <hidehiro.kawai.ez@...achi.com>,
	Andrew Morton <akpm@...l.org>,
	kernel list <linux-kernel@...r.kernel.org>,
	Pavel Machek <pavel@....cz>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	sugita <yumiko.sugita.yf@...achi.com>,
	Satoshi OSHIMA <soshima@...hat.com>, haoki@...hat.com,
	Robin Getz <rgetz@...ckfin.uclinux.org>
Subject: Re: Move to unshared VMAs in NOMMU mode?

David Howells <dhowells@...hat.com> writes:

> Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
>> As I understand your description for non-shared mappings the VMAs are
>> per process.
>
> Are you talking about the current state of play?  If so, not precisely.  In
> the current scheme of things, *all* VMAs are kept in a global tree and are
> globally available; it's just that any VMA that's not shareable will not be
> shared, and so, in effect, is per-process.
>
> In my suggested revamp, VMAs revert to being per-process objects only, and
> sharing is effected by indirection.

Ok, and I do think per-process VMAs are the way to go if you are
going to stay anywhere close to the rest of Linux.
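
For illustration, a minimal sketch of that kind of indirection, with
made-up names (nothing below is taken from an existing tree, and it
assumes the usual kernel headers): each process keeps its own small
VMA, and anything shareable is factored out into a separately
refcounted object that several VMAs can point at.

	/*
	 * Hypothetical sketch only: per-process VMAs sharing their
	 * backing store through a refcounted region object.  All
	 * names are illustrative.
	 */
	struct shared_region {
		atomic_t	refcount;	/* VMAs pointing at this region */
		unsigned long	start;		/* base of the backing storage */
		unsigned long	size;
		struct file	*file;		/* backing file, if any */
	};

	struct process_vma {
		struct mm_struct	*mm;		/* owning process */
		unsigned long		vm_start;	/* per-process view */
		unsigned long		vm_end;
		struct shared_region	*region;	/* shared backing */
	};

	/* A second process attaching bumps the region, not a global VMA. */
	static void attach_region(struct process_vma *vma,
				  struct shared_region *region)
	{
		atomic_inc(&region->refcount);
		vma->region = region;
	}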

>> For shared mappings you share in some sense the page cache.
>
> Currently, no - not unless the driver does something clever as ramfs does.
> Sharing through the page cache is a nice idea, but it has some limitations,
> mainly that non-sharing then operates differently.

I don't quite see what your limitations are, but it is a fundamental
assumption that the sharing happens in the page cache.  I.e. the pages
that are in the page cache are the only ones that can be effectively
shared.

As a corollary, if a page is not in the page cache it should not be
shared.  I guess this limits sharing to just those things in ramfs?

You obviously don't have the hardware to enforce this but at least
you can define the sharing of anything else as undefined.

Now I don't know what it takes from your data structures to achieve
sharing of page cache pages, or what your underlying complications
are.  Not having the page cache as the fundamental underlying sharing
pool is strongly non-intuitive from the perspective of the rest
of Linux.
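
To make that concrete, here is a very rough sketch of what a
page-cache-backed get_unmapped_area could look like on NOMMU, in the
spirit of what a ramfs-like driver might do.  It is illustrative only:
real code must also check that the pages are physically contiguous,
handle locking, and hold references for the lifetime of the mapping.

	/*
	 * Illustrative sketch: on NOMMU, "sharing" a file mapping
	 * means every mapper is handed the address of the same
	 * page-cache pages.  Simplified; see the caveats above.
	 */
	static unsigned long sketch_get_unmapped_area(struct file *file,
						      unsigned long addr,
						      unsigned long len,
						      unsigned long pgoff,
						      unsigned long flags)
	{
		struct address_space *mapping = file->f_mapping;
		unsigned long nr = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
		struct page *first, *page;
		unsigned long ret;
		unsigned long i;

		/* Every page of the range must already be in the page cache. */
		first = find_get_page(mapping, pgoff);
		if (!first)
			return -ENOMEM;

		for (i = 1; i < nr; i++) {
			page = find_get_page(mapping, pgoff + i);
			if (!page) {
				page_cache_release(first);
				return -ENOMEM;
			}
			page_cache_release(page);
		}

		/* All mappers of this range see the very same pages. */
		ret = (unsigned long) page_address(first);
		page_cache_release(first);
		return ret;
	}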

>> My gut feel says just keep a vma per process of the regions the
>> process has and do the appropriate book keeping and all will be fine.
>
> I'm sure it will be, but at the cost of consuming extra memory.  I'm not sure
> that the amount of extra memory is, however, all that significant.  Now that I
> think about it, I don't imagine that a lot of processes are going to be
> running at once on a NOMMU system, and so the scope for sharing isn't all that
> wide.

Sure.  But this is sharing with the kernel as well, and you always
have at least one kernel and one application.  So if they can share
the file buffers, that is a win, and it is what mmap of any file-backed
pages is expected to provide.

>> For shm_nattach it looks like you simply are not calling the
>> open/close methods on fork (because you have a shared pool of vmas).
>
> There is no fork.

Scratch the fork part.  You still aren't calling the open/close
methods when they would normally be called, and that is where your
problem lies.
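
For reference, the pattern being relied on looks roughly like this:
the SysV shm code keeps its attach count in the VMA open/close
methods, so any path that creates or tears down a mapping without
calling them leaves shm_nattch wrong.  This is a simplified sketch,
not a copy of ipc/shm.c; shm_lookup_segment(),
shm_marked_for_destruction() and shm_destroy_segment() are
hypothetical stand-ins for the real helpers.

	static void sketch_shm_open(struct vm_area_struct *vma)
	{
		struct shmid_kernel *shp = shm_lookup_segment(vma->vm_file);

		shp->shm_nattch++;	/* one more mapping of this segment */
	}

	static void sketch_shm_close(struct vm_area_struct *vma)
	{
		struct shmid_kernel *shp = shm_lookup_segment(vma->vm_file);

		shp->shm_nattch--;	/* this mapping is going away */
		if (shp->shm_nattch == 0 && shm_marked_for_destruction(shp))
			shm_destroy_segment(shp);
	}

	static struct vm_operations_struct sketch_shm_vm_ops = {
		.open	= sketch_shm_open,	/* new VMA for the segment */
		.close	= sketch_shm_close,	/* VMA torn down */
	};

If shmat() ends up reusing an existing shared VMA instead of creating
a new per-process one, .open never runs and the attach count never
moves.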

> No, the problem is that sys_shmat() relies on do_mmap_pgoff() to call the VMA
> open() method.  However, this assumes that a new VMA will be made
> per process.

Exactly.

Eric

