Message-ID: <CA+55aFzwRoM4w8mGqSeeVuDGhQgnnomu=vxoWC6dbHD9w-9A+Q@mail.gmail.com>
Date:	Tue, 22 Oct 2013 17:20:40 +0100
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Michel Lespinasse <walken@...gle.com>
Cc:	Davidlohr Bueso <davidlohr@...com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Rik van Riel <riel@...hat.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	linux-mm <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/3] mm,vdso: preallocate new vmas

On Tue, Oct 22, 2013 at 4:48 PM,  <walken@...gle.com> wrote:

>
> Generally the problems I see with mmap_sem are related to long latency
> operations. Specifically, the mmap_sem write side is currently held
> during the entire munmap operation, which iterates over user pages to
> free them, and can take hundreds of milliseconds for large VMAs.

So this would be the *perfect* place to just downgrade the semaphore
from a write to a read.

Do the vma ops under the write semaphore, then downgrade it to a
read-sem, and do the page teardown with just mmap_sem held for
reading..

Comments? Anybody want to try that? It should be fairly
straightforward, and we had a somewhat similar issue when it came to
mmap() having to populate the mapping for mlock. For that case, it was
sufficient to just move the "populate" phase outside the lock entirely
(for that case, we actually drop the write lock and then take the
read-lock and re-lookup the vma; for unmap we'd have to do a proper
downgrade so that there is no window where the virtual address area
could be re-allocated).
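
For illustration, that existing pattern looks roughly like this (a
sketch only; create_mapping() and populate_range() are placeholder
names, not the real kernel helpers):

    down_write(&mm->mmap_sem);
    addr = create_mapping(mm, addr, len, prot, flags); /* set up the vma */
    up_write(&mm->mmap_sem);            /* no lock held across the gap */

    down_read(&mm->mmap_sem);
    vma = find_vma(mm, addr);           /* must re-lookup: the tree may have
                                           changed while the lock was dropped */
    if (vma && vma->vm_start <= addr)
        populate_range(vma, addr, len); /* fault the pages in */
    up_read(&mm->mmap_sem);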

The big issue is that we'd have to split up do_munmap() into those two
phases, since right now callers take the write semaphore before
calling it, and drop it afterwards. And some callers do it in a loop.
But we should fairly easily be able to make the *common* case (ie
normal "munmap()") do something like

    down_write(&mm->mmap_sem);
    phase1_munmap(..);
    downgrade_write(&mm->mmap_sem);
    phase2_munmap(..);
    up_read(&mm->mmap_sem);

instead of what it does now (which is to just do
down_write()/up_write() around do_munmap()).
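
Annotated a bit more (still just a sketch; phase1_munmap() and
phase2_munmap() don't exist, and the exact split between them is my
guess at what each phase would cover):

    down_write(&mm->mmap_sem);

    /*
     * Phase 1, write lock held: find the vmas covering the range, split
     * any partially covered ones, and unlink them from the mm's vma list
     * and rbtree.  Cheap, and afterwards no other thread can look up or
     * re-map this address range.
     */
    ret = phase1_munmap(mm, start, len);
    if (ret) {
        up_write(&mm->mmap_sem);
        return ret;
    }

    /*
     * Atomic downgrade: readers (e.g. page faults on unrelated vmas) can
     * now get the lock, but nobody can take it for writing and hand out
     * this address range again until we drop the read lock.
     */
    downgrade_write(&mm->mmap_sem);

    /*
     * Phase 2, read lock held: the expensive part - tear down the page
     * tables and free the pages, then remove_vma_list() -> remove_vma()
     * to call ->close() and free the vma structures.
     */
    phase2_munmap(mm, start, len);

    up_read(&mm->mmap_sem);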

I don't see any fundamental problems, but maybe there's some really
annoying detail that makes this nasty (right now we do
"remove_vma_list() -> remove_vma()" *after* tearing down the page
tables, and since that calls the ->close function, I think it has to
be done that way). I'm wondering if any of that code relies on
mmap_sem being held exclusively for writing. I don't see why it
possibly could, but..
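
For reference, the tail of the current do_munmap() does roughly the
following (paraphrased from mm/mmap.c, not quoted verbatim); with the
split above, the last two calls would end up in phase 2 with only the
read lock held:

    detach_vmas_to_be_unmapped(mm, vma, prev, end); /* unlink from the mm  */
    unmap_region(mm, vma, prev, start, end);        /* page table teardown */
    remove_vma_list(mm, vma);                       /* ->close(), free vmas */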

So maybe I'm being overly optimistic and it's not as easy as just
splitting do_munmap() into two phases, but it really *looks* like it
might be just a ten-liner or so.. And if a real munmap() is the common
case (as opposed to a do_munmap() that gets triggered by somebody
doing a "mmap()" on top of an old mapping), then we'd at least allow
page faults from other threads to be done concurrently with tearing
down the page tables for the unmapped vma..

             Linus
