lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 28 Dec 2009 11:40:41 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"minchan.kim@...il.com" <minchan.kim@...il.com>,
	cl@...ux-foundation.org
Subject: Re: [RFC PATCH] asynchronous page fault.

On Mon, 2009-12-28 at 11:30 +0100, Peter Zijlstra wrote:
> On Mon, 2009-12-28 at 18:58 +0900, KAMEZAWA Hiroyuki wrote:
> > Peter Zijlstra さんは書きました:
> > > On Mon, 2009-12-28 at 09:36 +0900, KAMEZAWA Hiroyuki wrote:
> > >>
> > >> > The idea is to let the RCU lock span whatever length you need the vma
> > >> > for, the easy way is to simply use PREEMPT_RCU=y for now,
> > >>
> > >> I tried to remove his kind of reference count trick but I can't do that
> > >> without synchronize_rcu() somewhere in unmap code. I don't like that and
> > >> use this refcnt.
> > >
> > > Why, because otherwise we can access page tables for an already unmapped
> > > vma? Yeah that is the interesting bit ;-)
> > >
> > Without that
> >   vma->a_ops->fault()
> > and
> >   vma->a_ops->unmap()
> > can be called at the same time. and vma->vm_file can be dropped while
> > vma->a_ops->fault() is called. etc...
> 
> Right, so acquiring the PTE lock will either instantiate page tables for
> a non-existing vma, leaving you with an interesting mess to clean up, or
> you can also RCU free the page tables (in the same RCU domain as the
> vma) which will mostly[*] avoid that issue.
> 
> [ To make live really really interesting you could even re-use the
>   page-tables and abort the RCU free when the region gets re-mapped
>   before the RCU callbacks happen, this will avoid a free/alloc cycle
>   for fast remapping workloads. ]
> 
> Once you hold the PTE lock, you can validate the vma you looked up,
> since ->unmap() syncs against it. If at that time you find the
> speculative vma is dead, you fail and re-try the fault.
> 
> [*] there still is the case of faulting on an address that didn't
> previously have page-tables hence the unmap page table scan will have
> skipped it -- my hacks simply leaked page tables here, but the idea was
> to acquire the mmap_sem for reading and cleanup properly.

Alternatively, we could mark vma's dead in some way before we do the
unmap, then whenever we hit the page-table alloc path, we check against
the speculative vma and bail if it died.

That might just work.. will need to ponder it a bit more.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists