Date:	Sat, 22 May 2010 00:23:54 +1000
From:	Nick Piggin <npiggin@...e.de>
To:	Josef Bacik <josef@...hat.com>
Cc:	Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
	linux-fsdevel@...r.kernel.org, chris.mason@...cle.com,
	hch@...radead.org, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] new ->perform_write fop

On Fri, May 21, 2010 at 09:50:54AM -0400, Josef Bacik wrote:
> On Fri, May 21, 2010 at 09:05:24AM +1000, Dave Chinner wrote:
> > Allocating multipage writes as unwritten extents turns off delayed
> > allocation and hence we'd lose all the benefits that this gives...
> >
> 
> I just realized we have another problem: the mmap_sem/page_lock deadlock.
> Currently BTRFS is susceptible to this, since we don't prefault any of the
> pages in yet.  If we're going to do multi-page writes we're going to need a
> way to fault in all of the iovecs at once, so that when we do the
> pagefault_disable()/copy/pagefault_enable() sequence we don't just end up
> copying the first iovec.  Nick, have you done something like this already?
> If not, I assume I can just loop through all the iovecs, call
> fault_in_pages_readable on each of them, and be good to go, right?  Thanks,

Yes, well it's a different issue. With multi-page writes, even a single
iovec may not be faulted in properly. And with multiple iovecs, we are
already suboptimal with faulting.

Faulting in multiple iovecs may already be a good idea. I didn't add
that code; I had hoped for a test case first, but perhaps we can just
go ahead and add it.
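
(Purely for illustration, not code from any posted patch: a minimal
sketch of the "loop over the iovecs and prefault each one" idea, built
on the existing fault_in_pages_readable() interface.  The helper name
and the @bytes cap are invented here.)

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
#include <linux/uio.h>

/* Prefault up to @bytes of user source data across all iovec segments. */
static int fault_in_iovecs_readable(const struct iovec *iov,
				    unsigned long nr_segs, size_t bytes)
{
	unsigned long seg;

	for (seg = 0; seg < nr_segs && bytes; seg++) {
		size_t len = min(bytes, iov[seg].iov_len);

		/* best effort only: a racing unmap can undo this at any time */
		if (fault_in_pages_readable(iov[seg].iov_base, len))
			return -EFAULT;
		bytes -= len;
	}
	return 0;
}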

With multi-page writes, we would want to fault in multiple source pages
at once if the design were to lock multiple pages at once and do the
copy. I still think we might be able to just lock and copy one page at
a time, but I could be wrong.

Oh wow, btrfs is deadlocky there. Firstly, fault_in_pages_readable does
not guarantee success (a racing unmap can always take the page away
again in the meantime). Secondly, calling it inside the page-lock
section just means it will be the call that triggers the deadlock,
rather than the copy_from_user.

A quick workaround that reduces the probability is to do the
fault_in_pages_readable calls before locking the pages.
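
(Again only a sketch of that ordering, not btrfs code: the function and
its name are hypothetical, while fault_in_pages_readable(), lock_page(),
pagefault_disable() and __copy_from_user_inatomic() are the real
interfaces.  @bytes is assumed to fit within the one page.)

#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/uaccess.h>

/*
 * Copy @bytes of user data into a page-cache page: prefault the source
 * *before* taking the page lock, then do the copy with pagefaults
 * disabled.
 */
static size_t prefault_then_copy(struct page *page, unsigned long offset,
				 const char __user *buf, size_t bytes)
{
	char *kaddr;
	size_t copied;

	/* best-effort prefault while no page lock is held */
	if (fault_in_pages_readable(buf, bytes))
		return 0;

	lock_page(page);

	/* no faults allowed here, or we deadlock against the page lock */
	pagefault_disable();
	kaddr = kmap_atomic(page, KM_USER0);
	copied = bytes - __copy_from_user_inatomic(kaddr + offset, buf, bytes);
	kunmap_atomic(kaddr, KM_USER0);
	pagefault_enable();

	unlock_page(page);

	/* may still be short if the source was unmapped in between */
	return copied;
}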

But you really need to handle the short-copy case. From the error
handling there, it seems like you can just free_reserved_data_space and
retry the copy?
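
(For reference, this is roughly what the short-copy handling in
generic_perform_write() in mm/filemap.c looks like, reproduced from
memory and trimmed, so check the real source; the comment in the
copied == 0 branch marks where a btrfs version would additionally have
to drop its space reservation before retrying.)

#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
#include <linux/uaccess.h>

static ssize_t perform_write_sketch(struct file *file, struct iov_iter *i,
				    loff_t pos)
{
	struct address_space *mapping = file->f_mapping;
	const struct address_space_operations *a_ops = mapping->a_ops;
	ssize_t written = 0;
	long status = 0;

	do {
		struct page *page;
		void *fsdata;
		unsigned long offset = pos & (PAGE_CACHE_SIZE - 1);
		unsigned long bytes = min_t(unsigned long,
					    PAGE_CACHE_SIZE - offset,
					    iov_iter_count(i));
		size_t copied;

again:
		/* prefault the source while no page lock is held */
		if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
			status = -EFAULT;
			break;
		}

		/* reserves space and returns the target page locked */
		status = a_ops->write_begin(file, mapping, pos, bytes, 0,
					    &page, &fsdata);
		if (unlikely(status))
			break;

		/* the copy must not fault while the page is locked */
		pagefault_disable();
		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
		pagefault_enable();
		flush_dcache_page(page);

		/* unlocks and releases the page again */
		status = a_ops->write_end(file, mapping, pos, bytes, copied,
					  page, fsdata);
		if (unlikely(status < 0))
			break;
		copied = status;

		iov_iter_advance(i, copied);
		if (unlikely(copied == 0)) {
			/*
			 * Source page went away between the prefault and the
			 * atomic copy.  No locks are held at this point, so
			 * it is safe to prefault and retry; a btrfs version
			 * would also undo its space reservation
			 * (free_reserved_data_space) here first.
			 */
			bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
				      iov_iter_single_seg_count(i));
			goto again;
		}

		pos += copied;
		written += copied;
	} while (iov_iter_count(i));

	return written ? written : status;
}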

