lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20070209175017.c8a8c500.akpm@linux-foundation.org>
Date:	Fri, 9 Feb 2007 17:50:17 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Linux Filesystems <linux-fsdevel@...r.kernel.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [patch 3/3] ext2: use perform_write aop

On Sat, 10 Feb 2007 02:34:07 +0100
Nick Piggin <npiggin@...e.de> wrote:

> On Fri, Feb 09, 2007 at 11:45:39AM -0800, Andrew Morton wrote:
> > On Fri, 9 Feb 2007 11:14:55 -0800 Andrew Morton <akpm@...ux-foundation.org> wrote:
> > 
> > > If so, that might be preventable by leaving the buffer nonuptodate.
> > 
> > oh, OK, it was buffer_new(), so zeroes are the right thing for a reader to
> > see.
> > 
> > But if it wasn't buffer_new() then the appropriate thing for the reader to
> > see is what's on the disk.  But __block_prepare_write() won't read a buffer
> > which is fully-inside the write area from disk.
> > 
> > And that's seemingly OK, because if a reader gets in there after the short
> > copy, that reader will see the non-uptodate buffer and will populate it
> > from disk.
> > 
> > But doing that will overwrite the data which the write() caller managed to
> > copy into the page before it took a fault.  And that's not OK because
> > block_perform_write() does iovec_iterator_advance(i, copied) in this case
> > and hence will not rerun the copy after acquiring the page lock?
> 
> Hmm, yeah. This can be handled by not advancing partially into a !uptodate
> buffer.

Think so, yeah.

Overall, the implementation you have there seems reasonable to me. 
Basically it's passing the responsibility for preventing the deadlock and
the exposure-of-zeroes problem down into the filesystem itself, where we
have visibility of the state of the various subsections of the page and can
take appropriate actions in response to that.

It's got conceptually harder to follow as a result, which is a shame.  But
still no magic bullet is on offer.

I pity the poor schmuck who has to write ext3_journalled_perform_write(),
ext3_ordered_perform_write(), ext3_writeback_perform_write(),
ext3_writeback_nobh_perform_write() and all that other stuff.  But I think
we need to do that pretty soon to validate the whole approach.  Also xfs
and reiser3.

NTFS will be interesting from the can-this-be-made-to-work POV.

Is NFS vulnerable to the deadlock?  It looks to be.  Shudder.

We'd need to find a way of communicating all this to the poor old
fs maintainers.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ