lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 05 Dec 2006 17:52:02 +1100 From: Nick Piggin <nickpiggin@...oo.com.au> To: Linux Memory Management <linux-mm@...ck.org>, linux-fsdevel@...r.kernel.org, linux-kernel <linux-kernel@...r.kernel.org> CC: Mark Fasheh <mark.fasheh@...cle.com>, OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>, Andrew Morton <akpm@...gle.com> Subject: Status of buffered write path (deadlock fixes) Hi, I'd like to try to state where we are WRT the buffered write patches, and ask for comments. Sorry for the wide cc list, but this is an important issue which hasn't had enough review. Well the next -mm will include everything we've done so far. I won't repost patches unless someone would like to comment on a specific one. I think the core generic_file_buffered_write is fairly robust, after fixing the efault and zerolength iov problems picked up in testing (thanks, very helpful!). So now I *believe* we have an approach that solves the deadlock and doesn't expose transient or stale data, transient zeroes, or anything like that. Error handling is getting close, but there may be cases that nobody has picked up, and I've noticed a couple which I'll explain below. I think we do the right thing WRT pagecache error handling: a !uptodate page remains !uptodate, an uptodate page can handle the write being done in several parts. Comments in the patches attempt to explain how this works. I think it is pretty straightforward. But WRT block allocation in the case of errors, it needs more review. Block allocation: - prepare_write can allocate blocks - prepare_write doesn't need to initialize the pagecache on top of these blocks where it is within the range specified in prepare_write (because the copy_from_user will initialise it correctly) - In the case of a !uptodate page, unless the page is brought uptodate (ie the copy_from_user completely succeeds) and marked dirty, then a read that sneaks in after we unlock the page (to retry the write) will try to bring it uptodate by pulling in the uninitialised blocks. Problem 1: I think that allocating blocks outside i_size is OK WRT uninitialised data, because we update i_size only after a successful copy. However, I don't think we trim these blocks off (eg. perhaps the "prepare_write may have instantiated a few blocks" path should be the normal error path for both the copy_from_user and the commit_write error cases as well?) We allocate blocks within holes, but these don't need to be trimmed: it is enough to just zero out any new buffers. It might be nicer if we had some kind of way to punch a hole, but it is a rare corner case. Problem 2: nobh error handling[*]. We have just a single buffer that is used for each block in the prepare_write path, so the "zero new buffers" trick doesn't work. I think one solution to this could be to allocate all buffers for the page like normal, and then strip them off when commit_write succeeds? This would allow the zero_new_buffers path to work properly. [*] Actually I think there is a problem with the mainline nobh error handling in that a whole page of blocks will get zeroed on failure, even valid data that isn't being touched by the write. Finally, filesystems. Only OGAWA Hirofumi and Mark Fasheh have given much feedback so far. I've tried to grok ext2/3 and think they'll work OK, and have at least *looked* at all the rest. However in the worst case, there might be many subtle and different problems :( Filesystem developers need to review this, please. I don't want to cc every filesystem dev list, but if anybody thinks it would be helpful to forward this then please do. Well, that's about where its at. Block allocation problems 1 and 2 shouldn't be too hard to fix, but I would like confirmation / suggestions. Thanks, Nick -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists