linux-kernel - Re: + fs-introduce-write_begin-write_end-and-perform

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070613134005.GA13815@localhost.sw.ru>
Date:	Wed, 13 Jun 2007 17:40:05 +0400
From:	Dmitriy Monakhov <dmonakhov@...nvz.org>
To:	linux-kernel@...r.kernel.org
Cc:	npiggin@...e.de, mark.fasheh@...cle.com, dmonakhov@...nvz.org,
	linux-ext4@...r.kernel.org
Subject: Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

On 14:19 Втр 29 Май     , akpm@...ux-foundation.org wrote:
> 
> The patch titled
>      fs: introduce write_begin, write_end, and perform_write aops
> has been added to the -mm tree.  Its filename is
>      fs-introduce-write_begin-write_end-and-perform_write-aops.patch
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
> 
> ------------------------------------------------------
> Subject: fs: introduce write_begin, write_end, and perform_write aops
> From: Nick Piggin <npiggin@...e.de>
> 
> These are intended to replace prepare_write and commit_write with more
> flexible alternatives that are also able to avoid the buffered write
> deadlock problems efficiently (which prepare_write is unable to do).
> 
> [mark.fasheh@...cle.com: API design contributions, code review and fixes]
> Signed-off-by: Nick Piggin <npiggin@...e.de>
> Signed-off-by: Mark Fasheh <mark.fasheh@...cle.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
I've finaly find time to review Nick's "write_begin/write_end aop" patch set.
And i have some fixes and questions. My be i've missed somthing and it was 
already disscussed, but i cant find in LKML.

1) loop dev:
	loop.c code itself is not perfect. In fact before Nick's patch
	partial write was't possible. Assumption what write chunks are
	always page aligned is realy weird ( see "index++" line).
	Fixed by "new aop loop fix" patch

2)block_write_begin:
	After we enter to block_write_begin with *pagep == NULL and
	some page was grabed we remember this page in *pagep
	And if __block_prepare_write() we have to clear *pagep , as 
	it was before. Because this may confuse caller.
	for example caller may have folowing code:
		ret = block_write_begin(..., pagep,...)
		if (ret && *pagep != NULL) {
			unlock_page(*pagep);
			page_cache_release(*pagep);
		}
	Fixed my "new aop block_write_begin fix" patch

3) __page_symlink:
	Nick's patch add folowing code:
	+ err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
	+                 AOP_FLAG_UNINTERRUPTIBLE, &page,&fsdata);
	symlink now consume whole page. I have only one question "WHY???".
	I don't see any advantages, but where are huge list of
	dissdvantages:
	a) fs with blksize == 1k and pagesize == 16k after this patch
	   waste more than 10x times disk space for nothing.
	b) What happends if we want use fs with blksize == 4k on i386
	   after it was used by ia64 ??? (before this patch it was
	   possible).
	
	I dont prepare patch for this because i dont understand issue
	witch Nick aimed to fix.

4) iov_iter_fault_in_readable:
	Function prerform check for signgle region, with out respect to
	segment nature of iovec, For example writev no longer works :) :
	writev(3, [{"\1", 1}, {"\2"..., 4096}], 2) = -1 EFAULT (Bad address)
	this hidden bug, and it was invisiable because XXXX_fault_in_readable
	return value was ignored before. Lets iov_iter_fault_in_readable
	perform checks for all segments.
	Fixed by :"iov_iter_fault_in_readable fix"

5) ext3_write_end:
	Before  write_begin/write_end patch set we have folowing locking
	order:
		stop_journal(handle);
		unlock_page(page);
	But now order is oposite:
		unlock_page(page);
		stop_journal(handle);
	Can we got any race condition now? I'm not sure is it actual problem,
	may be somebody cant describe this.




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/