linux-kernel - Re: [PATCH] radix-tree: fix radix_tree_iter

Date:	Fri, 15 Jul 2016 11:52:58 +0300
From:	Andrey Ryabinin <aryabinin@...tuozzo.com>
To:	Ross Zwisler <ross.zwisler@...ux.intel.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	<linux-mm@...ck.org>, Greg Thelen <gthelen@...gle.com>,
	Suleiman Souhlal <suleiman@...gle.com>,
	<syzkaller@...glegroups.com>, Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	<linux-kernel@...r.kernel.org>,
	Konstantin Khlebnikov <koct9i@...il.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	Hugh Dickins <hughd@...gle.com>, <stable@...r.kernel.org>
Subject: Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged
 iterators.



On 07/15/2016 01:25 AM, Ross Zwisler wrote:
> On Thu, Jul 14, 2016 at 02:19:56PM +0300, Andrey Ryabinin wrote:
>> radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
>> Then a NULL slot and a non-zero iter.tags are passed to
>> radix_tree_next_slot(), leading to a crash:
>>
>> RIP: [<     inline     >] radix_tree_next_slot include/linux/radix-tree.h:473
>>   [<ffffffff816951a4>] find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
>> ....
>> Call Trace:
>>  [<ffffffff816cd91a>] pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
>>  [<ffffffff81ab4231>] mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
>>  [<ffffffff81ac883e>] ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
>>  [<ffffffff816c99c7>] do_writepages+0x97/0x100 mm/page-writeback.c:2364
>>  [<ffffffff8169bee8>] __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
>>  [<ffffffff8169c371>] filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
>>  [<ffffffff81aa584d>] ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
>>  [<ffffffff818b667a>] vfs_fsync_range+0x10a/0x250 fs/sync.c:195
>>  [<     inline     >] vfs_fsync fs/sync.c:209
>>  [<ffffffff818b6832>] do_fsync+0x42/0x70 fs/sync.c:219
>>  [<     inline     >] SYSC_fdatasync fs/sync.c:232
>>  [<ffffffff818b6f89>] SyS_fdatasync+0x19/0x20 fs/sync.c:230
>>  [<ffffffff86a94e00>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
>>
>> We must reset the iterator's tags so that radix_tree_next_slot() bails out
>> and we go to the slow path in radix_tree_next_chunk().
> 
> This analysis doesn't make sense to me.  In find_get_pages_tag(), when we call
> radix_tree_iter_retry(), this sets the local 'slot' variable to NULL, then
> does a 'continue'.  This will hop to the next iteration of the
> radix_tree_for_each_tagged() loop, which will first check the exit condition of
> the for() loop:
> 
> #define radix_tree_for_each_tagged(slot, root, iter, start, tag)	\
> 	for (slot = radix_tree_iter_init(iter, start) ;			\
> 	     slot || (slot = radix_tree_next_chunk(root, iter,		\
> 			      RADIX_TREE_ITER_TAGGED | tag)) ;		\
> 	     slot = radix_tree_next_slot(slot, iter,			\
> 				RADIX_TREE_ITER_TAGGED))
> 
> So, we'll run the 
> 	     slot || (slot = radix_tree_next_chunk(root, iter,		\
> 			      RADIX_TREE_ITER_TAGGED | tag)) ;		\
> 
> bit first.  

This is not how a for() loop works. On 'continue', the step expression
(slot = radix_tree_next_slot()) is executed first, and only after that is
the condition evaluated.


> 'slot' is NULL, so we'll set it via radix_tree_next_chunk().  At
> this point radix_tree_next_slot() hasn't been called.
> 
> radix_tree_next_chunk() will set up the iter->index, iter->next_index and
> iter->tags before it returns.  The next iteration of the loop in
> find_get_pages_tag() will use the non-NULL slot provided by
> radix_tree_next_chunk(), and only after that iteration will we call
> radix_tree_next_slot() again.  By then iter->tags should be up to date.
> 
> Do you have a test setup that reliably fails without this code but passes when
> you zero out iter->tags?
> 


Yup, I run Dmitry's reproducer in a parallel loop:
	$ while true; do ./a.out & done

It usually takes a couple of minutes at most.

> I've been looking at this as well, but haven't been able to get a reliable
> reproducer in my test setup.
> 
>> Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
>> Signed-off-by: Andrey Ryabinin <aryabinin@...tuozzo.com>
>> Reported-by: Dmitry Vyukov <dvyukov@...gle.com>
>> Cc: Konstantin Khlebnikov <koct9i@...il.com>
>> Cc: Matthew Wilcox <willy@...ux.intel.com>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: <stable@...r.kernel.org>
>> ---
>>  include/linux/radix-tree.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
>> index cb4b7e8..eca6f62 100644
>> --- a/include/linux/radix-tree.h
>> +++ b/include/linux/radix-tree.h
>> @@ -407,6 +407,7 @@ static inline __must_check
>>  void **radix_tree_iter_retry(struct radix_tree_iter *iter)
>>  {
>>  	iter->next_index = iter->index;
>> +	iter->tags = 0;
>>  	return NULL;
>>  }
>>  
>> -- 
>> 2.7.3
>>