lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180509150938.GA3814@linux.intel.com>
Date:   Wed, 9 May 2018 09:09:38 -0600
From:   Ross Zwisler <ross.zwisler@...ux.intel.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, Matthew Wilcox <willy@...radead.org>,
        Christoph Hellwig <hch@....de>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Chinner <david@...morbit.com>, linux-nvdimm@...ts.01.org,
        stable@...r.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race

On Wed, May 09, 2018 at 02:46:11PM +0200, Jan Kara wrote:
> On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> > Fix a race in the multi-order iteration code which causes the kernel to hit
> > a GP fault.  This was first seen with a production v4.15 based kernel
> > (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> > DAX entries.
> > 
> > The race has to do with how we tear down multi-order sibling entries when
> > we are removing an item from the tree.  Remember for example that an order
> > 2 entry looks like this:
> > 
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> > 
> > where 'entry' is in some slot in the struct radix_tree_node, and the three
> > slots following 'entry' contain sibling pointers which point back to
> > 'entry.'
> > 
> > When we delete 'entry' from the tree, we call :
> >   radix_tree_delete()
> >     radix_tree_delete_item()
> >       __radix_tree_delete()
> >         replace_slot()
> > 
> > replace_slot() first removes the siblings in order from the first to the
> > last, then at then replaces 'entry' with NULL.  This means that for a brief
> > period of time we end up with one or more of the siblings removed, so:
> > 
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> > 
> > This causes an issue if you have a reader iterating over the slots in the
> > tree via radix_tree_for_each_slot() while only under
> > rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
> > mm/filemap.c.
> > 
> > The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> > skip over the sibling entries in the slots, it currently does so with an
> > exact match on the slot directly preceding our current slot.  Normally this
> > works:
> >                                     V preceding slot
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> >                                             ^ current slot
> > 
> > This lets you find the first sibling, and you skip them all in order.
> > 
> > But in the case where one of the siblings is NULL, that slot is skipped and
> > then our sibling detection is interrupted:
> > 
> >                                            V preceding slot
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> >                                                   ^ current slot
> > 
> > This means that the sibling pointers aren't recognized since they point all
> > the way back to 'entry', so we think that they are normal internal radix
> > tree pointers.  This causes us to think we need to walk down to a struct
> > radix_tree_node starting at the address of 'entry'.
> > 
> > In a real running kernel this will crash the thread with a GP fault when
> > you try and dereference the slots in your broken node starting at 'entry'.
> > 
> > We fix this race by fixing the way that skip_siblings() detects sibling
> > nodes.  Instead of testing against the preceding slot we instead look for
> > siblings via is_sibling_entry() which compares against the position of the
> > struct radix_tree_node.slots[] array.  This ensures that sibling entries
> > are properly identified, even if they are no longer contiguous with the
> > 'entry' they point to.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
> > Reported-by: CR, Sapthagirish <sapthagirish.cr@...el.com>
> > Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> > Cc: <stable@...r.kernel.org>
> 
> Looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@...e.cz>

Thank you for the review, Jan.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ