linux-kernel - Re: [PATCH btrfs/for-next] btrfs: fix fatal extent

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1fdc19e9-8517-efb3-78c7-6f4d0152d87f@toxicpanda.com>
Date:   Wed, 17 Jun 2020 14:19:27 -0400
From:   Josef Bacik <josef@...icpanda.com>
To:     fdmanana@...il.com, Chris Mason <clm@...com>
Cc:     Boris Burkov <boris@....io>, David Sterba <dsterba@...e.com>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        kernel-team@...com
Subject: Re: [PATCH btrfs/for-next] btrfs: fix fatal extent_buffer readahead
 vs releasepage race

On 6/17/20 2:11 PM, Filipe Manana wrote:
> On Wed, Jun 17, 2020 at 6:43 PM Chris Mason <clm@...com> wrote:
>>
>> On 17 Jun 2020, at 13:20, Filipe Manana wrote:
>>
>>> On Wed, Jun 17, 2020 at 5:32 PM Boris Burkov <boris@....io> wrote:
>>>
>>>> ---
>>>>   fs/btrfs/extent_io.c | 45
>>>> ++++++++++++++++++++++++++++----------------
>>>>   1 file changed, 29 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>> index c59e07360083..f6758ebbb6a2 100644
>>>> --- a/fs/btrfs/extent_io.c
>>>> +++ b/fs/btrfs/extent_io.c
>>>> @@ -3927,6 +3927,11 @@ static noinline_for_stack int
>>>> write_one_eb(struct extent_buffer *eb,
>>>>          clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
>>>>          num_pages = num_extent_pages(eb);
>>>>          atomic_set(&eb->io_pages, num_pages);
>>>> +       /*
>>>> +        * It is possible for releasepage to clear the TREE_REF bit
>>>> before we
>>>> +        * set io_pages. See check_buffer_tree_ref for a more
>>>> detailed comment.
>>>> +        */
>>>> +       check_buffer_tree_ref(eb);
>>>
>>> This is a whole different case from the one described in the
>>> changelog, as this is in the write path.
>>> Why do we need this one?
>>
>> This was Josef’s idea, but I really like the symmetry.  You set
>> io_pages, you do the tree_ref dance.  Everyone fiddling with the write
>> back bit right now correctly clears writeback after doing the atomic_dec
>> on io_pages, but the race is tiny and prone to getting exposed again by
>> shifting code around.  Tree ref checks around io_pages are the most
>> reliable way to prevent this bug from coming back again later.
> 
> Ok, but that still doesn't answer my question.
> Is there an actual race/problem this hunk solves?
> 
> Before calling write_one_eb() we increment the ref on the eb and we
> also call lock_extent_buffer_for_io(),
> which clears the dirty bit and sets the writeback bit on the eb while
> holding its ref_locks spin_lock.
> 
> Even if we get to try_release_extent_buffer, it calls
> extent_buffer_under_io(eb) while holding the ref_locks spin_lock,
> so at any time it should return true, as either the dirty or the
> writeback bit is set.
> 
> Is this purely a safety guard that is being introduced?
> 
> Can we at least describe in the changelog why we are adding this hunk
> in the write path?
> All it mentions is a race between reading and releasing pages, there's
> nothing mentioned about races with writeback.
> 

I think maybe we make that bit a separate patch, and leave the panic fix on it's 
own.  I suggested this because I have lofty ideas of killing the refs_lock, and 
this would at least keep us consistent in our usage TREE_REF to save us from 
freeing stuff that may be currently under IO.

I'm super not happy with our reference handling coupled with releasepage, but 
these are the kind of hoops we're going to have to jump through until we have 
some better infrastructure in place to handle metadata.  Thanks,

Josef