[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1fdc19e9-8517-efb3-78c7-6f4d0152d87f@toxicpanda.com>
Date: Wed, 17 Jun 2020 14:19:27 -0400
From: Josef Bacik <josef@...icpanda.com>
To: fdmanana@...il.com, Chris Mason <clm@...com>
Cc: Boris Burkov <boris@....io>, David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
kernel-team@...com
Subject: Re: [PATCH btrfs/for-next] btrfs: fix fatal extent_buffer readahead
vs releasepage race
On 6/17/20 2:11 PM, Filipe Manana wrote:
> On Wed, Jun 17, 2020 at 6:43 PM Chris Mason <clm@...com> wrote:
>>
>> On 17 Jun 2020, at 13:20, Filipe Manana wrote:
>>
>>> On Wed, Jun 17, 2020 at 5:32 PM Boris Burkov <boris@....io> wrote:
>>>
>>>> ---
>>>> fs/btrfs/extent_io.c | 45
>>>> ++++++++++++++++++++++++++++----------------
>>>> 1 file changed, 29 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>> index c59e07360083..f6758ebbb6a2 100644
>>>> --- a/fs/btrfs/extent_io.c
>>>> +++ b/fs/btrfs/extent_io.c
>>>> @@ -3927,6 +3927,11 @@ static noinline_for_stack int
>>>> write_one_eb(struct extent_buffer *eb,
>>>> clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
>>>> num_pages = num_extent_pages(eb);
>>>> atomic_set(&eb->io_pages, num_pages);
>>>> + /*
>>>> + * It is possible for releasepage to clear the TREE_REF bit
>>>> before we
>>>> + * set io_pages. See check_buffer_tree_ref for a more
>>>> detailed comment.
>>>> + */
>>>> + check_buffer_tree_ref(eb);
>>>
>>> This is a whole different case from the one described in the
>>> changelog, as this is in the write path.
>>> Why do we need this one?
>>
>> This was Josef’s idea, but I really like the symmetry. You set
>> io_pages, you do the tree_ref dance. Everyone fiddling with the write
>> back bit right now correctly clears writeback after doing the atomic_dec
>> on io_pages, but the race is tiny and prone to getting exposed again by
>> shifting code around. Tree ref checks around io_pages are the most
>> reliable way to prevent this bug from coming back again later.
>
> Ok, but that still doesn't answer my question.
> Is there an actual race/problem this hunk solves?
>
> Before calling write_one_eb() we increment the ref on the eb and we
> also call lock_extent_buffer_for_io(),
> which clears the dirty bit and sets the writeback bit on the eb while
> holding its ref_locks spin_lock.
>
> Even if we get to try_release_extent_buffer, it calls
> extent_buffer_under_io(eb) while holding the ref_locks spin_lock,
> so at any time it should return true, as either the dirty or the
> writeback bit is set.
>
> Is this purely a safety guard that is being introduced?
>
> Can we at least describe in the changelog why we are adding this hunk
> in the write path?
> All it mentions is a race between reading and releasing pages, there's
> nothing mentioned about races with writeback.
>
I think maybe we make that bit a separate patch, and leave the panic fix on it's
own. I suggested this because I have lofty ideas of killing the refs_lock, and
this would at least keep us consistent in our usage TREE_REF to save us from
freeing stuff that may be currently under IO.
I'm super not happy with our reference handling coupled with releasepage, but
these are the kind of hoops we're going to have to jump through until we have
some better infrastructure in place to handle metadata. Thanks,
Josef
Powered by blists - more mailing lists