[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL3q7H4GqjcesWir69PULr4a9rSi2cVQ366y5bAEU23wS6xJkA@mail.gmail.com>
Date: Wed, 17 Jun 2020 19:26:27 +0100
From: Filipe Manana <fdmanana@...il.com>
To: Josef Bacik <josef@...icpanda.com>
Cc: Chris Mason <clm@...com>, Boris Burkov <boris@....io>,
David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
kernel-team@...com
Subject: Re: [PATCH btrfs/for-next] btrfs: fix fatal extent_buffer readahead
vs releasepage race
On Wed, Jun 17, 2020 at 7:19 PM Josef Bacik <josef@...icpanda.com> wrote:
>
> On 6/17/20 2:11 PM, Filipe Manana wrote:
> > On Wed, Jun 17, 2020 at 6:43 PM Chris Mason <clm@...com> wrote:
> >>
> >> On 17 Jun 2020, at 13:20, Filipe Manana wrote:
> >>
> >>> On Wed, Jun 17, 2020 at 5:32 PM Boris Burkov <boris@....io> wrote:
> >>>
> >>>> ---
> >>>> fs/btrfs/extent_io.c | 45
> >>>> ++++++++++++++++++++++++++++----------------
> >>>> 1 file changed, 29 insertions(+), 16 deletions(-)
> >>>>
> >>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> >>>> index c59e07360083..f6758ebbb6a2 100644
> >>>> --- a/fs/btrfs/extent_io.c
> >>>> +++ b/fs/btrfs/extent_io.c
> >>>> @@ -3927,6 +3927,11 @@ static noinline_for_stack int
> >>>> write_one_eb(struct extent_buffer *eb,
> >>>> clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
> >>>> num_pages = num_extent_pages(eb);
> >>>> atomic_set(&eb->io_pages, num_pages);
> >>>> + /*
> >>>> + * It is possible for releasepage to clear the TREE_REF bit
> >>>> before we
> >>>> + * set io_pages. See check_buffer_tree_ref for a more
> >>>> detailed comment.
> >>>> + */
> >>>> + check_buffer_tree_ref(eb);
> >>>
> >>> This is a whole different case from the one described in the
> >>> changelog, as this is in the write path.
> >>> Why do we need this one?
> >>
> >> This was Josef’s idea, but I really like the symmetry. You set
> >> io_pages, you do the tree_ref dance. Everyone fiddling with the write
> >> back bit right now correctly clears writeback after doing the atomic_dec
> >> on io_pages, but the race is tiny and prone to getting exposed again by
> >> shifting code around. Tree ref checks around io_pages are the most
> >> reliable way to prevent this bug from coming back again later.
> >
> > Ok, but that still doesn't answer my question.
> > Is there an actual race/problem this hunk solves?
> >
> > Before calling write_one_eb() we increment the ref on the eb and we
> > also call lock_extent_buffer_for_io(),
> > which clears the dirty bit and sets the writeback bit on the eb while
> > holding its ref_locks spin_lock.
> >
> > Even if we get to try_release_extent_buffer, it calls
> > extent_buffer_under_io(eb) while holding the ref_locks spin_lock,
> > so at any time it should return true, as either the dirty or the
> > writeback bit is set.
> >
> > Is this purely a safety guard that is being introduced?
> >
> > Can we at least describe in the changelog why we are adding this hunk
> > in the write path?
> > All it mentions is a race between reading and releasing pages, there's
> > nothing mentioned about races with writeback.
> >
>
> I think maybe we make that bit a separate patch, and leave the panic fix on it's
> own. I suggested this because I have lofty ideas of killing the refs_lock, and
> this would at least keep us consistent in our usage TREE_REF to save us from
> freeing stuff that may be currently under IO.
>
> I'm super not happy with our reference handling coupled with releasepage, but
> these are the kind of hoops we're going to have to jump through until we have
> some better infrastructure in place to handle metadata. Thanks,
Ok, so it's just a safety guard then.
Yes, either a separate patch or at the very least mention why we are
adding that in the change log.
Thanks.
>
> Josef
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
Powered by blists - more mailing lists