linux-kernel - Re: [PATCH btrfs/for-next] btrfs: fix fatal extent

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAL3q7H4GqjcesWir69PULr4a9rSi2cVQ366y5bAEU23wS6xJkA@mail.gmail.com>
Date:   Wed, 17 Jun 2020 19:26:27 +0100
From:   Filipe Manana <fdmanana@...il.com>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     Chris Mason <clm@...com>, Boris Burkov <boris@....io>,
        David Sterba <dsterba@...e.com>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        kernel-team@...com
Subject: Re: [PATCH btrfs/for-next] btrfs: fix fatal extent_buffer readahead
 vs releasepage race

On Wed, Jun 17, 2020 at 7:19 PM Josef Bacik <josef@...icpanda.com> wrote:
>
> On 6/17/20 2:11 PM, Filipe Manana wrote:
> > On Wed, Jun 17, 2020 at 6:43 PM Chris Mason <clm@...com> wrote:
> >>
> >> On 17 Jun 2020, at 13:20, Filipe Manana wrote:
> >>
> >>> On Wed, Jun 17, 2020 at 5:32 PM Boris Burkov <boris@....io> wrote:
> >>>
> >>>> ---
> >>>>   fs/btrfs/extent_io.c | 45
> >>>> ++++++++++++++++++++++++++++----------------
> >>>>   1 file changed, 29 insertions(+), 16 deletions(-)
> >>>>
> >>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> >>>> index c59e07360083..f6758ebbb6a2 100644
> >>>> --- a/fs/btrfs/extent_io.c
> >>>> +++ b/fs/btrfs/extent_io.c
> >>>> @@ -3927,6 +3927,11 @@ static noinline_for_stack int
> >>>> write_one_eb(struct extent_buffer *eb,
> >>>>          clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
> >>>>          num_pages = num_extent_pages(eb);
> >>>>          atomic_set(&eb->io_pages, num_pages);
> >>>> +       /*
> >>>> +        * It is possible for releasepage to clear the TREE_REF bit
> >>>> before we
> >>>> +        * set io_pages. See check_buffer_tree_ref for a more
> >>>> detailed comment.
> >>>> +        */
> >>>> +       check_buffer_tree_ref(eb);
> >>>
> >>> This is a whole different case from the one described in the
> >>> changelog, as this is in the write path.
> >>> Why do we need this one?
> >>
> >> This was Josef’s idea, but I really like the symmetry.  You set
> >> io_pages, you do the tree_ref dance.  Everyone fiddling with the write
> >> back bit right now correctly clears writeback after doing the atomic_dec
> >> on io_pages, but the race is tiny and prone to getting exposed again by
> >> shifting code around.  Tree ref checks around io_pages are the most
> >> reliable way to prevent this bug from coming back again later.
> >
> > Ok, but that still doesn't answer my question.
> > Is there an actual race/problem this hunk solves?
> >
> > Before calling write_one_eb() we increment the ref on the eb and we
> > also call lock_extent_buffer_for_io(),
> > which clears the dirty bit and sets the writeback bit on the eb while
> > holding its ref_locks spin_lock.
> >
> > Even if we get to try_release_extent_buffer, it calls
> > extent_buffer_under_io(eb) while holding the ref_locks spin_lock,
> > so at any time it should return true, as either the dirty or the
> > writeback bit is set.
> >
> > Is this purely a safety guard that is being introduced?
> >
> > Can we at least describe in the changelog why we are adding this hunk
> > in the write path?
> > All it mentions is a race between reading and releasing pages, there's
> > nothing mentioned about races with writeback.
> >
>
> I think maybe we make that bit a separate patch, and leave the panic fix on it's
> own.  I suggested this because I have lofty ideas of killing the refs_lock, and
> this would at least keep us consistent in our usage TREE_REF to save us from
> freeing stuff that may be currently under IO.
>
> I'm super not happy with our reference handling coupled with releasepage, but
> these are the kind of hoops we're going to have to jump through until we have
> some better infrastructure in place to handle metadata.  Thanks,

Ok, so it's just a safety guard then.
Yes, either a separate patch or at the very least mention why we are
adding that in the change log.

Thanks.

>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”