linux-kernel - Re: [PATCH] btrfs: Avoid getting stuck during cyclic writebacks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20191008144221.GA18794@devbig004.ftw2.facebook.com>
Date:   Tue, 8 Oct 2019 07:42:21 -0700
From:   Tejun Heo <tj@...nel.org>
To:     dsterba@...e.cz, Chris Mason <clm@...com>,
        Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, kernel-team@...com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs: Avoid getting stuck during cyclic writebacks

Hello,

On Tue, Oct 08, 2019 at 04:23:22PM +0200, David Sterba wrote:
> > 1. There is a single file which has accumulated enough dirty pages to
> >    trigger balance_dirty_pages() and the writer appending to the file
> >    with a series of short writes.
> > 
> > 2. bdp kicks in, wakes up background writeback and sleeps.
> 
> What does 'bdp' refer to?

Oh, Sorry about that.  balance_dirty_pages().

> > IOW, as long as it's not EOF and there's budget, the code never
> > retries writing back the same page.  Only when a page happens to be
> > the last page of a particular run, we end up retrying the page, which
> > can't possibly guarantee anything data integrity related.  Besides,
> > cyclic writes are only used for non-syncing writebacks meaning that
> > there's no data integrity implication to begin with.
> 
> The code was added in a91326679f2a0a4c239 ("Btrfs: make
> mapping->writeback_index point to the last written page") after a user
> report in https://www.spinics.net/lists/linux-btrfs/msg52628.html , slow
> appends that caused fragmentation

Ah, okay, that makes sense.  That prolly warrants some comments.

> What you describe as the cause is similar, but you're partially
> reverting the fix that was supposed to fix it. As there's more code
> added by the original patch, the revert won't probably bring back the
> bug.
> 
> The whole function and states are hard to follow, I agree with your
> reasoning about the check being bogus and overall I'd rather see fewer
> special cases in the function.
> 
> Also the removed comment mentions media errors but this was not the
> problem for the original report and is not a common scenario either. So
> as long as the fallback in such case is sane (ie. set done = 1 and
> exit), I don't see futher problems.
> 
> > Fix it by always setting done_index past the current page being
> > processed.
> > 
> > Note that this problem exists in other writepages too.
> 
> I can see that write_cache_pages does the same done_index updates.  So
> it makes sense that the page walking and writeback index tracking
> behaviour is consistent, unless extent_write_cache_pages has diverged
> too much.
> 
> I'll add the patch to misc-next. Thanks.

So, in case trying to write the last page is still needed, the thing
necessary to fix this deadlock is avoiding repeating on the last page
too many times / indefinitely as that's what ends up blocking forward
progress - almost all of dirty data is behind the cyclic cursor and
the cursor can't wrap back to get to them.

Thanks.

-- 
tejun