lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAO2mnozhTeuWwxsmQZor93xmyS5_r34YpGXy0ec=9vFCBGbO1w@mail.gmail.com>
Date:	Wed, 16 Sep 2015 17:08:12 -0400
From:	Denis Bychkov <manover@...il.com>
To:	Kent Overstreet <kent.overstreet@...il.com>
Cc:	Vojtech Pavlik <vojtech@...e.com>, linux-kernel@...r.kernel.org,
	linux-bcache@...r.kernel.org, Kent Overstreet <kmo@...erainc.com>,
	Emmanuel Florac <eflorac@...ellique.com>,
	Jiri Kosina <jkosina@...e.com>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH] bcache: Fix writeback_thread never writing back
 incomplete stripes.

Hi Kent, Vojtech, list

Your fix works perfectly, I can finally remove my patch that disables
the partial_stripes_expensive branch and unleash the full bcache
performance on my RAID-6 array. I was aware of this problem for some
time now, but never got to learning the bcache codebase enough to find
out why this happens with partial stripes write-back enabled, so I
just disabled it to avoid CPU spinning. Thanks a lot to Vojtech for
finding the reason and to you for the patch. I have quite a collection
of stability patches for the mainline bcache. Would you please be kind
to review them when convenient and merge upstream the ones you think
are worth merging. It will be 4.4 merge window, I believe.
Please find all the patches attached. All of them are rebased on
mainline 4.2 kernel. If somebody needs the 4.1.x versions, please let
me know.


On Wed, Sep 16, 2015 at 7:32 AM, Kent Overstreet
<kent.overstreet@...il.com> wrote:
> On Sat, Sep 05, 2015 at 01:10:12PM +0200, Vojtech Pavlik wrote:
>> Fix writeback_thread never finishing writing back all dirty data in bcache when
>> partial_stripes_expensive is set, and spinning, consuming 100% of CPU instead.
>>
>> Signed-off-by: Vojtech Pavlik <vojtech@...e.com>
>> ---
>>
>> This is a fix for the current upstream bcache, not the devel branch.
>>
>> If partial_stripes_expensive is set for a cache set, then writeback_thread
>> always attempts to write full stripes out back to the backing device first.
>> However, since it does that based on a bitmap and not a simple linear
>> search, like the rest of the code of refill_dirty(), it changes the
>> last_scanned pointer so that never points to 'end'. refill_dirty() then
>> never tries to scan from 'start', resulting in the writeback_thread
>> looping, consuming 100% of CPU, but never making progress in writing out
>> the incomplete dirty stripes in the cache.
>>
>> Scanning the tree after not finding enough full stripes fixes the issue.
>>
>> Incomplete dirty stripes are written to the backing device, the device
>> eventually reaches a clean state if there is nothing dirtying data and
>> writeback_thread sleeps. This also fixes the problem of the cache device
>> not being possible to detach in the partial_stripes_expensive scenario.
>
> Good catch!
>
>> It may be more efficient to separate the last_scanned field for normal and
>> stripe scans instead.
>
> Actually, I think I like your approach - I think it gives us the behaviour we
> want, although it took some thinking to figure out why.
>
> One of the reasons for last_scanned is just to do writeback in LBA order and
> avoid making the disks seek around - so, if refill_full_stripes() did just queue
> up some IO at last_scanned, we do want to keep scanning from that position for
> non full stripes.
>
> But since (as you noted) last_scanned is never going to be >= end after calling
> refill_full_strips() (if there were any full stripes) - really the correct thing
> to do is loop around to the start if necessary so that we can successfully scan
> everything.
>
> The only inefficiency I see with your approach is that on the second scan, after
> we've looped, we don't need to scan to the end of the disk - we only need to
> scan up to where we initially started.
>
> But, another observation - if we change refill_dirty() so that it always scans
> the entire keyspace if necessary, regardless of where last_scanned was - well,
> that really isn't specific to the refill_full_stripes() case - we can just
> always do that.
>
> Can you give this patch a try?
>
> -- >8 --
> Subject: [PATCH] bcache: Change refill_dirty() to always scan entire disk if necessary
>
> Previously, it would only scan the entire disk if it was starting from the very
> start of the disk - i.e. if the previous scan got to the end.
>
> This was broken by refill_full_stripes(), which updates last_scanned so that
> refill_dirty was never triggering the searched_from_start path.
>
> But if we change refill_dirty() to always scan the entire disk if necessary,
> regardless of what last_scanned was, the code gets cleaner and we fix that bug
> too.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@...il.com>
> ---
>  drivers/md/bcache/writeback.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index cdde0f32f0..08a52db38b 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -359,11 +359,13 @@ next:
>         }
>  }
>
> +/*
> + * Returns true if we scanned the entire disk
> + */
>  static bool refill_dirty(struct cached_dev *dc)
>  {
>         struct keybuf *buf = &dc->writeback_keys;
> -       struct bkey end = KEY(dc->disk.id, MAX_KEY_OFFSET, 0);
> -       bool searched_from_start = false;
> +       struct bkey start_pos, end = KEY(dc->disk.id, MAX_KEY_OFFSET, 0);
>
>         if (dc->partial_stripes_expensive) {
>                 refill_full_stripes(dc);
> @@ -371,14 +373,20 @@ static bool refill_dirty(struct cached_dev *dc)
>                         return false;
>         }
>
> -       if (bkey_cmp(&buf->last_scanned, &end) >= 0) {
> -               buf->last_scanned = KEY(dc->disk.id, 0, 0);
> -               searched_from_start = true;
> -       }
> -
> +       start_pos = buf->last_scanned;
>         bch_refill_keybuf(dc->disk.c, buf, &end, dirty_pred);
>
> -       return bkey_cmp(&buf->last_scanned, &end) >= 0 && searched_from_start;
> +       if (bkey_cmp(&buf->last_scanned, &end) < 0)
> +               return false;
> +
> +       /*
> +        * If we get to the end start scanning again from the beginning, and
> +        * only scan up to where we initially started scanning from:
> +        */
> +       buf->last_scanned = KEY(dc->disk.id, 0, 0);
> +       bch_refill_keybuf(dc->disk.c, buf, &start_pos, dirty_pred);
> +
> +       return bkey_cmp(&buf->last_scanned, &start_pos) >= 0;
>  }
>
>  static void bch_writeback(struct cached_dev *dc)
> --
> 2.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 

Denis

Download attachment "bcache_correct_errors_on_register.patch" of type "application/octet-stream" (861 bytes)

Download attachment "bcache-attach-detach-cleanup.patch" of type "application/octet-stream" (5134 bytes)

Download attachment "bcache-cond_resched.patch" of type "application/octet-stream" (735 bytes)

Download attachment "bcache-fix_passthrough_shutdown_crash.patch" of type "application/octet-stream" (1044 bytes)

Download attachment "bcache-fix-memleak-bch_cached_dev_run.patch" of type "application/octet-stream" (802 bytes)

Download attachment "bcache-rcu-sched-bugfix.patch" of type "application/octet-stream" (2490 bytes)

Download attachment "bcache-refill_dirty_to_always_scan_entire_disk.patch" of type "application/octet-stream" (1963 bytes)

Download attachment "bcache-remove-bio-splitting.patch" of type "application/octet-stream" (10266 bytes)

Download attachment "bcache-unregister-reboot-notifier.patch" of type "application/octet-stream" (761 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ