linux-kernel - Re: [PATCHSET v5] Make background writeback great again for the first time

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 3 May 2016 14:17:19 +0200
From:	Jan Kara <jack@...e.cz>
To:	Jens Axboe <axboe@...nel.dk>
Cc:	Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
	dchinner@...hat.com, sedat.dilek@...il.com
Subject: Re: [PATCHSET v5] Make background writeback great again for the
 first time

On Thu 28-04-16 12:46:41, Jens Axboe wrote:
> >>-	rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
> >>-	rwb->wb_normal = (rwb->wb_max + 1) / 2;
> >>-	rwb->wb_background = (rwb->wb_max + 3) / 4;
> >>+	if (rwb->queue_depth == 1) {
> >>+		rwb->wb_max = rwb->wb_normal = 2;
> >>+		rwb->wb_background = 1;
> >
> >This breaks the detection of too big scale_step in scale_up() where we key
> >of wb_max == 1 value. However even with that fixed no luck :(:
> 
> Yeah, I need to look at that. For QD=1, I think the only sensible values for
> max/normal/bg is 2/2/1 and 1/1/1 if we step down.
> 
> >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> >Runtime: 105.126 107.125 105.641
> >
> >So about the same as before. I'll try to debug this later today...
> 
> Thanks, I'm very interested in what you find!

OK, so the reason was relatively standard in the end. I was using ext3 (or
more exactly ext4 without delayed allocation) for the test. The throttling
of background writes gave more priority to writes from the journalling
thread which happen with WRITE_SYNC and thus are not throttled. Thus the
journalling thread ended up having to do more data writeback to be able to
commit a transaction (due to requirements of data=ordered mode) and it is
less efficient at that than the normal flusher thread.

So this is an example where throttling background writeback effectively
just pushes more work into another context which does it less efficiently
and indirectly makes everyone wait for it. ext3 has been always sensitive to
issues like this. ext4 is using delayed allocation and thus only data
writes into holes end up being part of a transaction -> simple dd test case
doesn't hit that path. And indeed when I repeat the same test with ext4,
the numbers with and without your patch are exactly the same.

The question remains how common a pattern where throttling of background
writeback delays also something else is. I'll schedule a couple of
benchmarks to measure impact of your patches for a wider range of workloads
(but sadly pretty limited set of hw). If ext3 is the only one seeing
issues, I would be willing to accept that ext3 takes the hit since it is
doing something rather stupid (but inherent in its journal design) and we
have a way to deal with this either by enabling delayed allocation or by
turning off the writeback throttling...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR