linux-kernel - Re: [PATCHSET v5] Make background writeback great again for the first time

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 3 May 2016 08:40:11 -0400
From:	Chris Mason <clm@...com>
To:	Jan Kara <jack@...e.cz>
CC:	Jens Axboe <axboe@...nel.dk>, <linux-kernel@...r.kernel.org>,
	<linux-fsdevel@...r.kernel.org>, <linux-block@...r.kernel.org>,
	<dchinner@...hat.com>, <sedat.dilek@...il.com>
Subject: Re: [PATCHSET v5] Make background writeback great again for the
 first time

On Tue, May 03, 2016 at 02:17:19PM +0200, Jan Kara wrote:
> On Thu 28-04-16 12:46:41, Jens Axboe wrote:
> > >>-	rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
> > >>-	rwb->wb_normal = (rwb->wb_max + 1) / 2;
> > >>-	rwb->wb_background = (rwb->wb_max + 3) / 4;
> > >>+	if (rwb->queue_depth == 1) {
> > >>+		rwb->wb_max = rwb->wb_normal = 2;
> > >>+		rwb->wb_background = 1;
> > >
> > >This breaks the detection of too big scale_step in scale_up() where we key
> > >of wb_max == 1 value. However even with that fixed no luck :(:
> > 
> > Yeah, I need to look at that. For QD=1, I think the only sensible values for
> > max/normal/bg is 2/2/1 and 1/1/1 if we step down.
> > 
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> > >Runtime: 105.126 107.125 105.641
> > >
> > >So about the same as before. I'll try to debug this later today...
> > 
> > Thanks, I'm very interested in what you find!
> 
> OK, so the reason was relatively standard in the end. I was using ext3 (or
> more exactly ext4 without delayed allocation) for the test. The throttling
> of background writes gave more priority to writes from the journalling
> thread which happen with WRITE_SYNC and thus are not throttled. Thus the
> journalling thread ended up having to do more data writeback to be able to
> commit a transaction (due to requirements of data=ordered mode) and it is
> less efficient at that than the normal flusher thread.
> 
> So this is an example where throttling background writeback effectively
> just pushes more work into another context which does it less efficiently
> and indirectly makes everyone wait for it. ext3 has been always sensitive to
> issues like this. ext4 is using delayed allocation and thus only data
> writes into holes end up being part of a transaction -> simple dd test case
> doesn't hit that path. And indeed when I repeat the same test with ext4,
> the numbers with and without your patch are exactly the same.
> 
> The question remains how common a pattern where throttling of background
> writeback delays also something else is. I'll schedule a couple of
> benchmarks to measure impact of your patches for a wider range of workloads
> (but sadly pretty limited set of hw). If ext3 is the only one seeing
> issues, I would be willing to accept that ext3 takes the hit since it is
> doing something rather stupid (but inherent in its journal design) and we
> have a way to deal with this either by enabling delayed allocation or by
> turning off the writeback throttling...

At least in the case of io that we know is going to be data=ordered, we
can bump the prio of those pages?

-chris