linux-kernel - Re: [PATCH 7/8] wbt: add general throttling mechanism

Message-ID: <5728EA8A.9040405@kernel.dk>
Date:	Tue, 3 May 2016 12:14:34 -0600
From:	Jens Axboe <axboe@...nel.dk>
To:	Jan Kara <jack@...e.cz>, Jens Axboe <axboe@...com>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-block@...r.kernel.org, dchinner@...hat.com,
	sedat.dilek@...il.com
Subject: Re: [PATCH 7/8] wbt: add general throttling mechanism

On 05/03/2016 10:59 AM, Jens Axboe wrote:
> On 05/03/2016 09:48 AM, Jan Kara wrote:
>> On Tue 03-05-16 17:40:32, Jan Kara wrote:
>>> On Tue 03-05-16 11:34:10, Jan Kara wrote:
>>>> Yeah, once I've hunted down that regression with the old disk, I can
>>>> have a look into how writeback throttling plays together with the
>>>> blkio-controller.
>>>
>>> So I've tried the following script (note that you need cgroup v2 for
>>> writeback IO to be throttled):
>>>
>>> ---
>>> mkdir /sys/fs/cgroup/group1
>>> echo 1000 >/sys/fs/cgroup/group1/io.weight
>>> dd if=/dev/zero of=/mnt/file1 bs=1M count=10000&
>>> DD1=$!
>>> echo $DD1 >/sys/fs/cgroup/group1/cgroup.procs
>>>
>>> mkdir /sys/fs/cgroup/group2
>>> echo 100 >/sys/fs/cgroup/group2/io.weight
>>> #echo "259:65536 wbps=5000000" >/sys/fs/cgroup/group2/io.max
>>> echo "259:65536 wbps=max" >/sys/fs/cgroup/group2/io.max
>>> dd if=/dev/zero of=/mnt/file2 bs=1M count=10000&
>>> DD2=$!
>>> echo $DD2 >/sys/fs/cgroup/group2/cgroup.procs
>>>
>>> while true; do
>>>          sleep 1
>>>          kill -USR1 $DD1
>>>          kill -USR1 $DD2
>>>          echo  '======================================================='
>>> done
>>> ---
>>>
>>> and watched the progress of the dd processes in different cgroups. The
>>> 1/10 weight difference has no effect with your writeback patches - the
>>> situation after one minute:
>>>
>>> 3120+1 records in
>>> 3120+1 records out
>>> 3272392704 bytes (3.3 GB) copied, 63.7119 s, 51.4 MB/s
>>> 3217+1 records in
>>> 3217+1 records out
>>> 3374010368 bytes (3.4 GB) copied, 63.5819 s, 53.1 MB/s
>>>
>>> I should add that even without your patches the progress doesn't quite
>>> correspond to the weight ratio:
>>
>> Forgot to fill in corresponding data for unpatched kernel here:
>>
>> 5962+2 records in
>> 5962+2 records out
>> 6252281856 bytes (6.3 GB) copied, 64.1719 s, 97.4 MB/s
>> 1502+0 records in
>> 1502+0 records out
>> 1574961152 bytes (1.6 GB) copied, 64.207 s, 24.5 MB/s
>
> Thanks for testing this, I'll see what we can do about that. It stands
> to reason that we'll throttle a heavier writer more, statistically. But
> I'm assuming this above test was run basically with just the writes
> going, so no real competition? And hence we end up throttling them
> equally much, destroying the weighting in the process. But for both
> cases, we basically don't pay any attention to cgroup weights.
>
>>> but there is still a noticeable difference between cgroups with
>>> different weights.
>>>
>>> OTOH blk-throttle combines well with your patches: Limiting one cgroup
>>> to 5 MB/s results in numbers like:
>>>
>>> 3883+2 records in
>>> 3883+2 records out
>>> 4072091648 bytes (4.1 GB) copied, 36.6713 s, 111 MB/s
>>> 413+0 records in
>>> 413+0 records out
>>> 433061888 bytes (433 MB) copied, 36.8939 s, 11.7 MB/s
>>>
>>> which is fine and comparable with the unpatched kernel. The higher
>>> throughput number is because we do buffered writes and dd reports what
>>> it wrote into the page cache. And it is no wonder blk-throttle combines
>>> fine - it throttles bios, which happens before we reach the writeback
>>> throttling mechanism.
>
> OK, that's good, at least that part works fine. And yes, the throttle
> path is hit before we end up in the make_request_fn, which is where wbt
> drops in.
>
>>> So I believe this demonstrates that your writeback throttling just
>>> doesn't work well with a selective scheduling policy that happens below
>>> it, because it can essentially lead to IO priority inversion issues...
>
> Is this testing still done on the QD=1 ATA disk? Not too surprising that
> this falls apart, since we have very little room to maneuver. I wonder
> if a normal SATA with NCQ would behave better in this regard. I'll have
> to test a bit and think about how we can best handle this case.

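To put numbers on the quoted dd results, here is a rough sketch that
compares the observed throughput ratio with the configured io.weight
ratio. The ratio helper is hypothetical (not part of the thread), and it
assumes the first dd report in each pair belongs to group1 (weight 1000)
and the second to group2 (weight 100), which follows the kill -USR1
order in the script but is not guaranteed.

```shell
#!/bin/sh
# ratio A B: print A/B rounded to two decimals.
# Hypothetical helper for illustration only.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Configured io.weight values from the test script (group1 / group2).
echo "configured weights: $(ratio 1000 100)"
# MB/s figures reported by dd with the writeback patches applied.
echo "patched kernel:     $(ratio 51.4 53.1)"
# MB/s figures reported by dd on the unpatched kernel.
echo "unpatched kernel:   $(ratio 97.4 24.5)"
```

With the figures above this gives about 0.97 for the patched kernel and
about 3.98 for the unpatched one, against a configured ratio of 10.00.
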
I think what we'll do for now is just disable wbt IFF we have a non-root 
cgroup attached to CFQ. Done here:

http://git.kernel.dk/cgit/linux-block/commit/?h=wb-buf-throttle&id=7315756efe76bbdf83076fc9dbc569bbb4da5d32

We don't have a strong need for wbt (supposedly) since CFQ should take 
care of most of it, if you have policies set for proportional sharing.

Longer term it's not a concern either, as we'll move away from that 
model anyway.
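
For illustration only, the condition above can be sketched from
userspace. This is not the kernel change in the linked commit; it just
restates "CFQ is the active scheduler and at least one non-root cgroup
exists" as a shell predicate. The function names are hypothetical, the
scheduler string is assumed to be in the sysfs format of
/sys/block/<dev>/queue/scheduler (active entry in brackets), and the
cgroup count is passed in by the caller rather than discovered.

```shell
#!/bin/sh
# active_sched STRING: print the bracketed (active) scheduler name
# from a string such as "noop deadline [cfq]".
active_sched() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# wbt_wanted SCHED_STRING NR_NONROOT_CGROUPS
# Returns 0 if wbt would stay enabled, 1 if it would be disabled
# (CFQ active and at least one non-root cgroup). Hypothetical helper.
wbt_wanted() {
    if [ "$(active_sched "$1")" = "cfq" ] && [ "$2" -gt 0 ]; then
        return 1
    fi
    return 0
}

if wbt_wanted "noop deadline [cfq]" 1; then
    echo "wbt stays enabled"
else
    echo "wbt disabled"
fi
```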

-- 
Jens Axboe