lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e2d291e6b6ed43d89930eb2a7d459ff8@kuaishou.com>
Date: Mon, 22 Apr 2024 03:33:58 +0000
From: 周泰宇 <zhoutaiyu@...ishou.com>
To: Yu Kuai <yukuai1@...weicloud.com>, "tj@...nel.org" <tj@...nel.org>
CC: "josef@...icpanda.com" <josef@...icpanda.com>, "axboe@...nel.dk"
	<axboe@...nel.dk>, "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
	"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "yukuai (C)"
	<yukuai3@...wei.com>
Subject: Re: Re: [PATCH] blk-throttle: fix repeat limit on bio with
 BIO_BPS_THROTTLED

Hi,

>> Test scrips:
>> cgpath=/sys/fs/cgroup/blkio/test0
>> mkdir -p $cgpath
>> echo "8:0 10485760" > $cgpath/blkio.throttle.write_bps_device
>> echo "8:16 100000" > $cgpath/blkio.throttle.write_iops_device

> What? 8:0 and 8:16?


My bad, I made a typos here. It should be all 8:0 or 8:16. 
What I want to do here was to set an easy to reach value to BPS_LIMIT (10M/s in this example) and an unable to reach value to IOPS_LIMIT (100000 in this example).


Under this setting, the iostat shows that the bps is far less than 10M/s and sometimes is far larger than 10M/s.


Once I cancel the IOPS_LIMIT, i.e., echo 0 to the write_iops_device, the bps stabilizes at around 10M/s.


The root cause of this is that the split bio will be throttled again even though a tg is under the IOPS_LIMIT when the tg's sq->queue is not empty.


Let me explain it with the code.


If we only set BPS_LIMIT and the bio is flagged with BIO_BPS_THROTTLED, the blk_should_throtl() will always return false and the __blk_throtl_bio() will not be called.  So, the bio flagged with BIO_BPS_THROTTLED will not be throttled in any cases.


However, if we set both BPS_LIMIT and IOPS_LIMIT, the blk_should_throtl() will always return true no matter what the value the IOPS_LIMIT is because tg->has_rules_iops[rw] is always true.


After that, the bio will be passed to __blk_throtl_bio().


If the tg's sq->queue is empty, blkthrottle will calculate if the tg is above limit with the bio. 
Since the bio is flagged with BIO_BPS_THROTTLED, blkthrottle will only calculate the IOPS limit for the tg.
If tg is under iops limit with the bio, the bio will not be throttled. 
Otherwise, the bio will be throttled because of the iops limit.


However, If the tg's sq->queue is not empty, the bio will be throttled directly without any iops or bps limitation calculations.


The  related code snippet is :

bool __blk_throtl_bio(struct bio *bio) 
{
.......

while (true) {
......
    /* throtl is FIFO - if bios are already queued, should queue */
    if (sq->nr_queued[rw])
        break;  // ----->  the bio will be throttled in any cases

    /* if above limits, break to queue */
    if (!tg_may_dispatch(tg, bio, NULL)) { // ----> do iops and bps calculations
        break;  // --->  the bio will be throttled if the tg is above bps or iops limits

......
    
    if (!tg) { 
        bio_set_flag(bio, BIO_BPS_THROTTLED);
        goto out_unlock; // ----> pass, no throttle
    }

.......
}
}


So even the bio is flagged with BIO_BPS_THROTTLED and the tg is far more behind IOPS_LIMIT, the bio will be throttled if the sq->queue is not empty.


This problem can be reproduced by running following scripts and comparing the outputs of iostat.
1) 
PNUM=50 # large enough to saturate the sq->queue

cgpath=/sys/fs/cgroup/blkio/test0
mkdir -p $cgpath
echo "8:0 10485760" > $cgpath/blkio.throttle.write_bps_device
echo "8:0 100000" > $cgpath/blkio.throttle.write_iops_device  # large enough to make it unreachable
for ((i=0;i<;$PUM; i++));do
  fio -rw=write -direct=1 -bs=4M -iodepth=8 -size=200M -numjobs=1 \
-time_based=1 -runtime=30  -name=testt_$i -filename=testf_$i > /dev/null &
  echo $! > $cgpath/tasks
done

2)

PNUM=50 # large enough to saturate the sq->queue

cgpath=/sys/fs/cgroup/blkio/test0
mkdir -p $cgpath
echo "8:0 10485760" > $cgpath/blkio.throttle.write_bps_device
echo "8:0 0" > $cgpath/blkio.throttle.write_iops_device  # do not set iops limit
for ((i=0;i<;$PUM; i++));do
  fio -rw=write -direct=1 -bs=4M -iodepth=8 -size=200M -numjobs=1 \
-time_based=1 -runtime=30  -name=testt_$i -filename=testf_$i > /dev/null &
  echo $! > $cgpath/tasks
done

>> Signed-off-by: zhoutaiyu <zhoutaiyu@...ishou.com>
>> ---
>>   block/blk-throttle.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index f4850a6..499c006 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -913,7 +913,8 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
>>        * queued.
>>        */
>>       BUG_ON(tg->service_queue.nr_queued[rw] &&
>> -            bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
>> +            bio != throtl_peek_queued(&tg->service_queue.queued[rw]) &&
>> +            !bio_flagged(bio, BIO_BPS_THROTTLED));
>>
>>       /* If tg->bps = -1, then BW is unlimited */
>>       if ((bps_limit == U64_MAX && iops_limit == UINT_MAX) ||
>> @@ -2201,7 +2202,7 @@ bool __blk_throtl_bio(struct bio *bio)
>>               throtl_downgrade_check(tg);
>>               throtl_upgrade_check(tg);
>>               /* throtl is FIFO - if bios are already queued, should queue */
>> -             if (sq->nr_queued[rw])
>> +             if (sq->nr_queued[rw] && !bio_flagged(bio, BIO_BPS_THROTTLED))

> No, this change is wrong. Split IO will not be throttled by iops limit
anymore.

After this change, the split IO will be throttled by iops limit again if it reaches a tg's iops limit and will not be throttled in any cases if the sq->queue is not empty.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ