linux-kernel - Re: [PATCH] blk-throtl: optimize IOPS throttle for large IO scenarios


Message-ID: <9d8b584a-738b-a0a8-ea8c-e617c2f79408@gmail.com>
Date:   Sat, 17 Jul 2021 07:07:25 +0800
From:   brookxu <brookxu.cn@...il.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     axboe@...nel.dk, cgroups@...r.kernel.org,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] blk-throtl: optimize IOPS throttle for large IO scenarios



Tejun Heo wrote on 2021/7/17 0:09:
> Hello,
> 
> On Fri, Jul 16, 2021 at 02:22:49PM +0800, brookxu wrote:
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index a11b3b5..86ff943 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -348,6 +348,8 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
>>  		trace_block_split(split, (*bio)->bi_iter.bi_sector);
>>  		submit_bio_noacct(*bio);
>>  		*bio = split;
>> +
>> +		blk_throtl_recharge_bio(*bio);
> 
> I don't think we're holding the queue lock here.

Sorry, you are right: some kind of synchronization is needed here. But taking
queue_lock here may be unsafe, since it is difficult for us to control locking
on the split path.

>>  	}
>>  }
>>  
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index b1b22d8..1967438 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -2176,6 +2176,40 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
>>  }
>>  #endif
>>  
>> +void blk_throtl_recharge_bio(struct bio *bio)
>> +{
>> +	bool rw = bio_data_dir(bio);
>> +	struct blkcg_gq *blkg = bio->bi_blkg;
>> +	struct throtl_grp *tg = blkg_to_tg(blkg);
>> +	u32 iops_limit = tg_iops_limit(tg, rw);
>> +
>> +	if (iops_limit == UINT_MAX)
>> +		return;
>> +
>> +	/*
>> +	 * If previous slice expired, start a new one otherwise renew/extend
>> +	 * existing slice to make sure it is at least throtl_slice interval
>> +	 * long since now. New slice is started only for empty throttle group.
>> +	 * If there is queued bio, that means there should be an active
>> +	 * slice and it should be extended instead.
>> +	 */
>> +	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
>> +		throtl_start_new_slice(tg, rw);
>> +	else {
>> +		if (time_before(tg->slice_end[rw],
>> +		    jiffies + tg->td->throtl_slice))
>> +			throtl_extend_slice(tg, rw,
>> +				jiffies + tg->td->throtl_slice);
>> +	}
>> +
>> +	/* Recharge the bio to the group, as some BIOs will be further split
>> +	 * after passing through the throttle, causing the actual IOPS to
>> +	 * be greater than the expected value.
>> +	 */
>> +	tg->last_io_disp[rw]++;
>> +	tg->io_disp[rw]++;
>> +}
> 
> But blk-throtl expects queue lock to be held.
> 
> How about doing something simpler? Just estimate how many bios a given bio
> is gonna be and charge it outright? The calculation will be duplicated
> between the split path but that seems like the path of least resistance
> here.
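
For reference, the estimate suggested above could look roughly like the
following. This is only a userspace sketch under a simplifying assumption:
it splits by size alone, whereas the real split path also considers segment
counts and other queue limits. The helper name estimate_split_count() and
its parameters are hypothetical and are not part of the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimate how many bios a single bio will become
 * after splitting, based only on its size in sectors and the queue's
 * maximum sectors per request. Segment limits are ignored here.
 */
unsigned int estimate_split_count(unsigned int bio_sectors,
				  unsigned int max_sectors)
{
	assert(max_sectors > 0);

	/* A bio without data is still dispatched as one IO. */
	if (bio_sectors == 0)
		return 1;

	/* Round up: a partial trailing chunk is still one more bio. */
	return (bio_sectors + max_sectors - 1) / max_sectors;
}
```

With this kind of estimate, the throttle would charge the whole count when
the bio first passes through it, instead of being recharged on each split.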

I have tried this method. The code duplication is rather high, which may make
the code harder to maintain. In addition, since we add a large value at once,
the fluctuation of IOPS will be relatively large. Since
blk_throtl_recharge_bio() does not need to take part in maintaining the state
machine, we only need to protect some fields of tg. Could we add a new spinlock
to tg, instead of using queue_lock, to solve the synchronization problem? Just
an idea. Thanks.
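
To illustrate the per-tg lock idea, here is a minimal userspace model. It is
a sketch, not kernel code: struct tg_model, disp_lock, tg_model_init() and
tg_model_recharge() are hypothetical names, and a C11 atomic_flag spin loop
stands in for a kernel spinlock. Only the two dispatch counters touched by
the recharge are modelled.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the tg fields the recharge path touches. */
struct tg_model {
	atomic_flag disp_lock;		/* stands in for a per-tg spinlock */
	unsigned int io_disp[2];	/* indexed by rw: 0 = read, 1 = write */
	unsigned int last_io_disp[2];
};

void tg_model_init(struct tg_model *tg)
{
	atomic_flag_clear(&tg->disp_lock);
	for (int rw = 0; rw < 2; rw++) {
		tg->io_disp[rw] = 0;
		tg->last_io_disp[rw] = 0;
	}
}

/* Charge one extra IO to the group while holding only the per-tg lock. */
void tg_model_recharge(struct tg_model *tg, int rw)
{
	assert(rw == 0 || rw == 1);

	/* Spin until the flag is acquired. */
	while (atomic_flag_test_and_set_explicit(&tg->disp_lock,
						 memory_order_acquire))
		;

	tg->last_io_disp[rw]++;
	tg->io_disp[rw]++;

	atomic_flag_clear_explicit(&tg->disp_lock, memory_order_release);
}
```

For this to be safe in the real code, every other path that reads or updates
these counters would presumably have to take the same lock as well.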

> Thanks.
>