linux-kernel - Re: Re: [PATCH] blkcg: Before starting a new slice, firstly count bps/iops limit in func tg_may

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <201301141024104635523@gmail.com>
Date:	Mon, 14 Jan 2013 10:25:00 +0800
From:	majianpeng <majianpeng@...il.com>
To:	"Vivek Goyal" <vgoyal@...hat.com>, axboe <axboe@...nel.dk>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>, tj <tj@...nel.org>
Subject: Re: Re: [PATCH] blkcg: Before starting a new slice, firstly count bps/iops limit in func tg_may_dispatch.

[maybe you received more, I'm very sorry for that]
>On Fri, Jan 11, 2013 at 03:26:37PM +0100, Jens Axboe wrote:
>> On 2013-01-11 10:11, majianpeng wrote:
>> > In func tg_may_dispatch,
>> >> if (throtl_slice_used(td, tg, rw))
>> >> 		throtl_start_new_slice(td, tg, rw);
>> > ...
>> >> if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
>> >> 	    && tg_with_in_iops_limit(td, tg, bio, &iops_wait)) {
>> > 
>> > In funcs tg_with_in_(bps/iops)_limit, it used the slice_start to count.
>> > So if privious slice expired, it start a new slice.This can cause hung
>> > task.
>> > 
>> > The next steps can repeat this bug.
>> > 1:echo "8:48 10240" > blkio.throttle.write_bps_devic
>> > 2:dd if=/dev/zero of=/dev/sdd bs=1M count=1 oflag=direct
>> > 
>> > Using the blktrace, the messages about throttle:
>> > root@...nel:/mnt/programs# blktrace -d /dev/sdd -a notify -o -|blkparse -i -
>> >   8,48   1        0     0.000000000     0  m   N throtl / [W] new slice start=4294854679 end=4294854779 jiffies=4294854679
>> >   8,48   1        0     0.000000966     0  m   N throtl / [W] extend slice start=4294854679 end=4294905900 jiffies=4294854679
>> >   8,48   1        0     0.000002553     0  m   N throtl / [W] bio. bdisp=0 sz=524288 bps=10240 iodisp=0 iops=4294967295 queued=0/0
>> >   8,48   1        0     0.000004788     0  m   N throtl schedule work. delay=51200 jiffies=4294854679
>> >   8,48   1        0    51.304698681     0  m   N throtl dispatch nr_queued=1 read=0 write=1
>> >   8,48   1        0    51.304701979     0  m   N throtl / [W] new slice start=4294905984 end=4294906084 jiffies=4294905984
>> >   8,48   1        0    51.304703329     0  m   N throtl / [W] extend slice start=4294905984 end=4294957200 jiffies=4294905984
>> >   8,48   1        0    51.304705783     0  m   N throtl schedule work. delay=51200 jiffies=4294905984
>> >   8,48   1        0   102.632697082     0  m   N throtl dispatch nr_queued=1 read=0 write=1
>> >   8,48   1        0   102.632700544     0  m   N throtl / [W] new slice start=4294957312 end=4294957412 jiffies=4294957312
>> >   8,48   1        0   102.632701922     0  m   N throtl / [W] extend slice start=4294957312 end=4295008600 jiffies=4294957312
>> >   8,48   1        0   102.632704016     0  m   N throtl schedule work. delay=51200 jiffies=4294957312
>> >   8,48   1        0   153.960696503     0  m   N throtl dispatch nr_queued=1 read=0 write=1
>> >   8,48   1        0   153.960699797     0  m   N throtl / [W] new slice start=4295008640 end=4295008740 jiffies=4295008640
>> >   8,48   1        0   153.960701153     0  m   N throtl / [W] extend slice start=4295008640 end=4295059900 jiffies=4295008640
>> >   8,48   1        0   153.960703218     0  m   N throtl schedule work. delay=51200 jiffies=4295008640
>> >   8,48   1        0   205.288697067     0  m   N throtl dispatch nr_queued=1 read=0 write=1
>> >   8,48   1        0   205.288700268     0  m   N throtl / [W] new slice start=4295059968 end=4295060068 jiffies=4295059968
>> >   8,48   1        0   205.288701630     0  m   N throtl / [W] extend slice start=4295059968 end=4295111200 jiffies=4295059968
>> >   8,48   1        0   205.288703784     0  m   N throtl schedule work. delay=51200 jiffies=4295059968
>> >   8,48   1        0   256.616696184     0  m   N throtl dispatch nr_queued=1 read=0 write=1
>> >   8,48   1        0   256.616699266     0  m   N throtl / [W] new slice start=4295111296 end=4295111396 jiffies=4295111296
>> >   8,48   1        0   256.616700574     0  m   N throtl / [W] extend slice start=4295111296 end=4295162500 jiffies=4295111296
>> >   8,48   1        0   256.616702701     0  m   N throtl schedule work. delay=51200 jiffies=4295111296
>> > 
>> > Signed-off-by: Jianpeng Ma <majianpeng@...il.com>
>> > ---
>> >  block/blk-throttle.c |   15 ++++++++++-----
>> >  1 file changed, 10 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> > index 3114622..9258789 100644
>> > --- a/block/blk-throttle.c
>> > +++ b/block/blk-throttle.c
>> > @@ -645,6 +645,15 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
>> >  	}
>> >  
>> >  	/*
>> > +	 * If privious slice expired,then start new slice.
>> > +	 * But counting bps and iops limit need privious slice info
>> > +	 * which ->slice_start.
>> > +	 */
>> > +	if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
>> > +	    && tg_with_in_iops_limit(td, tg, bio, &iops_wait))
>> > +		if (wait)
>> > +			*wait = 0;
>> > +	/*
>> >  	 * If previous slice expired, start a new one otherwise renew/extend
>> >  	 * existing slice to make sure it is at least throtl_slice interval
>> >  	 * long since now.
>> > @@ -656,12 +665,8 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg,
>> >  			throtl_extend_slice(td, tg, rw, jiffies + throtl_slice);
>> >  	}
>> >  
>> > -	if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
>> > -	    && tg_with_in_iops_limit(td, tg, bio, &iops_wait)) {
>> > -		if (wait)
>> > -			*wait = 0;
>> > +	if (!(bps_wait || iops_wait))
>> >  		return 1;
>> > -	}
>> >  
>> >  	max_wait = max(bps_wait, iops_wait);
>> 
>> Looks pretty sane to me. Vivek?
>
>Hi Jens,
>
>This fix will introduce other side affects. And that is when an group
>has been idle for few seconds and a new IO gets queued in, we will
>not start a new slice and allow dispatch equivalent of those idle
>seconds before we throttle the group. So keeping group idle will
>become an incentive.
>
>I have attached a patch which might work better. majianpeng, can
>you please give it a try.
>
>Having said that it is strange that workqueue thread is triggering
>so late. Looking at timestamp of traces attached.
>
>We scheduled a delayed work. Following is trace.
>
>8,48   1        0    51.304705783     0  m   N throtl schedule work. delay=51200 jiffies=4294905984
>
>Now a worker should execute after 51.2 seconds (delay=51200).
>
>8,48   1        0   102.632697082     0  m   N throtl dispatch nr_queued=1 read=0 write=1
>
>But worker executed at delay of 51.328 seconds (102.632 - 51.304). That
>is a delay of around 128 jiffies (51.328 - 51.2). That kind of seems odd.
>AFAIK, workers are fairly quick to execute after expiry. And it is 
>because of this excessive delay that we think slice has expired. 
>
>May be it is something new. CCing Tejun, if he has seen anything like this
>and may be it is a known issue.
>
>Even if it is a worker thread issue, I think applying below patch makes
>sense.
>
>Thanks
>Vivek
>
>
>blk-throttle: Do not start a new slice if IO is pending
>
>It might happen that we queue an IO (because it overshot the rate)
>and then wait for certain jiffies to pass. We will set slice_end
>accordingly and wait for that time to elapse and worker thread will
>kick in, again evaluate whether group can dispatch or not. Now it
>might happen that current time is after slice_end and
>tg_may_dispatch() will start a new slice.
>
>This will result in IO not being dispatched and be put back on
>wait again. And this will repeat, resulting in hang.
>
>Do not start a new slice if an IO is pending in that group in
>the direction being queired. Instead extend the slice. New slice
>is supposed to start when a group has not been doing IO for some
>time and a new IO shows up. In that case we want do discard
>history and start a new slice.
>
>Signed-off-by: Vivek Goyal <vgoyal@...hat.com>
>---
> block/blk-throttle.c |    8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
>Index: linux-2.6/block/blk-throttle.c
>===================================================================
>--- linux-2.6.orig/block/blk-throttle.c	2012-10-18 01:52:28.000000000 -0400
>+++ linux-2.6/block/blk-throttle.c	2013-01-14 03:40:41.355731375 -0500
>@@ -648,8 +648,14 @@ static bool tg_may_dispatch(struct throt
> 	 * If previous slice expired, start a new one otherwise renew/extend
> 	 * existing slice to make sure it is at least throtl_slice interval
> 	 * long since now.
>+	 *
>+	 * Start a new slice only if there is no bio queued in that direction.
>+	 * That bio is waiting to be dispatched and slice needs to be
>+	 * extended. It might happen that bio waited to be dispatched but
>+	 * workqueue execution got little late it might restart a new slice
>+	 * instead of taking all the waited time into account.
> 	 */
>-	if (throtl_slice_used(td, tg, rw))
>+	if (throtl_slice_used(td, tg, rw) && !tg->nr_queued[rw])
> 		throtl_start_new_slice(td, tg, rw);
> 	else {
> 		if (time_before(tg->slice_end[rw], jiffies + throtl_slice))
Hi vivek,
	Your patch is ok.But i had a question:
What's condition tg->nr_queued[rw] = 0, but bio is not null?

Thanks!
Jianpeng