Message-ID: <AANLkTinLe8bMUCTp8Kmv98=s8D8-1qUo78Vff_-Vpgfv@mail.gmail.com>
Date: Mon, 7 Mar 2011 15:00:24 +0800
From: Yong Zhang <yong.zhang0@...il.com>
To: balbir@...ux.vnet.ibm.com
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <pzijlstr@...hat.com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Bharata B Rao <bharata.rao@...ibm.com>
Subject: Re: [BUGFIX][PATCH] Fix sched rt group scheduling when hierachy is enabled
On Fri, Mar 4, 2011 at 8:11 PM, Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
> I based the changes on what I saw during my debugging/test. I
> explained it earlier,
>
> Everyone is dequeued
>
> 1. child runs first, finds parent throttled, so it does not queue
> anything on parent group. child is unthrottled and rt_time now becomes
> 0, parent's rt_nr_running is not incremented.
> 2. Parent timer runs, it is unthrottled, its group->rt_nr_running is 0
> hence enqueue is not called
I have tested with the attached patch (web mail will mangle it) applied
on top of yours, but I failed to trigger that WARNING.
Below are my steps:
1)mount -t cgroup -ocpu cpu /mnt
2)mkdir /mnt/test-1
3)mkdir /mnt/test-1/test-1-1
4)set cpu.rt_runtime_us to 100000 for test-1 and test-1-1
5)run a loop task and attach it to test-1-1
So I came up with a scenario that matches your description,
but it is based on the unpatched kernel (without your patch):
Let's assume a dual-core system with test-1/test-1-1
as the rt groups. A loop task is running on CPU 1, and test-1
and test-1-1 are both throttled.
        CPU-0                                   CPU-1
do_sched_rt_period_timer(test-1-1)
{
    for CPU-1:
        unthrottle test-1-1.rt_rq[1];
        but fail to enqueue it, because
        we always get test-1-1.rt_se[0]
        due to smp_processor_id();
        thus test-1.rt_rq[1].nr_running == 0;
        and it returns with rt_time == 0;
}
do_sched_rt_period_timer(test-1)
        unthrottle test-1.rt_rq[1], but
        fail to enqueue test-1.rt_rq[1]
        because nr_running == 0;
So if we have your patch for issue-1, then when
the hrtimer is running on CPU-1, test-1-1
and test-1 will be queued because of the
additional check in the rt_time == 0 case.
But once we have your patch for issue-2, the above
problem will be fixed by it, right?
Correct me if I'm wrong :)
Thanks,
Yong
--
Only stand for myself
View attachment "0001.patch" of type "text/x-patch" (532 bytes)