Message-ID: <20200305123351.GB32088@vingu-book>
Date: Thu, 5 Mar 2020 13:33:51 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Christian Borntraeger <borntraeger@...ibm.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: 5.6-rc3: WARNING: CPU: 48 PID: 17435 at kernel/sched/fair.c:380
enqueue_task_fair+0x328/0x440
On Thursday 05 March 2020 at 13:12:39 (+0100), Dietmar Eggemann wrote:
> On 05/03/2020 12:28, Christian Borntraeger wrote:
> >
> > On 05.03.20 10:30, Vincent Guittot wrote:
> >> On Wednesday 04 March 2020 at 20:59:33 (+0100), Christian Borntraeger wrote:
> >>>
> >>> On 04.03.20 20:38, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 04.03.20 20:19, Dietmar Eggemann wrote:
>
> [...]
>
> > It seems to speed up the issue when I do a compile job in parallel on the host:
> >
> > Do you also need the sysfs tree?
>
> [ 87.932552] CPU23 path=/machine.slice/machine-test.slice/machine-qemu\x2d18\x2dtest10. on_list=1 nr_running=1 throttled=0 p=[CPU 2/KVM 2662]
> [ 87.932559] CPU23 path=/machine.slice/machine-test.slice/machine-qemu\x2d18\x2dtest10. on_list=0 nr_running=3 throttled=0 p=[CPU 2/KVM 2662]
> [ 87.932562] CPU23 path=/machine.slice/machine-test.slice on_list=1 nr_running=1 throttled=1 p=[CPU 2/KVM 2662]
> [ 87.932564] CPU23 path=/machine.slice on_list=1 nr_running=0 throttled=0 p=[CPU 2/KVM 2662]
> [ 87.932566] CPU23 path=/ on_list=1 nr_running=1 throttled=0 p=[CPU 2/KVM 2662]
> [ 87.951872] CPU23 path=/ on_list=1 nr_running=2 throttled=0 p=[ksoftirqd/23 126]
> [ 87.987528] CPU23 path=/user.slice on_list=1 nr_running=2 throttled=0 p=[as 6737]
> [ 87.987533] CPU23 path=/ on_list=1 nr_running=1 throttled=0 p=[as 6737]
>
> Arrh, looks like 'char path[64]' is too small to hold 'machine.slice/machine-test.slice/machine-qemu\x2d18\x2dtest10.scope/vcpuX' !
> ^
> But I guess that the 'on_list=0' for 'machine-qemu\x2d18\x2dtest10.scope' could be the missing hint?
Yes, the if (cfs_bandwidth_used()) block at the end of enqueue_task_fair() is
not enough to ensure that all cfs_rqs are added back to the leaf list. It "works"
for the first enqueue, because the throttled cfs_rq is added and resets
tmp_alone_branch, but not for the next one.
Compared to the previously proposed fix, we can optimize it a bit with:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9ccde775e02e..3b19e508641d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4035,10 +4035,16 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		__enqueue_entity(cfs_rq, se);
 	se->on_rq = 1;
 
-	if (cfs_rq->nr_running == 1) {
+	/*
+	 * When bandwidth control is enabled, the cfs_rq might have been removed
+	 * because a parent was throttled, even though cfs_rq->nr_running > 1.
+	 * Try to add it unconditionally.
+	 */
+	if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
 		list_add_leaf_cfs_rq(cfs_rq);
+
+	if (cfs_rq->nr_running == 1)
 		check_enqueue_throttle(cfs_rq);
-	}
 }
 
 static void __clear_buddies_last(struct sched_entity *se)