Message-ID: <CAFpoUr2PmOzOfE4+zBP5HGzEypj-7BhStjUoCVChPt-yT_s2EA@mail.gmail.com>
Date: Tue, 27 Apr 2021 13:24:00 +0200
From: Odin Ugedal <odin@...dal.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Odin Ugedal <odin@...d.al>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/1] sched/fair: Fix unfairness caused by missing load decay
Hi,
> I wanted to say one v5.12-rcX version to make sure this is still a
> valid problem on latest version
Ahh, I see. No problem. :) Thank you so much for taking the time to
look at this!
> I confirm that I can see a ratio of 4ms vs 204ms running time with the
> patch below.
(I assume you are talking about the bash code for reproducing, not the actual
sched patch.)
> But when I look more deeply in my trace (I have
> instrumented the code), it seems that the 2 stress-ng don't belong to
> the same cgroup but remained in cg-1 and cg-2 which explains such
> running time difference.
(My second reply to your previous mail might also help clarify this.)
Not sure if I have stated it correctly, or if we are talking about the same
thing. It _is_ the intention that the two procs should not be in the same
cgroup. In the same way as people create "containers", each proc runs in a
separate cgroup in the example. The issue is not the balancing between the
procs themselves, but rather between the cgroups/sched_entities inside the
cgroup hierarchy (due to the fact that the vruntime of those sched_entities
ends up being calculated with more load than it is supposed to).
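If you want to see the stale load directly, something like the below should
work on a kernel with CONFIG_SCHED_DEBUG (a rough sketch from my side, not the
exact commands I used; field names can differ a bit between kernel versions):

  # list the per-cpu cfs_rq's below /slice together with their
  # contribution to the task group load
  grep -E "cfs_rq\[[0-9]+\]:/slice|tg_load_avg" /proc/sched_debug

The contribution from a cpu that no longer runs anything for a cgroup should
decay towards zero; when it does not, the weight/vruntime calculations for the
sched_entities above it end up skewed.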
If you have any thoughts about the phrasing of the patch itself that would
make it easier to understand, feel free to suggest them.
Given the last cgroup v1 script, I get this:
- cat /proc/<stress-pid-1>/cgroup | grep cpu
11:cpu,cpuacct:/slice/cg-1/sub
3:cpuset:/slice
- cat /proc/<stress-pid-2>/cgroup | grep cpu
11:cpu,cpuacct:/slice/cg-2/sub
3:cpuset:/slice
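The script itself is in my previous mail, but roughly it does something like
the sketch below. The mount points, the use of stress-ng and the exact shares
values here are assumptions on my side, so treat it as an illustration rather
than a copy of that script:

  # cgroup v1, cpu/cpuacct + cpuset controllers
  cd /sys/fs/cgroup/cpu,cpuacct
  mkdir -p slice/cg-1/sub slice/cg-2/sub

  # equal weights for cg-1 and cg-2; the inner shares differ on purpose,
  # since they should not matter for the split between cg-1 and cg-2
  echo 100   > slice/cg-1/cpu.shares
  echo 100   > slice/cg-2/cpu.shares
  echo 2     > slice/cg-1/sub/cpu.shares    # v1 minimum, i.e. a very low weight
  echo 10000 > slice/cg-2/sub/cpu.shares

  # pin everything below /slice to one cpu
  mkdir -p /sys/fs/cgroup/cpuset/slice
  echo 1 > /sys/fs/cgroup/cpuset/slice/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/slice/cpuset.mems

  # move a subshell into the cgroups first, then exec the workload, so the
  # stress processes start out in the right place
  ( echo $BASHPID > /sys/fs/cgroup/cpuset/slice/cgroup.procs
    echo $BASHPID > /sys/fs/cgroup/cpu,cpuacct/slice/cg-1/sub/cgroup.procs
    exec stress-ng --cpu 1 --timeout 30 ) &
  ( echo $BASHPID > /sys/fs/cgroup/cpuset/slice/cgroup.procs
    echo $BASHPID > /sys/fs/cgroup/cpu,cpuacct/slice/cg-2/sub/cgroup.procs
    exec stress-ng --cpu 1 --timeout 30 ) &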
The cgroup hierarchy will then roughly look like this (using cgroup v2 terms,
because I find them easier to reason about):
slice/
  cg-1/
    cpu.weight: 100
    sub/
      cpu.weight: 1
      cpuset.cpus: 1
      cgroup.procs - stress process 1 here
  cg-2/
    cpu.weight: 100
    sub/
      cpu.weight: 10000
      cpuset.cpus: 1
      cgroup.procs - stress process 2 here
This should result in a 50/50 split of CPU time, due to the fact that cg-1 and
cg-2 both have a weight of 100 and "live" directly inside the /slice cgroup.
The inner weights should not matter, since each "sub" is the only cgroup at its
level.
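As a quick sanity check, comparing the accumulated cpu time of the two
subtrees should show roughly a 1:1 ratio when the scheduler behaves, e.g.
(assuming the v1 layout above; cpuacct.usage includes the descendants):

  cat /sys/fs/cgroup/cpu,cpuacct/slice/cg-{1,2}/cpuacct.usage

With the bug, the ratio ends up heavily skewed instead, in the same direction
as the 4ms vs 204ms running times you mention.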
> So your script doesn't reproduce the bug you
> want to highlight. That being said, I can also see a diff between the
> contrib of the cpu0 in the tg_load. I'm going to look further
There can definitely be some other issues involved, and you certainly have way
more knowledge about the scheduler than I do... :) However, I am fairly
confident that the script does in fact show the issue I am talking about, and
applying the patch does indeed make it impossible to reproduce it on my
systems.
Odin