Message-ID: <53303B01.6070302@arm.com>
Date: Mon, 24 Mar 2014 15:02:41 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: "peterz@...radead.org" <peterz@...radead.org>,
"mingo@...nel.org" <mingo@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"fenghua.yu@...el.com" <fenghua.yu@...el.com>,
"schwidefsky@...ibm.com" <schwidefsky@...ibm.com>,
"james.hogan@...tec.com" <james.hogan@...tec.com>,
"cmetcalf@...era.com" <cmetcalf@...era.com>,
"benh@...nel.crashing.org" <benh@...nel.crashing.org>,
"linux@....linux.org.uk" <linux@....linux.org.uk>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"preeti@...ux.vnet.ibm.com" <preeti@...ux.vnet.ibm.com>,
"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>
Subject: Re: [PATCH v3 1/6] sched: rework of sched_domain topology definition
On 21/03/14 11:04, Vincent Guittot wrote:
> On 20 March 2014 18:18, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>> On 20/03/14 17:02, Vincent Guittot wrote:
>>> On 20 March 2014 13:41, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>>> On 19/03/14 16:22, Vincent Guittot wrote:
>>>>> We replace the old way to configure the scheduler topology with a new method
>>>>> which enables a platform to declare additional levels (if needed).
>>>>>
>>>>> We still have a default topology table definition that can be used by platforms
>>>>> that don't want more levels than the SMT, MC, CPU and NUMA ones. This table can
>>>>> be overwritten by an arch which either wants to add new levels where a load
>>>>> balance makes sense, like a BOOK or powergating level, or wants to change the
>>>>> flags configuration of some levels.
>>>>>
>>>>> For each level, we need a function pointer that returns the cpumask for each
>>>>> cpu, a function pointer that returns the flags for the level, and a name. Only
>>>>> flags that describe the topology can be set by an architecture. The current
>>>>> topology flags are:
>>>>> SD_SHARE_CPUPOWER
>>>>> SD_SHARE_PKG_RESOURCES
>>>>> SD_NUMA
>>>>> SD_ASYM_PACKING
>>>>>
>>>>> Then, each level must be a subset of the next one. The build sequence of the
>>>>> sched_domain will take care of removing useless levels, like those with only
>>>>> 1 CPU and those with the same CPU span and no more relevant information for
>>>>> load balancing than their child.
>>>>
>>>> The paragraph above contains important information to set this up
>>>> correctly, that's why it might be worth clarifying:
>>>>
>>>> - "next one" of sd means "child of sd" ?
>>>
>>> It's the next one in the table so the parent in the sched_domain
>>
>> Right, it's this way around. DIE is parent of MC is parent of GMC. Maybe
>> you could be more explicit about the parent of relation here?
>>
>>>
>>>> - "subset" means really "subset" and not "proper subset" ?
>>>
>>> yes, it's really "subset" and not "proper subset"
>>>
>>> Vincent
>>>
>>>>
>>>> On TC2 w/ the following change in cpu_corepower_mask()
>>>>
>>>>  const struct cpumask *cpu_corepower_mask(int cpu)
>>>>  {
>>>> -       return &cpu_topology[cpu].thread_sibling;
>>>> +       return cpu_topology[cpu].socket_id ?
>>>> +              &cpu_topology[cpu].thread_sibling :
>>>> +              &cpu_topology[cpu].core_sibling;
>>>>  }
>>>>
>>>> I get this e.g. for CPU0,2:
>>>>
>>>> CPU0: cpu_corepower_mask=0-1 -> GMC is subset of MC
>>>> CPU0: cpu_coregroup_mask=0-1
>>>> CPU0: cpu_cpu_mask=0-4
>>>>
>>>> CPU2: cpu_corepower_mask=2 -> GMC is proper subset of MC
>>>> CPU2: cpu_coregroup_mask=2-4
>>>> CPU2: cpu_cpu_mask=0-4
>>>>
>>>> I assume here that this is a correct set-up.
>>
>> So this is a correct setup?
>
> yes it's a correct setup before the degenerate sequence
Cool, thanks.
>
>>
>>>>
>>>> The domain degenerate part:
>>>>
>>>> "useless levels like those with 1 CPU" ... that's the case for GMC level
>>>> for CPU2,3,4
>>>>
>>>> The GMC level is destroyed because of the following code snippet in
>>>> sd_degenerate(): if (cpumask_weight(sched_domain_span(sd)) == 1)
>>>>
>>>> so that's fine.
>>>>
>>>> In the case of CPU0,1, since GMC and MC have the same span, the code in
>>>> build_sched_groups() creates only one group for MC. That's why
>>>> pflags is altered in sd_parent_degenerate() to SD_WAKE_AFFINE (0x20), the
>>>> if condition 'if (~cflags & pflags)' is not hit, and
>>>> sd_parent_degenerate() finally returns 1 for MC.
>>>>
>>>> So the "those with the same CPU span and relevant information for load
>>>> balancing than its child" part is not so easy to understand for me. Because
>>>> both levels have the same span, we actually don't take into consideration
>>>> those flags of the parent which require at least 2 groups.
>
> It's only the case if the parent has got 1 group otherwise we take
> care of all flags
Agreed & understood.
>
>>>>
>>>> So the TC2 example covers for me two corner cases: (1) the level I want
>>>> to get rid of only contains 1 CPU (GMC for CPU2,3,4), and (2) the span of
>>>> the parent level I want to get rid of (MC for CPU0,1) is the same as
>>>> the span of the level which should stay.
>
> Having the same span is not enough. There must also be no relevant
> differences in the flags (after removing the flags that need more than 1
> group, if the parent has only 1 group).
But if the span is the same (e.g. GMC, MC in the TC2 example), then
build_sched_groups() will always create only 1 group for the appropriate
parent (e.g. MC), following the degenerate-related code path I described
above. The TC2 example simply doesn't cover the case where the parent is
destroyed because of relevant differences in the flags. Also, the
SD_SHARE_POWERDOMAIN flag added to sd_parent_degenerate() by the patch
'sched: add a new SD_SHARE_POWERDOMAIN for sched_domain' doesn't make a
difference because it's not set on the MC level in the TC2 example. All I
want to say is that this code is not completely tested w/ this TC2
set-up alone.
>
>>>>
>>>> Are these two corner cases the only one supported here? If yes this has
>>>> to be stated somewhere, otherwise if somebody will try this approach on
>>>> a different topology, (s)he might be surprised.
>
> The degenerate sequence is there to remove useless levels, but it will
> not remove useful ones. This rework has not modified the behavior of
> the degenerate sequence, so (s)he should take the same care as
> previously.
Probably nitpicking here, but the patch 'sched: add a new
SD_SHARE_POWERDOMAIN for sched_domain' does modify the behavior of
sd_degenerate() and sd_parent_degenerate() by introducing this flag.
-- Dietmar
>
> Vincent
>
>>
>> Could you please comment on the paragraph above too?
>>
>> Thanks,
>>
>> -- Dietmar
>>
>>>>
>>>> If we only consider SD_SHARE_POWERDOMAIN for the socket-related level,
>>>> this works fine.
>>>>
>>>> I would like to test this on more platforms but I only have my TC2
>>>> available :-)
>>>>
>>>> -- Dietmar
>>>>
>>>> [...]
>>>>
>>>
>>
>>
>