Message-ID: <CAKfTPtDHcGrgSKxPvWzX5yia=_pct29ybKmUv=OEgCagZTpASA@mail.gmail.com>
Date: Tue, 18 Dec 2012 10:53:31 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Alex Shi <alex.shi@...el.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linaro-dev@...ts.linaro.org, peterz@...radead.org,
mingo@...nel.org, linux@....linux.org.uk, pjt@...gle.com,
santosh.shilimkar@...com, Morten.Rasmussen@....com,
chander.kashyap@...aro.org, cmetcalf@...era.com,
tony.luck@...el.com, preeti@...ux.vnet.ibm.com,
paulmck@...ux.vnet.ibm.com, tglx@...utronix.de,
len.brown@...el.com, arjan@...ux.intel.com,
amit.kucheria@...aro.org, viresh.kumar@...aro.org
Subject: Re: [RFC PATCH v2 3/6] sched: pack small tasks
On 17 December 2012 16:24, Alex Shi <alex.shi@...el.com> wrote:
>>>>>>> The scheme below tries to summarize the idea:
>>>>>>>
>>>>>>> Socket      |  socket 0  |  socket 1  |  socket 2  |  socket 3  |
>>>>>>> LCPU        | 0  | 1-15  | 16 | 17-31 | 32 | 33-47 | 48 | 49-63 |
>>>>>>> buddy conf0 | 0  |  0    | 1  |  16   | 2  |  32   | 3  |  48   |
>>>>>>> buddy conf1 | 0  |  0    | 0  |  16   | 16 |  32   | 32 |  48   |
>>>>>>> buddy conf2 | 0  |  0    | 16 |  16   | 32 |  32   | 48 |  48   |
>>>>>>>
>>>>>>> But I don't know how this can interact with the NUMA load balance, and it
>>>>>>> might be better to use conf3.
>>>>>>
>>>>>> I mean conf2, not conf3.
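
To make the table concrete, below is a rough userspace sketch (an
illustration only, not the patch code; the 4-socket / 16-LCPU-per-socket
layout comes from the table and the buddy_conf*() helper names are
invented) that reproduces what each configuration computes per LCPU:

/*
 * Illustrative userspace sketch only, not the patch code: reconstruct
 * the three buddy configurations of the table for a 4-socket machine
 * with 16 LCPUs per socket.
 */
#include <stdio.h>

#define LCPUS_PER_SOCKET	16

static int first_of_socket(int cpu)
{
	return (cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;
}

/* conf0: the leader of socket s packs towards LCPU s inside socket 0 */
static int buddy_conf0(int cpu)
{
	if (cpu != first_of_socket(cpu) || cpu == 0)
		return first_of_socket(cpu);
	return cpu / LCPUS_PER_SOCKET;		/* 16->1, 32->2, 48->3 */
}

/* conf1: the leader of socket s packs towards the leader of socket s-1 */
static int buddy_conf1(int cpu)
{
	if (cpu != first_of_socket(cpu) || cpu == 0)
		return first_of_socket(cpu);
	return cpu - LCPUS_PER_SOCKET;		/* 16->0, 32->16, 48->32 */
}

/* conf2: everybody packs onto the leader of its own socket */
static int buddy_conf2(int cpu)
{
	return first_of_socket(cpu);
}

int main(void)
{
	static const int samples[] = { 0, 5, 16, 20, 32, 40, 48, 60 };

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		int cpu = samples[i];

		printf("LCPU %2d: conf0=%2d conf1=%2d conf2=%2d\n", cpu,
		       buddy_conf0(cpu), buddy_conf1(cpu), buddy_conf2(cpu));
	}
	return 0;
}

Printing a few sample LCPUs gives back the rows of the table above.
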
>>>>>
>>>>> So it has 4 levels 0/16/32/ for socket 3 and 0 levels for socket 0; it
>>>>> is unbalanced across the different sockets.
>>>>
>>>> That's the target, because we decided to pack the small tasks in
>>>> socket 0 when we parsed the topology at boot.
>>>> We don't have to loop over sched_domain or sched_group anymore to find
>>>> the best LCPU when a small task wakes up.
>>>
>>> Iteration over domains and groups is an advantage for the power-efficiency
>>> requirement, not a shortcoming. If some CPUs are already idle before
>>> forking, letting the waking CPU check their load/utilization and then
>>> decide which one is the best CPU can reduce late migrations, which saves
>>> both performance and power.
>>
>> In fact, we have already done this job once at boot, and we consider
>> that moving small tasks to the buddy CPU is always a benefit, so we don't
>> need to waste time looping over sched_domain and sched_group to compute
>> the current capacity of each LCPU at each wake-up of each small task. We
>> want all small tasks and background activity to wake up on the same
>> buddy CPU and let the default behavior of the scheduler choose the
>> best CPU for heavy tasks or loaded CPUs.
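
To illustrate the idea of that paragraph, here is a small self-contained
sketch (assumptions: a conf2-style buddy table as in the earlier sketch,
plus invented names such as task_stub, task_is_small(),
default_select_cpu() and an arbitrary runtime threshold; this is not the
actual patch): a small waking task is sent straight to its precomputed
buddy, everything else keeps the default selection.

/*
 * Self-contained sketch, not the patch code: small tasks short-circuit
 * CPU selection at wake-up and go to a precomputed buddy, so no
 * sched_domain/sched_group walk is needed on that path.
 */
#include <stdio.h>

#define NR_LCPUS		64
#define LCPUS_PER_SOCKET	16
#define SMALL_RUNTIME_NS	1000000UL	/* made-up "small task" threshold */

struct task_stub {
	unsigned long avg_runtime_ns;
};

static int buddy_cpu[NR_LCPUS];

/* conf2-style mapping, as in the earlier sketch: pack onto the socket leader */
static void init_buddy_table(void)
{
	for (int cpu = 0; cpu < NR_LCPUS; cpu++)
		buddy_cpu[cpu] = (cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;
}

static int task_is_small(const struct task_stub *p)
{
	return p->avg_runtime_ns < SMALL_RUNTIME_NS;
}

/* stand-in for the scheduler's usual placement of non-small tasks */
static int default_select_cpu(const struct task_stub *p, int prev_cpu)
{
	(void)p;
	return prev_cpu;
}

static int select_wakeup_cpu(const struct task_stub *p, int prev_cpu)
{
	if (task_is_small(p))
		return buddy_cpu[prev_cpu];	/* no domain/group iteration */
	return default_select_cpu(p, prev_cpu);
}

int main(void)
{
	struct task_stub small = { .avg_runtime_ns = 200000 };
	struct task_stub heavy = { .avg_runtime_ns = 50000000 };

	init_buddy_table();
	printf("small task waking on 33 goes to %d\n",
	       select_wakeup_cpu(&small, 33));
	printf("heavy task waking on 33 stays on %d\n",
	       select_wakeup_cpu(&heavy, 33));
	return 0;
}
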
>
> IMHO, the design should be very good for your scenario and your machine,
> but when the code moves to the general scheduler, we do want it to handle
> more general scenarios, like when the 'small task' is not as small as the
> tasks in cyclictest, which can hardly run longer than the migration
Cyclictest is the ultimate small-task use case, which points out all the
weaknesses of a scheduler for this kind of task.
Music playback is a more realistic one, and it also shows improvement.
> granularity or one tick, so we really don't need to consider the task
> migration cost. But when the tasks are not so small, migration is
For which kind of machine are you stating that hypothesis?
> heavier than domain/group walking; that is common sense in
> fork/exec/wake balancing.
I would have said the opposite: the current scheduler limits its
computation of statistics during fork/exec/wake balancing, compared to
the periodic load balance, because doing more there would be too heavy.
It's even more true at wake-up when wake affine is possible.
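
As a toy illustration of that cost difference (not kernel code;
cpu_load[] and the flat 16-CPU groups below are made up), a wake-affine
style decision compares just two CPUs, while a periodic-balance style
scan has to walk every group and every CPU in it:

/*
 * Toy comparison, not kernel code: the wake-affine style path looks at
 * two CPUs, while the periodic load balance walks all groups/CPUs.
 */
#include <stdio.h>

#define NR_CPUS		64
#define CPUS_PER_GROUP	16

static unsigned long cpu_load[NR_CPUS];

/* wake-affine flavour: compare the waker's CPU with the task's previous CPU */
static int wake_affine_pick(int prev_cpu, int waker_cpu)
{
	return cpu_load[waker_cpu] < cpu_load[prev_cpu] ? waker_cpu : prev_cpu;
}

/* periodic-balance flavour: scan every group to find the busiest one */
static int find_busiest_group_idx(void)
{
	int busiest = 0;
	unsigned long max_load = 0;

	for (int g = 0; g < NR_CPUS / CPUS_PER_GROUP; g++) {
		unsigned long load = 0;

		for (int cpu = g * CPUS_PER_GROUP;
		     cpu < (g + 1) * CPUS_PER_GROUP; cpu++)
			load += cpu_load[cpu];
		if (load > max_load) {
			max_load = load;
			busiest = g;
		}
	}
	return busiest;
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_load[cpu] = cpu;		/* arbitrary demo loads */

	printf("wake-affine pick between 40 and 3: %d\n",
	       wake_affine_pick(40, 3));
	printf("busiest group after a full scan: %d\n",
	       find_busiest_group_idx());
	return 0;
}
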
>
>>
>>>
>>> On the contrary, moving a task by walking each buddy level is not only
>>> bad for performance but also bad for power. Consider the quite big
>>> latency of waking a CPU from deep idle; we lose too much.
>>
>> My results have shown a different conclusion.
>
> That should be because your tasks are too small to need to consider the
> migration cost.
>> In fact, there is a much better chance that the buddy will not be in
>> deep idle, as all the small tasks and background activity are already
>> waking up on this CPU.
>
> powertop is helpful for tuning your system for more idle time. Another
> reason is that the current kernel just tries to spread tasks over more
> CPUs for performance reasons. My power scheduling patch should help with this.
>>
>>>
>>>>
>>>>>
>>>>> And the ground level has just one buddy for 16 LCPUs (8 cores); that's
>>>>> not a good design. Consider my previous examples: if there are 4 or 8
>>>>> tasks in one socket, you have just 2 choices: spread them over all the
>>>>> cores, or pack them onto one LCPU. Actually, moving them onto just 2 or
>>>>> 4 cores may be a better solution, but the design misses this.
>>>>
>>>> You speak about tasks without any notion of load. This patch only cares
>>>> about small tasks and a light LCPU load, and it falls back to the default
>>>> behavior for other situations. So if there are 4 or 8 small tasks, they
>>>> will migrate to socket 0 after 1 and up to 3 migrations (it depends
>>>> on the conf and the LCPU they come from).
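
As a toy check of the "1 and up to 3 migrations" statement (illustration
only, reusing the conf0 mapping sketched earlier so it compiles on its
own), counting how many successive buddy hops a small task needs before
it ends up on LCPU 0:

/*
 * Toy walk of the conf0 buddy chain: count how many successive buddy
 * hops it takes a small task to end up on LCPU 0.  Illustration only.
 */
#include <stdio.h>

#define LCPUS_PER_SOCKET	16

static int buddy_conf0(int cpu)
{
	int leader = (cpu / LCPUS_PER_SOCKET) * LCPUS_PER_SOCKET;

	if (cpu != leader || cpu == 0)
		return leader;
	return cpu / LCPUS_PER_SOCKET;		/* 16->1, 32->2, 48->3 */
}

int main(void)
{
	static const int starts[] = { 5, 17, 33, 49 };

	for (unsigned int i = 0; i < sizeof(starts) / sizeof(starts[0]); i++) {
		int cpu = starts[i];
		int hops = 0;

		while (buddy_conf0(cpu) != cpu) {
			cpu = buddy_conf0(cpu);
			hops++;
		}
		printf("from LCPU %2d: %d migration(s) to reach LCPU %d\n",
		       starts[i], hops, cpu);
	}
	return 0;
}

A task starting inside socket 0 needs a single migration, while one
starting on the last LCPU of socket 3 walks 49 -> 48 -> 3 -> 0.
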
>>>
>>> According to your patch, what you mean by 'notion of load' is the
>>> utilization of the CPU, not the load weight of the tasks, right?
>>
>> Yes, but not only that. The number of tasks that run simultaneously is
>> another important input.
>>
>>>
>>> Yes, I just talked about task numbers, but it naturally extends to the
>>> task utilization of the CPU. For example, 8 tasks with 25% utilization
>>> can fully fill just 2 CPUs, but that is clearly beyond the capacity of
>>> the buddy, so you need to wake up another CPU socket while the local
>>> socket still has some idle LCPUs...
>>
>> 8 tasks with a running period of 25ms per 100ms that wake up
>> simultaneously should probably run on 8 different LCPUs in order to
>> race to idle.
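
A quick back-of-the-envelope check of the two positions above, with
illustrative numbers only: 8 tasks each running 25ms per 100ms amount to
200% of one CPU, so from a pure capacity point of view they fit on 2
CPUs, but spreading them over 8 CPUs ends each burst after 25ms instead
of keeping 2 CPUs busy for the whole 100ms period.

/* Illustrative arithmetic only: 8 tasks, 25ms of work every 100ms. */
#include <stdio.h>

int main(void)
{
	int tasks = 8;
	double run_ms = 25.0, period_ms = 100.0;
	double total_util = tasks * run_ms / period_ms;	/* in CPUs */

	printf("total utilization: %.1f CPUs\n", total_util);	/* 2.0    */
	printf("busy time per period on 2 CPUs: %.0f ms\n",
	       tasks * run_ms / 2);				/* 100 ms */
	printf("busy time per period on 8 CPUs: %.0f ms\n",
	       tasks * run_ms / 8);				/* 25 ms  */
	return 0;
}
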
>
> Nope, 8 tasks waking up simultaneously is a rare occurrence. And
Multimedia is one example of tasks waking up simultaneously.
> even so, they should run in the same socket for power-saving
> reasons (my power scheduling patch can do this) instead of being spread
> across all sockets.
This may be good for your scenario and your machine :-)
Packing small tasks is the best choice for any scenario and machine.
It's a trickier point for not-so-small tasks because different
machines will want different behaviors.
>>
>>
>> Regards,
>> Vincent
>>
>>>>
>>>> Then, if too many small tasks wake up simultaneously on the same LCPU,
>>>> the default load balance will spread them across the core/cluster/socket.
>>>>
>>>>>
>>>>> Obviously, more and more cores is the trend for any kind of CPU; the
>>>>> buddy system seems to have a hard time keeping up with this.
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Thanks
>>> Alex
>
>
> --
> Thanks
> Alex