Message-ID: <a68891ae-468b-da35-61ee-a3136c6e64c1@oracle.com>
Date: Wed, 19 Sep 2018 14:53:45 -0700
From: Subhra Mazumdar <subhra.mazumdar@...cle.com>
To: "Jan H. Schönherr" <jschoenh@...zon.de>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [RFC 00/60] Coscheduling for Linux
On 09/18/2018 04:44 AM, Jan H. Schönherr wrote:
> On 09/18/2018 02:33 AM, Subhra Mazumdar wrote:
>> On 09/07/2018 02:39 PM, Jan H. Schönherr wrote:
>>> A) Quickstart guide for the impatient.
>>> --------------------------------------
>>>
>>> Here is a quickstart guide to set up coscheduling at core-level for
>>> selected tasks on an SMT-capable system:
>>>
>>> 1. Apply the patch series to v4.19-rc2.
>>> 2. Compile with "CONFIG_COSCHEDULING=y".
>>> 3. Boot into the newly built kernel with an additional kernel command line
>>> argument "cosched_max_level=1" to enable coscheduling up to core-level.
>>> 4. Create one or more cgroups and set their "cpu.scheduled" to "1".
>>> 5. Put tasks into the created cgroups and set their affinity explicitly.
>>> 6. Enjoy tasks of the same group and on the same core executing
>>> simultaneously, whenever they are executed.
>>>
>>> You are not restricted to coscheduling at core-level. Just select higher
>>> numbers in steps 3 and 4. See also further below for more information, esp.
>>> when you want to try higher numbers on larger systems.
>>>
>>> Setting affinity explicitly for tasks within coscheduled cgroups is
>>> currently necessary, as the load balancing portion is still missing in this
>>> series.
>>>
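For anyone else trying the quickstart: here is a minimal sketch of what step 5 can look like from inside the program itself. It assumes a cgroup-v1 "cpu" controller mounted at /sys/fs/cgroup/cpu and an example group "groupA" that was already created with cpu.scheduled set to 1 (step 4); cpu.scheduled is the knob added by this series, everything else is the standard cgroup/glibc interface.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Join an already-created coscheduled cgroup and pin the calling task to a
 * single CPU (required for now, since load balancing is still missing).
 */
static int cosched_setup(const char *tasks_file, int cpu)
{
	cpu_set_t set;
	FILE *f = fopen(tasks_file, "w");

	if (!f)
		return -1;
	fprintf(f, "%d\n", getpid());	/* move this task into the group */
	fclose(f);

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);		/* bind to exactly one CPU */
	return sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
	/* mount point and group name are assumptions; adjust to your setup */
	if (cosched_setup("/sys/fs/cgroup/cpu/groupA/tasks", 0))
		perror("cosched_setup");

	/* ... actual workload ... */
	return 0;
}

The same can of course be done from the shell by echoing the PID into the group's tasks file and running "taskset -pc 0 $PID".
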
>> I don't get the affinity part. If I create two cgroups by giving them only
>> cpu shares (no cpuset) and set their cpu.scheduled=1, will this ensure
>> co-scheduling of each group on core level for all cores in the system?
> Short answer: Yes. But ignoring the affinity part will very likely result in
> a poor experience with this patch set.
>
>
> I was referring to the CPU affinity of a task, which you can set via
> sched_setaffinity() from within a program or via taskset from the command
> line. For each task/thread within a cgroup, you should set the affinity to
> exactly one CPU. Otherwise -- as the load balancing part is still missing --
> you might end up with all tasks running on one CPU or some other unfortunate
> load distribution.
>
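To make the per-thread part concrete: below is a small sketch of pinning each thread of a to-be-coscheduled program to its own CPU via the standard pthread affinity call. The helper name and the "thread i -> CPU i" placement are only illustrative choices, not part of the patch set.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Bind one thread to exactly one CPU. */
static int pin_thread_to_cpu(pthread_t thread, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/*
 * Typical use after creating the workers, spreading them one per CPU:
 *
 *	for (i = 0; i < nthreads; i++)
 *		pin_thread_to_cpu(tids[i], i % ncpus);
 */

Externally, "taskset -pc <cpu> <pid>" does the same for an already running task.
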
> Coscheduling itself does not care about the load, so each group will be
> (co-)scheduled at core level, no matter where the tasks ended up.
>
> Regards
> Jan
>
> PS: Below is an example to illustrate the resulting schedules a bit better,
> and what might happen, if you don't bind the to-be-coscheduled tasks to
> individual CPUs.
>
>
>
> For example, consider a dual-core system with SMT (i.e. 4 CPUs in total),
> two task groups A and B, and tasks within them a0, a1, .. and b0, b1, ..
> respectively.
>
> Let the system topology look like this:
>
>              System            (level 2)
>             /      \
>        Core 0      Core 1      (level 1)
>        /    \      /    \
>     CPU0  CPU1  CPU2  CPU3     (level 0)
>
>
> If you set cpu.scheduled=1 for A and B, each core will be coscheduled
> independently, if there are tasks of A or B on the core. Assuming there
> are runnable tasks in A and B and some other tasks on a core, you will
> see a schedule like:
>
> A -> B -> other tasks -> A -> B -> other tasks -> ...
>
> (or some permutation thereof) happen synchronously across both CPUs
> of a core -- with no guarantees which tasks within A/within B/
> within the other tasks will execute simultaneously -- and with no
> guarantee what will execute on the other two CPUs simultaneously. (The
> distribution of CPU time between A, B, and other tasks follows the usual
> CFS weight proportional distribution, just at core level.) If neither
> CPU of a core has any runnable tasks of a certain group, it won't be part
> of the schedule (e.g., A -> other -> A -> other).
>
> With cpu.scheduled=2, you lift this schedule to system-level and you would
> see it happen across all four CPUs synchronously. With cpu.scheduled=0, you
> get this schedule at CPU-level as we're all used to with no synchronization
> between CPUs. (It gets a tad more interesting, when you start mixing groups
> with cpu.scheduled=1 and =2.)
>
>
> Here are some schedules that you might see with A and B coscheduled at core
> level (each can be enforced, along the horizontal dimension, by setting the
> affinity of tasks; without setting the affinity, it could be any of them):
>
> Tasks equally distributed within A and B:
>
> t   CPU0    CPU1    CPU2    CPU3
> 0    a0      a1      b2      b3
> 1    a0      a1     other   other
> 2    b0      b1     other   other
> 3    b0      b1      a2      a3
> 4   other   other    a2      a3
> 5   other   other    b2      b3
>
> All tasks within A and B on one CPU:
>
> t   CPU0    CPU1    CPU2    CPU3
> 0    a0      --     other   other
> 1    a1      --     other   other
> 2    b0      --     other   other
> 3    b1      --     other   other
> 4   other   other   other   other
> 5    a2      --     other   other
> 6    a3      --     other   other
> 7    b2      --     other   other
> 8    b3      --     other   other
>
> Tasks within a group equally distributed across one core:
>
> t   CPU0    CPU1    CPU2    CPU3
> 0    a0      a2      b1      b3
> 1    a0      a3     other   other
> 2    a1      a3     other   other
> 3    a1      a2      b0      b3
> 4   other   other    b0      b2
> 5   other   other    b1      b2
>
> You will never see an A-task sharing a core with a B-task at any point in time
> (except for the 2 microseconds or so that the collective context switch takes).
>
Ok, got it. Can we have a more generic interface, like specifying a set of
task IDs to be co-scheduled at a particular level, rather than tying this to
cgroups? KVMs may not always run inside cgroups, and there may be other use
cases for co-scheduling that don't involve cgroups.