Message-ID: <a68891ae-468b-da35-61ee-a3136c6e64c1@oracle.com>
Date:   Wed, 19 Sep 2018 14:53:45 -0700
From:   Subhra Mazumdar <subhra.mazumdar@...cle.com>
To:     "Jan H. Schönherr" <jschoenh@...zon.de>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [RFC 00/60] Coscheduling for Linux



On 09/18/2018 04:44 AM, Jan H. Schönherr wrote:
> On 09/18/2018 02:33 AM, Subhra Mazumdar wrote:
>> On 09/07/2018 02:39 PM, Jan H. Schönherr wrote:
>>> A) Quickstart guide for the impatient.
>>> --------------------------------------
>>>
>>> Here is a quickstart guide to set up coscheduling at core-level for
>>> selected tasks on an SMT-capable system:
>>>
>>> 1. Apply the patch series to v4.19-rc2.
>>> 2. Compile with "CONFIG_COSCHEDULING=y".
>>> 3. Boot into the newly built kernel with an additional kernel command line
>>>      argument "cosched_max_level=1" to enable coscheduling up to core-level.
>>> 4. Create one or more cgroups and set their "cpu.scheduled" to "1".
>>> 5. Put tasks into the created cgroups and set their affinity explicitly
>>>      (see the sketch at the end of this section).
>>> 6. Enjoy tasks of the same group and on the same core executing
>>>      simultaneously whenever they run.
>>>
>>> You are not restricted to coscheduling at core-level. Just select higher
>>> numbers in steps 3 and 4. See further below for more information, especially
>>> if you want to try higher numbers on larger systems.
>>>
>>> Setting affinity explicitly for tasks within coscheduled cgroups is
>>> currently necessary, as the load balancing portion is still missing in this
>>> series.
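>>>
>>> As a minimal sketch of steps 4 and 5 (not part of the series; it assumes
>>> the cgroup cpu controller is mounted at /sys/fs/cgroup/cpu and that a
>>> group named "cosched_a" was already created there), the setup could look
>>> like this:
>>>
>>> #define _GNU_SOURCE
>>> #include <sched.h>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>>
>>> int main(void)
>>> {
>>>         cpu_set_t set;
>>>         FILE *f;
>>>
>>>         /* Step 4: switch the group to coscheduling at core level. */
>>>         f = fopen("/sys/fs/cgroup/cpu/cosched_a/cpu.scheduled", "w");
>>>         if (!f)
>>>                 return 1;
>>>         fprintf(f, "1\n");
>>>         fclose(f);
>>>
>>>         /* Move the current task into the group (cgroup v1 "tasks" file). */
>>>         f = fopen("/sys/fs/cgroup/cpu/cosched_a/tasks", "w");
>>>         if (!f)
>>>                 return 1;
>>>         fprintf(f, "%d\n", getpid());
>>>         fclose(f);
>>>
>>>         /* Step 5: pin the task to exactly one CPU (CPU 0 here). */
>>>         CPU_ZERO(&set);
>>>         CPU_SET(0, &set);
>>>         if (sched_setaffinity(0, sizeof(set), &set))
>>>                 return 1;
>>>         return 0;
>>> }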
>>>
>> I don't get the affinity part. If I create two cgroups, give them only
>> cpu shares (no cpuset), and set their cpu.scheduled=1, will this ensure
>> co-scheduling of each group at core level across all cores in the system?
> Short answer: Yes. But ignoring the affinity part will very likely result in
>                a poor experience with this patch set.
>
>
> I was referring to the CPU affinity of a task, which you can set via
> sched_setaffinity() from within a program or via taskset from the command
> line. For each task/thread within a cgroup, you should set the affinity to
> exactly one CPU. Otherwise -- as the load balancing part is still missing --
> you might end up with all tasks running on one CPU or some other unfortunate
> load distribution.
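>
> As an illustration (not part of the series), each worker thread can pin
> itself to its own CPU from within the program, for example:
>
> #define _GNU_SOURCE
> #include <pthread.h>
> #include <sched.h>
>
> /* Pin the calling thread to a single CPU; returns 0 on success. */
> static int pin_self_to_cpu(int cpu)
> {
>         cpu_set_t set;
>
>         CPU_ZERO(&set);
>         CPU_SET(cpu, &set);
>         return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
> }
>
> Alternatively, "taskset -pc <cpu> <pid>" does the same from the command
> line.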
>
> Coscheduling itself does not care about the load, so each group will be
> (co-)scheduled at core level, no matter where the tasks ended up.
>
> Regards
> Jan
>
> PS: Below is an example to illustrate the resulting schedules a bit better,
> and what might happen if you don't bind the to-be-coscheduled tasks to
> individual CPUs.
>
>
>
> For example, consider a dual-core system with SMT (i.e. 4 CPUs in total),
> two task groups A and B, and tasks within them a0, a1, ..  and b0, b1, ..
> respectively.
>
> Let the system topology look like this:
>
>          System          (level 2)
>        /        \
>    Core 0      Core 1    (level 1)
>    /    \      /    \
> CPU0  CPU1  CPU2  CPU3  (level 0)
>
>
> If you set cpu.scheduled=1 for A and B, each core will be coscheduled
> independently whenever there are tasks of A or B on it. Assuming there
> are runnable tasks in A and B and some other tasks on a core, you will
> see a schedule like:
>
>    A -> B -> other tasks -> A -> B -> other tasks -> ...
>
> (or some permutation thereof) happen synchronously across both CPUs
> of a core -- with no guarantees which tasks within A/within B/
> within the other tasks will execute simultaneously -- and with no
> guarantee what will execute on the other two CPUs simultaneously. (The
> distribution of CPU time between A, B, and other tasks follows the usual
> CFS weight proportional distribution, just at core level.) If neither
> CPU of a core has any runnable tasks of a certain group, it won't be part
> of the schedule (e.g., A -> other -> A -> other).
>
> With cpu.scheduled=2, you lift this schedule to system-level and you would
> see it happen across all four CPUs synchronously. With cpu.scheduled=0, you
> get this schedule at CPU-level as we're all used to with no synchronization
> between CPUs. (It gets a tad more interesting, when you start mixing groups
> with cpu.scheduled=1 and =2.)
>
>
> Here are some schedules that you might see with A and B coscheduled at
> core level. Each of them can be enforced (along the horizontal dimension)
> by setting the affinity of tasks; without setting the affinity, it could
> be any of them:
>
> Tasks equally distributed within A and B:
>
> t   CPU0  CPU1  CPU2  CPU3
> 0    a0    a1    b2    b3
> 1    a0    a1   other other
> 2    b0    b1   other other
> 3    b0    b1    a2    a3
> 4   other other  a2    a3
> 5   other other  b2    b3
>
> All tasks within A and B on one CPU:
>
> t   CPU0  CPU1  CPU2  CPU3
> 0    a0    --   other other
> 1    a1    --   other other
> 2    b0    --   other other
> 3    b1    --   other other
> 4   other other other other
> 5    a2    --   other other
> 6    a3    --   other other
> 7    b2    --   other other
> 8    b3    --   other other
>
> Tasks within a group equally distributed across one core:
>
> t   CPU0  CPU1  CPU2  CPU3
> 0    a0    a2    b1    b3
> 1    a0    a3   other other
> 2    a1    a3   other other
> 3    a1    a2    b0    b3
> 4   other other  b0    b2
> 5   other other  b1    b2
>
> You will never see an A-task sharing a core with a B-task at any point in time
> (except for the 2 microseconds or so that the collective context switch takes).
>
Ok, got it. Can we have a more generic interface, like specifying a set of
task IDs to be co-scheduled at a particular level, rather than tying this
to cgroups? KVMs may not always run within cgroups, and there might be
other use cases where we want co-scheduling that doesn't involve cgroups.
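
Purely as a hypothetical illustration of the kind of interface I have in
mind (nothing like this exists today):

        /* Hypothetical: coschedule a set of tasks at the given level
         * (0 = CPU, 1 = core, 2 = system), with no cgroup involvement. */
        int cosched_set_tasks(const pid_t *pids, int npids, int level);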
