linux-kernel - Re: [RFC][PATCH 00/16] sched: Core scheduling

Message-ID: <CAJGSLMt_X97Ux=1YiZcXWXvBy4n=ExO=2yAJhfbvxDh+wnWPvQ@mail.gmail.com>
Date:   Tue, 19 Feb 2019 14:07:01 -0800
From:   Greg Kerr <kerrnel@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...nel.org, tglx@...utronix.de, Paul Turner <pjt@...gle.com>,
        tim.c.chen@...ux.intel.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling

Thanks for posting this patchset, Peter. Based on the patch titled "sched: A
quick and dirty cgroup tagging interface," I believe cgroups are used to
define co-scheduling groups in this implementation.

Chrome OS engineers (kerrnel@...gle.com, mpdenton@...gle.com, and
palmer@...gle.com) are considering an interface that is usable by unprivileged
userspace apps. cgroups are a global resource that requires privileged access.
Have you considered an interface that is akin to namespaces? Consider the
following strawperson API proposal (I understand prctl() is generally used for
process-specific actions, so we aren't married to using prctl()):

# API Properties

The kernel introduces coscheduling groups, which specify which processes may
be executed together. An unprivileged process may use prctl() to create a
coscheduling group. The process may then join the coscheduling group, and
place any of its child processes into the coscheduling group. To provide
flexibility for unrelated processes to join pre-existing groups, an IPC
mechanism could send a coscheduling group handle between processes.

# Strawperson API Proposal
To create a new coscheduling group:
    int coscheduling_group = prctl(PR_CREATE_COSCHEDULING_GROUP);

The return value is >= 0 on success and -1 on failure, with the following
possible values for errno:

    ENOTSUP: This kernel doesn’t support the PR_CREATE_COSCHEDULING_GROUP
             operation.
    EMFILE: The process’ kernel-side coscheduling group table is full.

To join a given process to the group:
    pid_t process = /* self or child... */
    int status = prctl(PR_JOIN_COSCHEDULING_GROUP, coscheduling_group, process);
    if (status) {
        err(errno, NULL);
    }

The kernel will check and enforce that the given process ID really is the
caller’s own PID or a PID of one of the caller’s children, and that the given
group ID really exists. The return value is 0 on success and -1 on failure,
with the following possible values for errno:

    EPERM: The caller could not join the given process to the coscheduling
           group because it was not the creator of the given coscheduling group.
    EPERM: The caller could not join the given process to the coscheduling
           group because the given process was not the caller or one of the
           caller’s children.
    EINVAL: The given group ID did not exist in the kernel-side coscheduling
            group table associated with the caller.
    ESRCH: The given process did not exist.
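
To make the intended semantics concrete, here is a rough userspace mock of the
checks described above. It is only a sketch: none of these names exist in any
kernel, the table size is made up, the mock_proc struct stands in for a real
task lookup, and the order in which the errors are checked is my assumption
rather than part of the proposal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical size of the per-process coscheduling group table. */
#define COSCHED_TABLE_SIZE 4

/* Stand-in for a task lookup: a process and its parent. */
struct mock_proc {
    pid_t pid;
    pid_t ppid;
};

/* One slot in the table; the slot index doubles as the group handle. */
struct cosched_group {
    bool in_use;
    pid_t creator;
};

struct cosched_table {
    struct cosched_group slots[COSCHED_TABLE_SIZE];
};

/* Models PR_CREATE_COSCHEDULING_GROUP: returns a handle >= 0, or -1 with
 * errno set to EMFILE when the table is full. */
int mock_create_group(struct cosched_table *t, pid_t caller)
{
    assert(t != NULL);
    for (int i = 0; i < COSCHED_TABLE_SIZE; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = true;
            t->slots[i].creator = caller;
            return i;
        }
    }
    errno = EMFILE;
    return -1;
}

/* Models PR_JOIN_COSCHEDULING_GROUP: returns 0 on success, or -1 with errno
 * set. A NULL target models a PID that does not exist. The check order
 * (EINVAL, ESRCH, EPERM) is an assumption. */
int mock_join_group(const struct cosched_table *t, pid_t caller, int group,
                    const struct mock_proc *target)
{
    assert(t != NULL);
    if (group < 0 || group >= COSCHED_TABLE_SIZE || !t->slots[group].in_use) {
        errno = EINVAL;   /* no such group in the caller's table */
        return -1;
    }
    if (target == NULL) {
        errno = ESRCH;    /* no such process */
        return -1;
    }
    if (t->slots[group].creator != caller) {
        errno = EPERM;    /* caller did not create the group */
        return -1;
    }
    if (target->pid != caller && target->ppid != caller) {
        errno = EPERM;    /* target is neither the caller nor its child */
        return -1;
    }
    return 0;
}
```

A caller with PID 100 could create a group, join itself and a child whose
parent PID is 100, and would get EPERM when trying to join an unrelated
process.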

Regards,

Greg Kerr (kerrnel@...gle.com)

On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
>
> A much 'demanded' feature: core-scheduling :-(
>
> I still hate it with a passion, and that is part of why it took a little
> longer than 'promised'.
>
> While this one doesn't have all the 'features' of the previous (never
> published) version and isn't L1TF 'complete', I tend to like the structure
> better (relatively speaking: I hate it slightly less).
>
> This one is sched class agnostic and therefore, in principle, doesn't horribly
> wreck RT (in fact, RT could 'ab'use this by setting 'task->core_cookie = task'
> to force-idle siblings).
>
> Now, as hinted by that, there are semi sane reasons for actually having this.
> Various hardware features like Intel RDT - Memory Bandwidth Allocation, work
> per core (due to SMT fundamentally sharing caches) and therefore grouping
> related tasks on a core makes it more reliable.
>
> However; whichever way around you turn this cookie; it is expensive and nasty.
>
> It doesn't help that there are truly bonghit crazy proposals for using this out
> there, and I really hope to never see them in code.
>
> These patches are lightly tested and didn't insta explode, but no promises,
> they might just set your pets on fire.
>
> 'enjoy'
>
> @pjt; I know this isn't quite what we talked about, but this is where I ended
> up after I started typing. There's plenty design decisions to question and my
> changelogs don't even get close to beginning to cover them all. Feel free to ask.
>
> ---
>  include/linux/sched.h    |   9 +-
>  kernel/Kconfig.preempt   |   8 +-
>  kernel/sched/core.c      | 762 ++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/sched/deadline.c  |  99 +++---
>  kernel/sched/debug.c     |   4 +-
>  kernel/sched/fair.c      | 129 +++++---
>  kernel/sched/idle.c      |  42 ++-
>  kernel/sched/pelt.h      |   2 +-
>  kernel/sched/rt.c        |  96 +++---
>  kernel/sched/sched.h     | 183 ++++++++----
>  kernel/sched/stop_task.c |  35 ++-
>  kernel/sched/topology.c  |   4 +-
>  kernel/stop_machine.c    |   2 +
>  13 files changed, 1096 insertions(+), 279 deletions(-)
>
>