linux-kernel - Re: [RFC PATCH 00/16] Core scheduling v6

Message-ID: <20200809164408.GA342447@google.com>
Date:   Sun, 9 Aug 2020 12:44:08 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Li, Aubrey" <aubrey.li@...ux.intel.com>
Cc:     viremana@...ux.microsoft.com,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Subhra Mazumdar <subhra.mazumdar@...cle.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vineeth Pillai <vineethrp@...il.com>,
        Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        "Ning, Hongyu" <hongyu.ning@...ux.intel.com>,
        benbjiang(蒋彪) <benbjiang@...cent.com>
Subject: Re: [RFC PATCH 00/16] Core scheduling v6

Hi Aubrey,

Apologies for the late reply; I was still looking into the details.

On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
[...]
> +/*
> + * Core scheduling policy:
> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> + * - CORE_SCHED_COOKIE_MATCH: tasks with same cookie can run
> + *                     on the same core concurrently.
> + * - CORE_SCHED_COOKIE_TRUST: trusted task can run with kernel
> + *                     thread on the same core concurrently.
> + * - CORE_SCHED_COOKIE_LONELY: tasks with cookie can run only
> + *                     with idle thread on the same core.
> + */
> +enum coresched_policy {
> +       CORE_SCHED_DISABLED,
> +       CORE_SCHED_COOKIE_MATCH,
> +       CORE_SCHED_COOKIE_TRUST,
> +       CORE_SCHED_COOKIE_LONELY,
> +};
> 
> We can set the policy of the uperf cgroup to CORE_SCHED_COOKIE_TRUST and fix
> this kind of performance regression. Not sure if this sounds attractive?

Instead of this, IMHO it can be something simpler:

1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
   core-scheduling patchset, such tasks will share a core and can sniff on
   each other. So let us not pretend that such tasks are not trusted.)

2. All kernel threads and the idle task would have cookie 0 (so that will
   cover the ksoftirqd case reported in your original issue).

3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED), enabled
   by default. Setting this option would tag all tasks that are forked from a
   cookie-0 task with their own cookie. Later on, such tasks can be added to
   a group. (This covers PeterZ's ask about having 'default untrusted'.
   Users like ChromeOS that don't want userspace system processes to be
   tagged can disable this option so such tasks will be cookie-0.)

4. Allow prctl/cgroup interfaces to create groups of tasks and override the
   above behaviors.

5. Document everything clearly so the semantics are clear both to the
   developers of core scheduling and to system administrators.
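
To make points 1-4 concrete, here is a rough userspace sketch of the cookie
rules. It is only a toy model, not code from the patchset: struct toy_task,
toy_fork_cookie(), toy_set_cookie(), toy_can_share_core() and the
DEFAULT_TASKS_UNTRUSTED macro are all hypothetical names, and the rule that
children of already-tagged tasks inherit the parent's cookie is an assumption
that the proposal above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED. */
#ifndef DEFAULT_TASKS_UNTRUSTED
#define DEFAULT_TASKS_UNTRUSTED 1
#endif

/* Toy task model; this is not the kernel's task_struct. */
struct toy_task {
	unsigned long cookie;	/* 0 means "system trusted" (point 1) */
	bool kthread;		/* kernel threads and idle keep cookie 0 (point 2) */
};

/* Hypothetical allocator state for unique, non-zero cookies. */
static unsigned long toy_next_cookie = 1;

/* Pick the cookie a child gets at fork time (points 2 and 3). */
unsigned long toy_fork_cookie(const struct toy_task *parent, bool child_is_kthread)
{
	assert(parent != NULL);

	if (child_is_kthread)
		return 0;
	if (parent->cookie != 0)
		return parent->cookie;	/* assumption: tagged parents pass on their cookie */
	/* Parent is cookie-0: tag the child only if "default untrusted" is on. */
	return DEFAULT_TASKS_UNTRUSTED ? toy_next_cookie++ : 0;
}

/* Stand-in for a prctl/cgroup override that regroups a task (point 4). */
void toy_set_cookie(struct toy_task *task, unsigned long cookie)
{
	assert(task != NULL);
	if (!task->kthread)	/* kernel threads stay at cookie 0 in this model */
		task->cookie = cookie;
}

/* Tasks may run on sibling threads only if their cookies match. */
bool toy_can_share_core(const struct toy_task *a, const struct toy_task *b)
{
	return a->cookie == b->cookie;
}
```

In this model, two tasks forked from a cookie-0 parent get distinct cookies
and cannot share a core until an explicit override puts them in the same
group, while any cookie-0 task can still share a core with a kernel thread.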

Note that, with the concept of "system trusted cookie", we can also do
optimizations like:
1. Disable STIBP when switching into trusted tasks.
2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching into
   trusted tasks.

At least #1 seems to be hurting the effort to enable HT on ChromeOS right now,
and one other engineer has already requested that I do something like #2.
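
As a rough illustration of the two optimizations, the decision could key off
the cookie of the task being switched in. This is a toy sketch under the
assumption that cookie 0 is the "system trusted cookie"; struct
toy_mitigations and toy_mitigations_for() are hypothetical names, and the
real mitigation paths in the kernel are considerably more involved.

```c
#include <stdbool.h>

/* Hypothetical summary of which mitigations to apply on a context switch. */
struct toy_mitigations {
	bool stibp;		/* keep STIBP enabled for the incoming task */
	bool buffer_clear;	/* do the L1D flush / VERW work (L1TF/MDS) */
};

/*
 * Assumption: cookie 0 marks a system-trusted task, so both mitigations
 * can be skipped when switching into it and stay on for everything else.
 */
struct toy_mitigations toy_mitigations_for(unsigned long incoming_cookie)
{
	bool trusted = (incoming_cookie == 0);
	struct toy_mitigations m = {
		.stibp = !trusted,
		.buffer_clear = !trusted,
	};

	return m;
}
```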

Once we get full-syscall isolation working, threads belonging to the same
process can simply share a core with the task-group leader.

> > Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
> 
> This is a good question, from the data we measured by uperf,
> SMT+core-scheduling is 28.2% worse than no-SMT, :(

This is worrying for sure. :-( We ought to debug/profile it more to see what
is causing the overhead. Vineeth and I have added it as a topic for LPC as well.

Any thoughts from others on this?

thanks,

 - Joel

> > thanks,
> > 
> >  - Joel
> > PS: I am planning to write a patch behind a CONFIG option that tags
> > all processes (default untrusted) so everything gets a cookie, which
> > some folks said is what they wanted (a whitelist instead of a
> > blacklist).
> > 
>