Message-ID: <6d25f0e8-9894-386e-7669-9ecbc176bd5b@oracle.com>
Date:   Mon, 24 Aug 2020 16:53:45 -0400
From:   chris hyser <chris.hyser@...cle.com>
To:     Joel Fernandes <joel@...lfernandes.org>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        JulienDesfossez@...gle.com, jdesfossez@...italocean.com,
        Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, linux-kernel@...r.kernel.org,
        fweisbec@...il.com, keescook@...omium.org,
        Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>, vineethrp@...il.com,
        Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        dhaval.giani@...il.com, paulmck@...nel.org, joshdon@...gle.com,
        xii@...gle.com, haoluo@...gle.com, bsegall@...gle.com
Subject: Re: [RFC] Design proposal for upstream core-scheduling interface

On 8/21/20 11:01 PM, Joel Fernandes wrote:
> Hello!
> Core-scheduling aims to make it safe for more than 1 task that trust each
> other to share hyperthreads within a CPU core [1]. This results in a
> performance improvement for workloads that can benefit from using
> hyperthreading safely while limiting core-sharing when it is not safe.
> 
> Currently no universally agreed set of interfaces exists and companies have
> been hacking up their own interfaces to make use of the patches. This post
> aims to list usecases which I got after talking to various people at Google
> and Oracle, after which actual development of code to add interfaces can follow.
> 
> The below text uses the terms cookie and tag interchangeably. Further, cookie
> of 0 is assumed to indicate a trusted process - such as kernel threads or
> system daemons. By default, if nothing is tagged then everything is
> considered trusted since the scheduler assumes all tasks are a match for each
> other.
> 
> Usecase 1: Google's cloud group tags CGroups with a 32-bit integer. This
> int32 is split into 2 parts, the color and the id. The color can only be set
> by privileged processes and the id can be set by anyone. The CGroup structure
> looks like:
> 
>     A         B
>    / \      / \ \
>   C   D    E  F  G
> 
> Here A and B are container CGroups for 2 jobs and are assigned a color by a
> privileged daemon. The job itself has more sub-CGroups within (for ex, B has
> E, F and G). When these sub-CGroups are spawned, they inherit the color from
> the parent. An unprivileged user can then set an id for the sub-CGroup
> without the knowledge of the privileged daemon if it desires to add further
> isolation. This setting of id can be an unprivileged operation because the
> root daemon has already isolated A and B.
> 
> Usecase 2: Chrome browser - tagging renderers. In Chrome, each tab opened
> spawns a renderer. A renderer is a sandboxed process and it is assumed it
> could run arbitrary code (Javascript etc). When a renderer is created, a
> prctl call is made to tag the renderer. Every thread that is spawned by the
> renderer is also tagged. Essentially this turns SMT off for the renderer, but
> still gives a performance boost due to privileged system threads being able
> to share a core. The tagging also forbids the renderer from sharing a core
> with privileged system processes. In the future, we plan to allow threads to
> share a core as well (especially once we get syscall-isolation upstreamed.
> Patches were posted recently for the same [2]).
> 
> Usecase 3: ChromeOS VMs - each vCPU thread that is created by the VMM is
> tagged thus disallowing core sharing between the vCPU thread and any other
> thread on the system. This is because such VMs may run arbitrary user code
> and attack both the guest and the host systems sharing the core.
> 
> Usecase 4: Oracle - Setting a sub-CGroup as trusted (cookie 0). Chris Hyser
> told me on IRC that in a CGroup hierarchy, some CGroups should be allowed
> to not have to share their parent's CGroup tag. In fact, it should be allowed to
> untag the child CGroup if needed thus allowing them to share a core with
> trusted tasks. Others have had similar requirements.
> 
> Proposal for tagging
> --------------------
> We have to support both CGroup and non-CGroup users. CGroup may be overkill
> for some and the CGroup v2 unified hierarchy may be too inflexible.
> Regardless, we must support CGroup due to its ease of use and existing users.
> 
> For Usecase #1
> --------------
> Usecase #1 requires a 2-level tagging mechanism. I propose 2 new files
> to the CPU controller:
> - tag : a boolean (0/1). If set, this CGroup and all sub-CGroups will be
>    tagged.  (In the kernel, the cookie will be derived from the pointer value
>    of a ref-counted cookie object.) If reset, then the CGroup will inherit
>    the parent CGroup's cookie if there is one.
> 
> - color : The ref-counted object will be aligned say to a 256-byte boundary
>    (for example), then the lower 8 bits of the pointer can be used to specify
>    color. Together, the pointer with the color will form a cookie used by the
>    scheduler.
> 
> Note that if 2 CGroups belong to 2 different tagged hierarchies, then setting
> their color to be the same does not imply that the 2 groups will share a
> core. This is key.  Also, to support usecase #4, we could add a third tag
> value -- 2, along with the usual 0 and 1 to suggest that the CGroup can share
> a core with cookie-0 tasks (Chris Hyser feel free to add any more comments
> here).
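
A minimal sketch of the pointer-plus-color cookie described above, written as a toy Python model rather than kernel code. The addresses, the helper names make_cookie() and can_share_core(), and the choice of exactly 8 color bits (taken from the "256-byte boundary" example) are illustrative assumptions, not part of any posted patch.

```python
# Toy model: cookie = (aligned address of the ref-counted object) | color.
# Assumption: 256-byte alignment, so the low 8 bits of the address are free.
COLOR_BITS = 8
ALIGNMENT = 1 << COLOR_BITS      # 256
COLOR_MASK = ALIGNMENT - 1       # 0xff


def make_cookie(obj_addr: int, color: int) -> int:
    """Combine a tagged hierarchy's object address with a color."""
    if obj_addr % ALIGNMENT != 0:
        raise ValueError("cookie object must be 256-byte aligned")
    if not 0 <= color <= COLOR_MASK:
        raise ValueError("color must fit in the low 8 bits")
    return obj_addr | color


def can_share_core(cookie_a: int, cookie_b: int) -> bool:
    """The scheduler only co-schedules tasks whose cookies match exactly."""
    return cookie_a == cookie_b


# Hypothetical addresses of the cookie objects for two tagged hierarchies.
HIERARCHY_A = 0x1000
HIERARCHY_B = 0x2000

same_color_other_tree = can_share_core(make_cookie(HIERARCHY_A, 3),
                                       make_cookie(HIERARCHY_B, 3))
same_color_same_tree = can_share_core(make_cookie(HIERARCHY_A, 3),
                                      make_cookie(HIERARCHY_A, 3))
```

In this model the pointer part differs between hierarchies, so an equal color in two different tagged hierarchies never yields an equal cookie, which is the property the note above calls key.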

Let me think about this. This looks like it would support delegation of a cgroup subtree, which I suppose containers are 
going to want eventually. That seems to be the advantage over just allowing setting the entire cookie. Anyway, I look 
forward to tomorrow and thanks for putting this together.

-chrish



> For Usecase #2
> --------------
> We could add an interface that Peter suggested where 2 PIDs A and B want to
> share a core. So if A wants to share a core with B, then it issues
> prctl(SET_CORE_SHARE, B). ptrace_may_access() can be used to restrict access.
> For renderers though, we likely want to allow a renderer to share a core
> exclusively with threads within that renderer and no one else. To support
> this, renderer A could simply issue prctl(SET_CORE_SHARE, A).
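
A rough sketch of the prctl(SET_CORE_SHARE, ...) semantics described above, as a toy Python model. The Task class, the set_core_share() helper and the may_access parameter (standing in for a ptrace_may_access() check) are hypothetical. The proposal does not say what happens when the target is untagged; this sketch assumes a fresh cookie is then allocated for both tasks.

```python
import itertools

# Cookie 0 means untagged/trusted, so fresh cookies start at 1.
_next_cookie = itertools.count(1)


class Task:
    def __init__(self, pid: int):
        self.pid = pid
        self.cookie = 0   # untagged by default


def set_core_share(caller: Task, target: Task, may_access: bool = True) -> None:
    """Model of caller issuing prctl(SET_CORE_SHARE, target)."""
    if not may_access:
        # Stand-in for a failed ptrace_may_access() style permission check.
        raise PermissionError("caller may not share a core with target")
    if target is caller:
        # Renderer case: a private tag shared only with the caller's own threads.
        caller.cookie = next(_next_cookie)
        return
    if target.cookie == 0:
        # Assumption (not in the proposal): tag an untagged target first.
        target.cookie = next(_next_cookie)
    caller.cookie = target.cookie


def can_share_core(a: Task, b: Task) -> bool:
    return a.cookie == b.cookie


# A wants to share with B; a renderer isolates itself.
task_a, task_b, renderer = Task(1), Task(2), Task(3)
set_core_share(task_a, task_b)
set_core_share(renderer, renderer)
```

After these calls task_a and task_b carry the same cookie, while the renderer carries one that matches neither them nor any untagged (cookie 0) task.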
> 
> For Usecase #3
> --------------
> By default, all threads within a process will share a core. This makes the
> most sense because threads in a process share the same virtual address space.
> However, for virtual machines in ChromeOS, we would like vCPU threads to not
> share a core with other vCPU threads as mentioned above. To support this,
> when a vCPU thread is forked, a new clone flag - CLONE_NEW_CORE_TAG could be
> introduced to cause the forked thread to not share a core with its parent.
> This could also support usecase #2 in the future (instead of prctl, a new
> renderer being forked can simply be passed CLONE_NEW_CORE_TAG which will tag the
> forked process or thread even if the forking process is not tagged).
> 
> Other considerations:
> - To share a core anyway even if tags don't match: If we assume that the only
>    purpose of core-scheduling is to enforce security, then if the kernel knows
>    that CPUs are not vulnerable then cores can be shared anyway, whether the
>    tasks are tagged or not (Suggested-by PeterZ).
> 
> - Addition of a new CGroup controller: Instead of CPU controller, it may be
>    better to add a new CGroup controller in case the CPU controller is not
>    attached to some parts of the hierarchy and it is still desirable to use
>    CGroup interface for core tagging.
> 
> - Co-existence of CGroup with prctl/clone. The prctl/clone tagging should
>    always be made to override CGroup. For this purpose, I propose a new
>    'tasks_no_cg_tag' or a similar file in the CGroup controller. This file
>    will list all tasks that don't associate with the CGroup's tag. NOTE: I am not
>    sure yet how this new file will work with prctl/clone-tagging of individual
>    threads in a non-thread-mode CGroup v2 usage.
> 
> - Differences in tagging of a forked task (!CLONE_THREAD): If a process is
>    a part of a CGroup and is forked, then the child process is automatically
>    added to that CGroup. If such CGroup was tagged before, then the child is
>    automatically tagged. However, it may be desired to give the child its own
>    tag. In this case also, the earlier CLONE_NEW_CORE_TAG flag can be used to
>    achieve this behavior. If the forking process was not a part of a CGroup
>    but got a tag through other means before, then by default a !CLONE_THREAD
>    fork would imply CLONE_NEW_CORE_TAG. However, to turn this off, a
>    CLONE_CORE_TAG flag can be added (forking process's tag will be inherited
>    by the child).
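
The fork/clone tagging rules described above can be sketched as a small decision function. This is a toy Python model under stated assumptions: the flag bit values are placeholders (not real kernel flag values), the Task fields and fork() helper are hypothetical, and a per-task tag set via prctl/clone is taken to override the CGroup tag as proposed.

```python
import enum
import itertools
from dataclasses import dataclass
from typing import Optional


class Clone(enum.Flag):
    # Placeholder bit values for illustration only.
    NONE = 0
    THREAD = 1
    NEW_CORE_TAG = 2   # proposed: give the child its own tag
    CORE_TAG = 4       # proposed: inherit the parent's non-CGroup tag


_fresh = itertools.count(1)   # 0 is the untagged/trusted cookie


@dataclass
class Task:
    cgroup_cookie: int = 0             # tag coming from the task's CGroup
    own_cookie: Optional[int] = None   # tag set through prctl/clone, if any

    @property
    def cookie(self) -> int:
        # prctl/clone tagging overrides the CGroup tag.
        return self.cgroup_cookie if self.own_cookie is None else self.own_cookie


def fork(parent: Task, flags: Clone = Clone.NONE) -> Task:
    # The child is automatically added to the parent's CGroup.
    child = Task(cgroup_cookie=parent.cgroup_cookie)
    if Clone.NEW_CORE_TAG in flags:
        child.own_cookie = next(_fresh)            # always a new tag
    elif parent.own_cookie is not None:
        if Clone.THREAD in flags or Clone.CORE_TAG in flags:
            child.own_cookie = parent.own_cookie   # threads, or explicit inherit
        else:
            child.own_cookie = next(_fresh)        # implied CLONE_NEW_CORE_TAG
    return child


# A process in a tagged CGroup (hypothetical cookie 7) forks a plain child.
cgroup_parent = Task(cgroup_cookie=7)
cgroup_child = fork(cgroup_parent)
```

Here cgroup_child ends up with the CGroup's cookie, while a parent tagged only through prctl would get a freshly tagged child unless Clone.CORE_TAG or Clone.THREAD is passed.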
> 
> Let me know your thoughts and looking forward to a good LPC MC discussion!
> 
> thanks,
> 
>   - Joel
> 
> [1] https://lwn.net/Articles/780703/
> [2] https://lwn.net/Articles/828889/
> 
