linux-kernel - Re: [RFC 0/5] kernel: Introduce CPU Namespace

Message-ID: <20211011141737.GA58758@blackbody.suse.cz>
Date:   Mon, 11 Oct 2021 16:17:37 +0200
From:   Michal Koutný <mkoutny@...e.com>
To:     Christian Brauner <christian.brauner@...ntu.com>
Cc:     "Pratik R. Sampat" <psampat@...ux.ibm.com>, bristot@...hat.com,
        christian@...uner.io, ebiederm@...ssion.com,
        lizefan.x@...edance.com, tj@...nel.org, hannes@...xchg.org,
        mingo@...nel.org, juri.lelli@...hat.com,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        cgroups@...r.kernel.org, containers@...ts.linux.dev,
        containers@...ts.linux-foundation.org, pratik.r.sampat@...il.com
Subject: Re: [RFC 0/5] kernel: Introduce CPU Namespace

On Mon, Oct 11, 2021 at 12:11:24PM +0200, Christian Brauner <christian.brauner@...ntu.com> wrote:
> Fundamentally I think making this a new namespace is not the correct
> approach.

I tend to agree. 

Also, generally, this is not only a problem of cpuset but of some other
controllers as well (the original letter mentions CPU bandwidth limits;
memory limits are another case, and I wonder whether some apps already
adjust their behavior to available IO characteristics).
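As an illustration of an app adapting to a CPU bandwidth limit, here is a minimal sketch that turns a cgroup v2 `cpu.max` line into a whole-CPU count. It assumes the documented v2 format "`$MAX $PERIOD`" (where `$MAX` may be the literal `max`); the function name `cpu_max_to_cpus()` and the round-up policy are hypothetical choices for this example, not an existing library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse a cgroup v2 cpu.max line such as
 * "max 100000" or "150000 100000" and return the bandwidth limit
 * expressed in whole CPUs, rounded up.
 *   returns  0 -> no bandwidth limit ("max")
 *   returns -1 -> the line could not be parsed
 */
long cpu_max_to_cpus(const char *line)
{
    char quota[32];
    long q, period;

    /* First field is the quota (or "max"), second is the period in usecs. */
    if (sscanf(line, "%31s %ld", quota, &period) != 2 || period <= 0)
        return -1;
    if (strcmp(quota, "max") == 0)
        return 0;
    if (sscanf(quota, "%ld", &q) != 1 || q <= 0)
        return -1;

    /* quota/period CPUs' worth of runtime per period, rounded up. */
    return (q + period - 1) / period;
}
```

A caller would feed it the contents of its cgroup's `cpu.max` file; for example, "150000 100000" (1.5 CPUs of runtime per period) yields 2. Rounding up is only one possible policy; an application might prefer rounding down to avoid oversubscribing its quota.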

The problem, as I see it, is the mapping from real dedicated HW to a
cgroup-restricted environment ("container"), whose resources can be
shared. In this instance, the virtualized view would not be able to
represent a situation where a CPU is assigned non-exclusively to
multiple cpusets.

(Although, one peculiarity of the CPU namespace approach here is the
remapping/scrambling of the CPU topology. I'm not sure whether that is
good or bad.)

> I think that either we need to come up with new non-syscall based
> interfaces that allow to query virtualized cpu information and buy into
> the process of teaching userspace about them. This is even independent
> of containers.

For the reason above, I also agree with this. And I think this interface
(mostly) exists already: userspace can query the cgroup files
(cpuset.cpus.effective in this case), and it would even have the liberty
to choose between querying the resources available to its whole
"container" (the root cgroup of its cgroup NS) or to a further
subdivision of it (the immediately enclosing cgroup).
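A minimal sketch of that userspace query, assuming a cgroup v2 unified hierarchy mounted at `/sys/fs/cgroup` and the cpuset controller enabled for the cgroups in question. The helper names (`cpulist_count()`, `read_cgroup_file()`, `own_cgroup_path()`) are hypothetical; the list syntax ("0-3,8,10-11") and the "0::<path>" entry in `/proc/self/cgroup` are the standard kernel formats, and inside a cgroup namespace that path is relative to the namespace root.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count CPUs in a cpuset list such as "0-3,8,10-11\n"; -1 if malformed. */
int cpulist_count(const char *s)
{
    int count = 0;
    const char *p = s;

    while (*p && *p != '\n') {
        char *end;
        long lo, hi;

        lo = strtol(p, &end, 10);          /* start of a range or single CPU */
        if (end == p || lo < 0)
            return -1;
        hi = lo;
        p = end;
        if (*p == '-') {                   /* "lo-hi" range */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            p = end;
        }
        count += (int)(hi - lo + 1);
        if (*p == ',')
            p++;
        else if (*p && *p != '\n')
            return -1;
    }
    return count;
}

/* Read the first line of a small cgroup file into buf; 0 on success. */
int read_cgroup_file(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}

/* Copy this process's cgroup v2 path (from "0::<path>") into buf. */
int own_cgroup_path(char *buf, size_t len)
{
    char line[4096];
    FILE *f = fopen("/proc/self/cgroup", "r");
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "0::", 3) == 0) {
            line[strcspn(line, "\n")] = '\0';
            snprintf(buf, len, "%s", line + 3);
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

For the container-wide view, a caller would read `/sys/fs/cgroup/cpuset.cpus.effective`; for the immediately enclosing cgroup, it would append the path from `own_cgroup_path()` before `/cpuset.cpus.effective`. In both cases the contents go through `cpulist_count()`. Note that neither number says whether those CPUs are shared with other cpusets.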

On Sat, Oct 09, 2021 at 08:42:38PM +0530, "Pratik R. Sampat" <psampat@...ux.ibm.com> wrote:
> Existing solutions to the problem include userspace tools like LXCFS
> which can fake the sysfs information by mounting onto the sysfs online
> file to be in coherence with the limits set through cgroup cpuset.
> However, LXCFS is an external solution and needs to be explicitly setup
> for applications that require it. Another concern is also that tools
> like LXCFS don't handle all the other display mechanism like procfs load
> stats.
>
> Therefore, the need of a clean interface could be advocated for.

I'd like to write something in support of your approach, but I'm afraid
that the problem of the mapping (dedicated vs shared) makes this best
suited to an external/separate entity, as LXCFS already is.

My .02€,
Michal