Message-ID: <YJXRHXIykyEBdnTF@slm.duckdns.org>
Date: Fri, 7 May 2021 19:45:33 -0400
From: Tejun Heo <tj@...nel.org>
To: Alex Deucher <alexdeucher@...il.com>
Cc: Daniel Vetter <daniel@...ll.ch>, Kenny Ho <y2kenny@...il.com>,
Song Liu <songliubraving@...com>,
Andrii Nakryiko <andriin@...com>,
DRI Development <dri-devel@...ts.freedesktop.org>,
Daniel Borkmann <daniel@...earbox.net>,
Kenny Ho <Kenny.Ho@....com>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
Brian Welty <brian.welty@...el.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
amd-gfx list <amd-gfx@...ts.freedesktop.org>,
Martin KaFai Lau <kafai@...com>,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Network Development <netdev@...r.kernel.org>,
KP Singh <kpsingh@...omium.org>, Yonghong Song <yhs@...com>,
bpf <bpf@...r.kernel.org>, Dave Airlie <airlied@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Alex Deucher <alexander.deucher@....com>
Subject: Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL
Hello,
On Fri, May 07, 2021 at 06:30:56PM -0400, Alex Deucher wrote:
> Maybe we are speaking past each other. I'm not following. We got
> here because a device specific cgroup didn't make sense. With my
> Linux user hat on, that makes sense. I don't want to write code to a
> bunch of device specific interfaces if I can avoid it. But as for
> temporal vs spatial partitioning of the GPU, the argument seems to be
> a sort of hand-wavy one that both spatial and temporal partitioning
> make sense on CPUs, but only temporal partitioning makes sense on
> GPUs. I'm trying to understand that assertion. There are some GPUs
Spatial partitioning as implemented in cpuset isn't a desirable model. It's
there partly because it has historically been there. It doesn't really
require dynamic hierarchical distribution of anything and is more of a way
to batch-update per-task configuration, which is how it's actually
implemented. It's broken too in that it interferes with per-task affinity
settings. So, not exactly a good example to follow. In addition, this sort
of partitioning requires more hardware knowledge, and GPUs are worse than
CPUs in that the hardware differs more from vendor to vendor.
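To make that interference concrete, here is a minimal sketch (my own
illustration, not anything from the patch set): a task pins itself with
sched_setaffinity(), and a later write to cpuset.cpus of its cgroup simply
clobbers that mask behind its back.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t set;

	/* Pin this task to CPU 0 through the per-task interface. */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");

	/* While we sleep, update cpuset.cpus of this task's cgroup from
	 * another shell, e.g. "echo 2-3 > /sys/fs/cgroup/<grp>/cpuset.cpus".
	 */
	sleep(30);

	/* cpuset batch-updates every member task's mask; ours is gone. */
	if (!sched_getaffinity(0, sizeof(set), &set))
		printf("still allowed on CPU0? %s\n",
		       CPU_ISSET(0, &set) ? "yes" : "no, clobbered");
	return 0;
}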
Features like this are trivial to implement from the userland side by making
per-process settings inheritable and restricting who can update them.
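As a sketch of what I mean (the ioctl name and partition-mask semantics below
are made up purely for illustration; only the inheritance/permission pattern
matters): a privileged launcher applies the per-process setting, locks it
down, and everything the workload forks inherits it with no cgroup
involvement.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical vendor request: "restrict this process to these engines". */
#define HYPOTHETICAL_SET_GPU_PARTITION	0x1234

int main(void)
{
	int fd = open("/dev/dri/renderD128", O_RDWR);	/* example node */
	unsigned int mask = 0x3;			/* first two engines */

	if (fd >= 0)
		ioctl(fd, HYPOTHETICAL_SET_GPU_PARTITION, &mask);

	/* Drop whatever privilege is needed to change the setting (setuid,
	 * device node permissions, capabilities), then exec the workload.
	 * The per-process setting is inherited by everything it forks, and
	 * nothing below the launcher can widen it.
	 */
	execlp("gpu_workload", "gpu_workload", (char *)NULL);
	return 1;
}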
> that can more easily be temporally partitioned and some that can be
> more easily spatially partitioned. It doesn't seem any different than
> CPUs.
Right, it doesn't really matter how the resource is distributed. What
matters is how granular and generic the distribution can be. If GPUs can
implement work-conserving proportional distribution, that's something which
is widely useful and inherently requires dynamic scheduling from the kernel
side. If it's about setting per-vendor affinities, that's way too much
cgroup interface for a feature which can easily be implemented outside
cgroup. Just do it per-process (or per whatever handle GPUs use) and confine
the configuration from the cgroup side however you like.
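For contrast, this is roughly what the generic, work-conserving case looks
like from userspace with an existing weight knob (cpu.weight is real cgroup2
interface; the gpu.weight file is hypothetical and only there for the
analogy):

#include <stdio.h>

/* Write a proportional weight into a cgroup2 interface file. */
static int set_weight(const char *cgrp, const char *file, int weight)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/%s", cgrp, file);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", weight);
	return fclose(f);
}

int main(void)
{
	/* 2:1 split only while both groups compete; idle bandwidth flows to
	 * whoever can use it, with no static carve-out of specific units.
	 */
	set_weight("grpA", "cpu.weight", 200);
	set_weight("grpB", "cpu.weight", 100);
	/* set_weight("grpA", "gpu.weight", 200);  <- hypothetical analog */
	return 0;
}

A knob like that only makes sense if the kernel is actually arbitrating the
resource dynamically; a per-vendor affinity mask doesn't need any of it.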
While the specific theme changes a bit, we've been having basically the same
discussion with the same conclusion over the past however many months.
Hopefully, the point is clear by now.
Thanks.
--
tejun