Message-ID: <CAFCwf11TPKTF_Ndi60FneWp5g3SoawJvfJoKVWJ-QjxjpawMmg@mail.gmail.com>
Date: Sun, 7 Aug 2022 09:50:35 +0300
From: Oded Gabbay <oded.gabbay@...il.com>
To: Dave Airlie <airlied@...il.com>
Cc: dri-devel <dri-devel@...ts.freedesktop.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Yuji Ishikawa <yuji2.ishikawa@...hiba.co.jp>,
Jiho Chu <jiho.chu@...sung.com>, Arnd Bergmann <arnd@...db.de>,
"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: New subsystem for acceleration devices
On Fri, Aug 5, 2022 at 6:03 AM Dave Airlie <airlied@...il.com> wrote:
>
> On Thu, 4 Aug 2022 at 17:44, Oded Gabbay <oded.gabbay@...il.com> wrote:
> >
> > On Thu, Aug 4, 2022 at 2:54 AM Dave Airlie <airlied@...il.com> wrote:
> > >
> > > On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@...il.com> wrote:
> > > >
> > > > On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@...il.com> wrote:
> > > > >
> > > > > On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@...il.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > > Greg and I talked a couple of months ago about preparing a new accel
> > > > > > subsystem for compute/acceleration devices that are not GPUs, and I
> > > > > > think the drivers you are now trying to upstream would fit it as well.
> > > > >
> > > > > We've had some submissions for not-GPUs to the drm subsystem recently.
> > > > >
> > > > > Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit.
> > > > >
> > > > > why is creating a new subsystem at this time necessary?
> > > > >
> > > > > Are we just creating a subsystem to avoid the open source userspace
> > > > > consumer rules? Or do we have some concrete reasoning behind it?
> > > > >
> > > > > Dave.
> > > >
> > > > Hi Dave.
> > > > The reason this is happening now is that I saw two drivers, which are
> > > > doing h/w acceleration for AI, trying to be accepted into the misc
> > > > subsystem.
> > > > Add to that the fact that I talked with Greg a couple of months ago
> > > > about doing a subsystem for any compute accelerator, which he was
> > > > positive about, and I thought it was a good opportunity to finally
> > > > do it.
> > > >
> > > > I also honestly think that I can contribute a lot to these drivers from
> > > > my experience with the habana driver (which is now deployed at scale at
> > > > AWS) and contribute code from the habana driver to a common framework
> > > > for AI drivers.
> > >
> > > Why not port the habana driver to drm now instead? I don't get why it
> > > wouldn't make sense?
> >
> > imho, no, I don't see the upside. This is not a trivial change, and it
> > will require a large effort. What will it give me that I need and don't
> > have now?
>
> The opportunity for review, code sharing, experience of locking
> hierarchy, mm integration?
>
> IMHO, the biggest thing that drm has is the community of people who
> understand accelerators, memory management, userspace command
> submissions, fencing, dma-buf, etc.
>
> It's hard to have input to those discussions from the outside, and
> they are always ongoing.
>
> I think one of the Intel teams reported dropping a lot of code on
> their drm port due to stuff already being there; I'd expect the same
> for you.
>
> The opposite question is also valid: how does moving to a new
> subsystem help you or others, when there is already one with all the
> infrastructure and, more importantly, the reviewers?
>
> I'd be happy to have accelerator submaintainers, and I'd even be happy
> to create an ACCELERATOR property for non-gpu drivers, so they can opt
> out of some of the GPUier stuff, like multiple device node users etc.,
> or even create a new class of device nodes under /dev/dri.
>
I'm taking everything you wrote seriously; these are all good points.
As I wrote to Jason, I don't want to jump the gun here. I think we
should discuss this and explore the possibilities you suggested,
because I would like to reach consensus if possible.
Maybe this is something we can discuss at LPC or at the kernel summit?
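Just to make the ACCELERATOR property idea a bit more concrete for that
discussion, here is a very rough, untested sketch of what I imagine it
could look like from a driver's point of view. The DRIVER_COMPUTE_ACCEL
bit and the fops name are made up purely for illustration; the point is
only that a compute-only driver would set a flag and skip all the
display-related hooks:

#include <drm/drm_drv.h>
#include <drm/drm_gem.h>

/*
 * Illustration only: DRIVER_COMPUTE_ACCEL is a hypothetical feature bit
 * standing in for the ACCELERATOR property, and accel_example_fops is a
 * placeholder name for the driver's file_operations.
 */
#define DRIVER_COMPUTE_ACCEL	BIT(31)	/* not a real flag in drm_drv.h */

DEFINE_DRM_GEM_FOPS(accel_example_fops);

static const struct drm_driver accel_example_driver = {
	.driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
	.name	= "accel_example",
	.desc	= "example compute-only accelerator",
	.date	= "20220807",
	.major	= 1,
	.minor	= 0,
	.fops	= &accel_example_fops,
	/* no modeset/atomic hooks at all -- compute queues and GEM only */
};

A driver like this could then be given a node under a new class (e.g.
something like /dev/dri/accel*) instead of the usual card/render nodes,
which is basically the opt-out you described.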
Oded
>
> > I totally agree. We need to set some rules and make sure everyone in
> > the kernel community is familiar with them, because now you get
> > different answers based on who you consult with.
> >
> > The rule of thumb I came up with was that if you don't have any
> > display (you don't need to support X/wayland) and you don't need to
> > support opengl/vulkan/opencl/directx or any other gpu-related software
> > stack, then you don't have to go through drm.
> > In other words, if you don't have gpu-specific h/w and/or you don't
> > need gpu uAPI, you don't belong in drm.
>
> What happens when NVIDIA or Intel submits a compute-only driver for
> what is actually a GPU?
> This has been suggested as a workaround for our userspace rules a few times.
>
> If my GPU can do compute tasks, do I have to add an accelerator
> subsystem driver alongside my GPU one?
>
> > After all, I can get memory management services or common char device
> > handling from other subsystems (e.g. rdma) as well. I'm sure I could
> > model my uAPI to be rdma-uAPI compliant (I can define proprietary uAPI
> > there as well), but that doesn't mean I belong there, right?
>
> Good point, but I think accelerators do mostly belong in drm or media,
> because there is enough framework around them to allow them to work,
> without reinventing everything.
>
> > >
> > > I think the one area where I can see a divide, and where a new
> > > subsystem might make sense, is accelerators that are single-user,
> > > one-shot type things like media drivers (though maybe those could
> > > just be media drivers).
> > >
> > > I think anything that does command offloading to firmware or queues
> > > belongs in drm, because that is pretty much what the framework does.
> > I think this is a very broad statement which doesn't reflect reality
> > in the kernel.
>
> I think the habanalabs driver is really one of the only ones outside
> this that is in major use. There might be one or two other minor
> drivers with no real users.
>
> Dave.