Message-ID: <01591ec1-7c87-514c-047b-bec1c45ddfa5@baylibre.com>
Date: Fri, 26 May 2023 17:45:47 +0200
From: Alexandre Bailon <abailon@...libre.com>
To: Oded Gabbay <ogabbay@...nel.org>,
Kevin Hilman <khilman@...libre.com>,
Jeffrey Hugo <quic_jhugo@...cinc.com>, airlied@...il.com,
maarten.lankhorst@...ux.intel.com, mripard@...nel.org,
tzimmermann@...e.de, devicetree@...r.kernel.org,
conor+dt@...nel.org, bero@...libre.com, jstephan@...libre.com,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
christian.koenig@....com, linaro-mm-sig@...ts.linaro.org,
robh+dt@...nel.org, linux-mediatek@...ts.infradead.org,
nbelin@...libre.com, krzysztof.kozlowski+dt@...aro.org,
matthias.bgg@...il.com, linux-media@...r.kernel.org,
sumit.semwal@...aro.org, linux-arm-kernel@...ts.infradead.org,
angelogioacchino.delregno@...labora.com
Subject: Re: [PATCH 0/7] Add a DRM driver to support AI Processing Unit (APU)
On 5/24/23 12:40, Daniel Vetter wrote:
> On Wed, May 24, 2023 at 01:27:00PM +0300, Oded Gabbay wrote:
>> On Wed, May 24, 2023 at 2:34 AM Kevin Hilman <khilman@...libre.com> wrote:
>>>
>>> Jeffrey Hugo <quic_jhugo@...cinc.com> writes:
>>>
>>>> On 5/17/2023 8:52 AM, Alexandre Bailon wrote:
>>>>> This adds a DRM driver that implements communication between the CPU and an
>>>>> APU. The driver targets embedded devices that usually run inference using
>>>>> some prebuilt models. The goal is to provide common infrastructure that
>>>>> could be re-used to support many accelerators. The kernel, userspace and
>>>>> firmware all try to use standard, existing solutions to leverage the
>>>>> development and maintenance effort. The series implements two platform
>>>>> drivers, one for simulation and another for the mt8183 (compatible with
>>>>> the mt8365).
>>>>
>>>> This looks like the 3 existing Accel drivers. Why is this in DRM?
>>>
>>> Yes, this belongs in accel. I think Alex had some issues around the
>>> infra in accel with device nodes not appearing/opening properly, but
>>> I'll let him comment there. But either way, the right approach should
>>> be to fix any issues in accel and move it there.
>>>
>>> [...]
>>>
>>>>> .../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 ++
>>>>> drivers/gpu/drm/Kconfig | 2 +
>>>>> drivers/gpu/drm/Makefile | 1 +
>>>>> drivers/gpu/drm/apu/Kconfig | 22 +
>>>>> drivers/gpu/drm/apu/Makefile | 10 +
>>>>> drivers/gpu/drm/apu/apu_drv.c | 282 +++++++++
>>>>> drivers/gpu/drm/apu/apu_gem.c | 230 +++++++
>>>>> drivers/gpu/drm/apu/apu_internal.h | 205 ++++++
>>>>> drivers/gpu/drm/apu/apu_sched.c | 592 ++++++++++++++++++
>>>>> drivers/gpu/drm/apu/simu_apu.c | 313 +++++++++
>>>>> include/uapi/drm/apu_drm.h | 81 +++
>>>>
>>>> "apu" seems too generic. We already have 3 "AI processing units" over
>>>> in drivers/accel already...
>>>
>>> Indeed, it is generic, but that's kind of the point for this driver
>>> since it's targeted at generalizing the interface with "AI processing
>>> units" on a growing number of embedded SoCs (ARM, RISC-V, etc.) In
>>> addition, the generic naming is intentional because the goal is bigger
>>> than the kernel and is working towards a generic, shared "libAPU"
>>> userspace[1], but also common firmware for DSP-style inference engines
>>> (e.g. analogous to Sound Open Firmware for audio DSPs.)
>>>
>>> As usual, the various SoC vendors use different names (APU, NPU, NN
>>> unit, etc.) but we'd like a generic name for the class of devices
>>> targeted by this driver. And unfortunately, it looks like the equally
>>> generic "Versatile processing unit" is already taken by Intel's
>>> drivers/accel/ivpu. :)
>>>
>>> Maybe since this is more about generalizing the interface between the
>>> CPU running Linux and the APU, what about the name apu_if? But I guess
>>> that applies to the other 3 drivers in drivers/accel also. Hmmm...
>>>
>>> Naming things is hard[2], so we're definitely open to other ideas. Any
>>> suggestions?
>> Maybe model it according to the tiny driver in drm display ? You can
>> then call it tiny_apu :-)
>> Disclosure: It was Daniel's suggestion, he can chime in with more
>> details on the tiny driver concept.
>
> Yeah so maybe a bit more detail on my thoughts:
>
> First this smells like an attempt to bypass the entire "we want open
> userspace for accel drivers" rule. The rule isn't quite as strict as for
> drm gpu drivers (not sure we ended up documenting exactly what we agreed
> on, but iirc the consensus was that for build-time only dependencies
> we're ok with downstream compilers), but it's still there.
What makes you think that we want to bypass the open source requirements?
Although the neural network firmware and userspace application are not yet
open source, our intention is to develop a fully open source stack.
Currently, we only support the Mediatek APU (an Xtensa VP6) and we have to
use closed source software to execute inferences on the accelerator.
As far as I know, there is no software stack similar to Mesa where we could
add support for a new accelerator (the same is true for firmware).
That is actually what we would like to do, but it will take a lot of
time, so we consider this driver a first (small) step.
>
> And at least from a quick look apu.ko and libapu just look like a generic
> accel interface, and that's not enough.
>
> For the big training engines it's more or less "enough to run pytorch, but
> it can be really slow", not sure what the right standard for these
> inference-only drivers should be.
To be honest, I don't know what would be required for training engines.
We only target accelerators for embedded devices that usually only run
inferences. In my opinion, these are two different use cases and I don't
think we can address them in the same way.
>
> So that's the first reason why I don't like this.
>
> The other is that I think if we do end up with a pile of tiny accel
> drivers, we should probably look into something like simpledrm for the
> tiny display drivers. Probably still IP specific ioctls (at least most) so
> that IP specific job knobs and all that are easy, but then just pass to a
> framework that simplifies a drm gem driver to "write ptes" and "run job"
> callbacks, maybe with an optional "create/destroy vm/ctx" for hw which can
> do that.
>
> So maybe we end up with a drivers/accel/tiny and a bunch more helpers
> around the existing gem ones. The rule we have for drm/tiny is "1 file,
> less than 1kloc", and there's a bunch of them. I do think we can achieve
> the same for tiny accel inference engines (but it's still a bit of a
> road). Maybe tiny accel is more like "less than 5kloc" since you need a
> bit more glue for the driver specific ioctl stuff - maybe that's only
> needed for the submit ioctl, maybe also for buffer map/unmap and
> creation.
This makes sense to me.
>
> Also note that there's an entire pile of in-flight work for adding new
> helpers to the gem world to make this all easier. Once we have gpuva and
> exec helpers there not much glue left to tie it all together with the
> scheduler.
I wrote this series a long time ago and only rebased it recently.
I will take some time to look at the in-flight work and see if it is
something I could start using.
>
> But the real crux is that an accel inference driver really needs to have
> enough userspace to do an actual inference job with some
> android/cros/whatever framework for inference (there's just too many).
We are currently stuck with closed source firmware, userspace
applications and toolchains (which work with Android and Linux).
We are looking for a solution, but implementing something will take some
time.
Alexandre
> -Daniel
>
>> Oded
>>
>>>
>>> Kevin
>>>
>>> [1] https://gitlab.baylibre.com/baylibre/libapu/libapu
>>>
>>> [2]
>>> "There are 2 hard problems in computer science: cache invalidation,
>>> naming things and off-by-1 errors."
>>> -- https://twitter.com/secretGeek/status/7269997868
>>>
>