lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7hh71uixd9.fsf@baylibre.com>
Date:   Mon, 29 Aug 2022 13:54:42 -0700
From:   Kevin Hilman <khilman@...libre.com>
To:     Oded Gabbay <oded.gabbay@...il.com>
Cc:     Dave Airlie <airlied@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Yuji Ishikawa <yuji2.ishikawa@...hiba.co.jp>,
        Jiho Chu <jiho.chu@...sung.com>,
        Alexandre Bailon <abailon@...libre.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Arnd Bergmann <arnd@...db.de>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        "Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: New subsystem for acceleration devices

Hi Oded (and sorry I misspelled your name last time),

Oded Gabbay <oded.gabbay@...il.com> writes:

> On Tue, Aug 23, 2022 at 9:24 PM Kevin Hilman <khilman@...libre.com> wrote:
>>
>> Hi Obed,
>>
>> Oded Gabbay <oded.gabbay@...il.com> writes:
>>
>> [...]
>>
>> > I want to update that I'm currently in discussions with Dave to figure
>> > out what's the best way to move forward. We are writing it down to do
>> > a proper comparison between the two paths (new accel subsystem or
>> > using drm). I guess it will take a week or so.
>>
>> Any update on the discussions with Dave? and/or are there any plans to
>> discuss this further at LPC/ksummit yet?
> Hi Kevin.
>
> We are still discussing the details, as at least the habanalabs driver
> is very complex and there are multiple parts that I need to see if and
> how they can be mapped to drm.
> Some of us will attend LPC so we will probably take advantage of that
> to talk more about this.

OK, looking forward to some more conversations at LPC.

>>
>> We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are
>> using the DRM-based approach.  I'll also be at LPC and happy to discuss
>> in person.
>>
>> For some context on my/our interest: back in Sept 2020 we initially
>> submitted an rpmesg based driver for kernel communication[1].  After
>> review comments, we rewrote that based on DRM[2] and are now using it
>> for some MTK SoCs[3] and supporting our MTK customers with it.
>>
>> Hopefully we will get the kernel interfaces sorted out soon, but next,
>> there's the userspace side of things.  To that end, we're also working
>> on libAPU, a common, open userspace stack.  Alex Bailon recently
>> presented a proposal earlier this year at Embedded Recipes in Paris
>> (video[4], slides[5].)
>>
>> libAPU would include abstractions of the kernel interfaces for DRM
>> (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and
>> proposes an open firmware for the accelerator side using
>> libMetal/OpenAMP + rpmsg for communication with (most likely closed
>> source) vendor firmware.  Think of this like sound open firmware (SOF[6]),
>> but for accelerators.
>
> I think your device and the habana device are very different in
> nature, and it is part of what Dave and I discussed, whether these two
> classes of devices can live together. I guess they can live together
> in the kernel, but in the userspace, not so much imo.

Yeah, for now I think focusing on how to handle both classes of devices
in the kernel is the most important.

> The first class is the edge inference devices (usually as part of some
> SoC). I think your description of the APU on MTK SoC is a classic
> example of such a device.

Correct.

> You usually have some firmware you load, you give it a graph and
> pointers for input and output and then you just execute the graph
> again and again to perform inference and just replace the inputs.
>
> The second class is the data-center, training accelerators, which
> habana's gaudi device is classified as such. These devices usually
> have a number of different compute engines, a fabric for scaling out,
> on-device memory, internal MMUs and RAS monitoring requirements. Those
> devices are usually operated via command queues, either through their
> kernel driver or directly from user-space. They have multiple APIs for
> memory management, RAS, scaling-out and command-submissions.

OK, I see.

>>
>> We've been using this succesfully for Mediatek SoCs (which have a
>> Cadence VP6 APU) and have submitted/published the code, including the
>> OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already
>> mentioned.
> What's the difference between libmetal and other open-source low-level
> runtime drivers, such as oneAPI level-zero ?

TBH, I'd never heard of oneAPI before, so I'm assuming it's mainly
focused in the data center.  libmetal/openAMP are widely used
in the consumer, industrial embedded space, and heavily used by FPGAs in
many market segments.

> Currently we have our own runtime driver which is tightly coupled with
> our h/w. For example, the method the userspace "talks" to the
> data-plane firmware is very proprietary as it is hard-wired into the
> architecture of the entire ASIC and how it performs deep-learning
> training. Therefore, I don't see how this can be shared with other
> vendors. Not because of secrecy but because it is simply not relevant
> to any other ASIC.

OK, makes sense.

Thanks for clarifying your use case in more detail.

Kevin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ