Message-ID: <CAOesGMjU0tjJwAqCADaAv6XrCGbjB8G2oT=4LxOgSQBHO7Gptw@mail.gmail.com>
Date:   Wed, 23 Jan 2019 13:52:15 -0800
From:   Olof Johansson <olof@...om.net>
To:     Oded Gabbay <oded.gabbay@...il.com>,
        Dave Airlie <airlied@...hat.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        ogabbay@...ana.ai, Arnd Bergmann <arnd@...db.de>,
        fbarrat@...ux.ibm.com, andrew.donnellan@....ibm.com
Subject: Re: [PATCH 00/15] Habana Labs kernel driver

Hi,

On Tue, Jan 22, 2019 at 4:01 PM Oded Gabbay <oded.gabbay@...il.com> wrote:
>
> Hello,
>
> For those who don't know me, my name is Oded Gabbay (Kernel Maintainer
> for AMD's amdkfd driver, worked at RedHat's Desktop group) and I work at
> Habana Labs since its inception two and a half years ago.
>
> Habana is a leading startup in the emerging AI processor space and we have
> already started production of our first Goya inference processor PCIe card
> and delivered it to customers. The Goya processor silicon has been tested
> since June of 2018 and is production-qualified by now. The Gaudi training
> processor solution is slated to sample in the second quarter of 2019.
>
> This patch-set contains the kernel driver for Habana's AI Processors
> (AIP) that are designed to accelerate Deep Learning inference and training
> workloads. The current version supports only the Goya processor and
> support for Gaudi will be upstreamed after the ASIC will be available to
> customers.
[...]

As others have mentioned, thanks for the amount of background and
information in this patch set, it's great to see.

Some have pointed out style and formatting issues. I'm not going to do
that here, but I do have some higher-level comments:

 - There's a whole bunch of register definition headers. Outside of
GPUs, traditionally we don't include the full sets unless they're
needed in the driver since they tend to be very verbose.
 - I see a good amount of HW setup code that's mostly just writing
hardcoded values to a large number of registers. I don't have any
specific recommendation on how to do it better, but doing as much as
possible of this through on-device firmware tends to be a little
cleaner (or rather, hides it from the kernel. :). I don't know if that
fits your design though.
 - Are there any pointers to the userspace pieces that are used to run
on this card, or any kind of test suites that can be used when someone
has the hardware and is looking to change the driver?

But, I think the largest question I have (for a broader audience) is:

I predict that we will see a handful of these kinds of devices in the
near future -- definitely from ML accelerators but maybe also for
other kinds of processing, where there's a command-based, buffer-based
setup sending workloads to an offload engine and getting results back.
While the first waves will all look different due to design trade-offs
made in isolation, I think it makes sense to group them in one bucket
instead of merging them through drivers/misc, if nothing else to
encourage more cross-collaboration over time. The first step in
figuring out suitable long-term frameworks is to survey a few
non-shared implementations.

So, I'd like to propose a drivers/accel drivers subtree, and I'd be
happy to bootstrap it with a small group (@Dave Airlie: I think your
input from GPU land would be very useful, want to join in?). Individual
drivers maintained by existing maintainers, of course.

I think it might make sense to move the CAPI/OpenCAPI drivers over as
well -- not necessarily to change those drivers, but to group them
with the rest as more show up.


-Olof


