Message-ID: <CAFCwf13JA+5vuAKqvBSs3MkcF-gbE_8vd9nSvStQga55vW80VA@mail.gmail.com>
Date: Mon, 25 Jul 2022 12:10:07 +0300
From: Oded Gabbay <ogabbay@...nel.org>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: Jiho Chu <jiho.chu@...sung.com>, Arnd Bergmann <arnd@...db.de>,
"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
yelini.jeong@...sung.com, myungjoo.ham@...sung.com
Subject: Re: [PATCH 0/9] Samsung Trinity NPU device driver
On Mon, Jul 25, 2022 at 12:02 PM Greg KH <gregkh@...uxfoundation.org> wrote:
>
> On Mon, Jul 25, 2022 at 03:52:59PM +0900, Jiho Chu wrote:
> > Hello,
> >
> > My name is Jiho Chu, and I have been working on device drivers and
> > system daemons for several years at Samsung Electronics.
> >
> > The Trinity Neural Processing Unit (NPU) series are hardware
> > accelerators for neural network processing in embedded systems,
> > integrated into application processors or SoCs. The Trinity NPU is
> > compatible with the AMBA bus architecture and first launched in 2018
> > with its first version for vision processing, Trinity Version1 (TRIV1).
> > Its second version, TRIV2, was released in December 2021. Another
> > Trinity NPU, for audio processing, is referred to as TRIA.
> >
> > TRIV2 ships in many 2022 Samsung TV models, providing acceleration
> > for various AI-based applications, including image recognition and
> > picture quality improvements for streaming video. These can be
> > accessed via GStreamer and its neural network plugins, NNStreamer.
> >
> > This patch set includes the Trinity Vision 2 kernel device driver.
> > Trinity Vision 2 accelerates image inference for Convolutional Neural
> > Networks (CNNs). The CNN workload is executed by the Deep Learning
> > Accelerator (DLA), and general neural network layers are executed by
> > the Digital Signal Processor (DSP). A Control Processor (CP) controls
> > the DLA and DSP. These three IPs (DLA, DSP, CP) compose the Trinity
> > Vision 2 NPU, and the device driver mainly supervises the CP to manage
> > the entire NPU.
> >
> > DLA and DSP operations are controlled with internal command
> > instructions. The Trinity instructions are similar to a general
> > processor's ISA, but specialized for neural processing operations.
> > The virtual ISA (vISA) is designed to operate on multiple data with a
> > single operation, like a modern SIMD processor. The device driver
> > loads a program to the CP at start-up, and this program can decode a
> > binary built with the vISA. We call this decoding program the
> > Instruction Decoding Unit (IDU) program. While the NPU is running, the
> > CP executes the IDU program to fetch and decode vISA instructions,
> > according to the scheduling policy of the device driver.
> >
> > The DLA, DSP and CP are loosely coupled using ARM's AMBA, so the
> > Trinity can easily communicate with most ARM processors. Each IP has
> > memory-mapped registers which can be used to control it, and the CP
> > provides a Wait-For-Event (WFE) operation to subscribe to interrupt
> > signals from the DLA and DSP. In addition, an embedded Direct Memory
> > Access Controller (DMAC) manages data transfers between internal SRAM
> > and external main memory, and an IOMMU module supports a unified
> > memory space.
> >
> > A user can control the Trinity NPU with the IOCTLs provided by the
> > driver. These include memory management operations to transfer model
> > data (HWMEM_ALLOC/HWMEM_DEALLOC), NPU workload control operations to
> > submit workloads (RUN/STOP), and statistics operations to check the
> > current NPU status (STAT).
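
For readers following the thread, the ioctl set described above could be
laid out roughly as in the sketch below. This is only an illustration,
not code from the patch set: the magic character, the command numbers,
the struct layouts and the TRINITY_IOC_* names are all assumptions; only
the operation names (HWMEM_ALLOC, HWMEM_DEALLOC, RUN, STOP, STAT) come
from the cover letter.

```c
#include <stdint.h>
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */

/* Hypothetical magic character; the real uapi header may use another. */
#define TRINITY_IOC_MAGIC 'T'

/* Hypothetical argument layouts, invented for this sketch only. */
struct trinity_hwmem_req {
	uint64_t size;      /* bytes requested for model data */
	int32_t  handle;    /* filled in by the driver on success */
	uint32_t reserved;
};

struct trinity_run_req {
	int32_t  model_handle;  /* handle returned by HWMEM_ALLOC */
	uint32_t priority;
};

struct trinity_stat {
	uint32_t total_reqs;
	uint32_t active_reqs;
	uint64_t mem_usage;
};

/* One command per operation named in the cover letter. */
#define TRINITY_IOC_HWMEM_ALLOC   _IOWR(TRINITY_IOC_MAGIC, 1, struct trinity_hwmem_req)
#define TRINITY_IOC_HWMEM_DEALLOC _IOW(TRINITY_IOC_MAGIC, 2, int32_t)
#define TRINITY_IOC_RUN           _IOW(TRINITY_IOC_MAGIC, 3, struct trinity_run_req)
#define TRINITY_IOC_STOP          _IO(TRINITY_IOC_MAGIC, 4)
#define TRINITY_IOC_STAT          _IOR(TRINITY_IOC_MAGIC, 5, struct trinity_stat)

/* Map a command number back to its operation name, e.g. for debug logs. */
const char *trinity_ioc_name(unsigned long cmd)
{
	switch (cmd) {
	case TRINITY_IOC_HWMEM_ALLOC:   return "HWMEM_ALLOC";
	case TRINITY_IOC_HWMEM_DEALLOC: return "HWMEM_DEALLOC";
	case TRINITY_IOC_RUN:           return "RUN";
	case TRINITY_IOC_STOP:          return "STOP";
	case TRINITY_IOC_STAT:          return "STAT";
	default:                        return "unknown";
	}
}
```

With definitions along these lines, user space would open the device
node and call ioctl(fd, TRINITY_IOC_RUN, &req) with a filled-in
struct trinity_run_req. Encoding direction and size in each command
number lets the driver reject mismatched argument structs early.
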
> >
> > The device driver also implements features for developers. It
> > provides sysfs control attributes such as stop, suspend, sched_test,
> > and profile. It also provides status attributes such as app status,
> > the total number of requests, the number of active requests, and
> > memory usage. For tracing, several ftrace events are defined and
> > embedded at several important points.
>
> If you have created sysfs files, you need to document them in
> Documentation/ABI/ which I do not see in your diffstat. Perhaps add
> that for your next respin?
>
> Also, please remove the "tracing" logic you have in the code, use
> ftrace, don't abuse dev_info() everywhere, that's not needed at all.
>
> thanks,
>
> greg k-h
Hi,
Why isn't this submitted to the soc/ subsystem?
Don't you think that would be more appropriate, given that this IP is
integrated into application processors?
Thanks,
Oded