Message-ID: <Yt5cFBgiTLwGXv17@kroah.com>
Date:   Mon, 25 Jul 2022 11:02:12 +0200
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Jiho Chu <jiho.chu@...sung.com>
Cc:     arnd@...db.de, linux-kernel@...r.kernel.org,
        yelini.jeong@...sung.com, myungjoo.ham@...sung.com
Subject: Re: [PATCH 0/9] Samsung Trinity NPU device driver

On Mon, Jul 25, 2022 at 03:52:59PM +0900, Jiho Chu wrote:
> Hello,
> 
> My name is Jiho Chu, and I have been working on device drivers and
> system daemons for several years at Samsung Electronics.
> 
> The Trinity Neural Processing Unit (NPU) series are hardware
> accelerators for neural network processing in embedded systems, which
> are integrated into application processors or SoCs. The Trinity NPU is
> compatible with the AMBA bus architecture and was first launched in 2018
> with its first version for vision processing, Trinity Version1 (TRIV1).
> Its second version, TRIV2, was released in December 2021. Another
> Trinity NPU, for audio processing, is referred to as TRIA.
> 
> TRIV2 ships in many models of 2022 Samsung TVs, providing acceleration
> for various AI-based applications, including image recognition and
> picture quality improvements for streaming video. It can be accessed
> via GStreamer and its neural network plugins, NNStreamer.
> 
> This patch set includes the Trinity Vision 2 kernel device driver.
> Trinity Vision 2 accelerates image inference for Convolutional Neural
> Networks (CNNs). The CNN workload is executed by the Deep Learning
> Accelerator (DLA), and general neural network layers are executed by
> the Digital Signal Processor (DSP). A Control Processor (CP) controls
> the DLA and DSP. These three IPs (DLA, DSP, CP) compose the Trinity
> Vision 2 NPU, and the device driver mainly supervises the CP to manage
> the entire NPU.
> 
> DLA and DSP operations are controlled with internal command
> instructions. The Trinity instructions are similar to a general
> processor's ISA, but specialized for neural processing operations. The
> virtual ISA (vISA) is designed to operate on multiple data with a
> single operation, like a modern SIMD processor. The device driver loads
> a program to the CP at start up, and the program can decode a binary
> built with the vISA. We call this decoding program the Instruction
> Decoding Unit (IDU) program. While the NPU is running, the CP executes
> the IDU program to fetch and decode vISA instructions, according to the
> scheduling policy of the device driver.
> 
> The DLA, DSP and CP are loosely coupled using ARM's AMBA, so the
> Trinity can easily communicate with most ARM processors. Each IP has
> memory-mapped registers which can be used to control the IP, and the CP
> provides a Wait-For-Event (WFE) operation to subscribe to interrupt
> signals from the DLA and DSP. Also, an embedded Direct Memory Access
> Controller (DMAC) manages data transfers between internal SRAM and
> external main memory, and an IOMMU module supports a unified memory
> space.
> 
> A user can control the Trinity NPU with IOCTLs provided by the driver.
> These controls include memory management operations to transfer model
> data (HWMEM_ALLOC/HWMEM_DEALLOC), NPU workload control operations to
> submit workloads (RUN/STOP), and statistics operations to check the
> current NPU status (STAT).
> 
> The device driver also implements features for developers. It provides
> sysfs control attributes like stop, suspend, sched_test, and profile.
> It also provides status attributes like app status, the number of total
> requests, the number of active requests, and memory usage. For tracing,
> several ftrace events are defined and embedded at several important
> points.

If you have created sysfs files, you need to document them in
Documentation/ABI/ which I do not see in your diffstat.  Perhaps add
that for your next respin?
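
For reference, entries under Documentation/ABI/ use a small set of
tagged fields (What, Date, KernelVersion, Contact, Description), and new
interfaces normally start out in the testing/ subdirectory. A rough
sketch for one of the control attributes named above, as it might appear
in a file such as Documentation/ABI/testing/sysfs-driver-trinity. The
file name, sysfs path, and description wording are placeholders and are
not taken from the patch set:

```text
What:		/sys/.../<trinity device>/stop
Date:		July 2022
KernelVersion:	<version the driver is merged in>
Contact:	<driver maintainer>
Description:
		(placeholder) Describe what writing to "stop" does to the
		NPU, which values are accepted, and what a read returns.
```

Each sysfs attribute the driver creates would get its own What: entry
along these lines.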

Also, please remove the "tracing" logic you have in the code, use
ftrace, don't abuse dev_info() everywhere, that's not needed at all.

thanks,

greg k-h
