linux-kernel - RE: [PATCH 0/4] Add Toshiba Visconti DNN image processing accelerator driver

Message-ID: <TYAPR01MB62014A1EEA60CA824850179692DF9@TYAPR01MB6201.jpnprd01.prod.outlook.com>
Date:   Wed, 1 Jun 2022 01:40:22 +0000
From:   <yuji2.ishikawa@...hiba.co.jp>
To:     <hverkuil@...all.nl>, <robh+dt@...nel.org>,
        <nobuhiro1.iwamatsu@...hiba.co.jp>, <sumit.semwal@...aro.org>,
        <christian.koenig@....com>
CC:     <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>, <linux-media@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>, <linaro-mm-sig@...ts.linaro.org>
Subject: RE: [PATCH 0/4] Add Toshiba Visconti DNN image processing accelerator
 driver

Hi Hans,

Thank you for your advice.
I have prepared a description of the DNN accelerator and its usage.

#### Handling memory blocks for Visconti5 accelerators

Unlike the CPU, which has a fine-grained MMU, the Visconti5 image processing accelerators (IPAs) have no fine-grained IOMMU.
Therefore, any memory region passed to the accelerators must be physically contiguous.
We use DMA-BUF backed by CMA (Contiguous Memory Allocator) to allocate memory regions shared between the CPU and the IPAs.
The original v4.19-based implementation used the ION allocator to allocate DMA-BUF instances.
The latest implementation uses DMA-BUF heaps instead.

Two structure types are used to describe memory regions passed to the drivers:
* struct drv_ipa_buffer_info
  * describes a whole DMA-BUF instance
* struct drv_ipa_addr
  * describes a memory region within a DMA-BUF instance

For details, see the usage sample of each IPA driver.
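
The two structures can be pictured with a small C sketch. The field names are taken from the usage sample (fd, coherent, direction, buffer_index, offset), but the field types, the `sketch_` names and the helper are hypothetical; the real definitions are in drivers/soc/visconti/uapi/ipa.h and may differ. Because every DMA-BUF is physically contiguous, a (buffer index, offset) pair maps to a single device address, base + offset, assuming the device base address of each buffer is known (e.g. once the kernel has mapped the DMA-BUF attachment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL transfer direction values (names are illustrative only). */
enum sketch_ipa_dir {
    SKETCH_IPA_DIR_TO_DEVICE,
    SKETCH_IPA_DIR_FROM_DEVICE,
};

/* Describes one whole DMA-BUF instance (cf. struct drv_ipa_buffer_info). */
struct sketch_ipa_buffer_info {
    int32_t fd;                     /* DMA-BUF file descriptor */
    bool coherent;                  /* .coherent flag from the usage sample */
    enum sketch_ipa_dir direction;  /* which way the data flows */
};

/* Describes a region inside one of those buffers (cf. struct drv_ipa_addr). */
struct sketch_ipa_addr {
    uint32_t buffer_index;          /* index into the buffer_info array */
    uint32_t offset;                /* byte offset inside that buffer */
};

/*
 * Turn a (buffer_index, offset) pair into a device address.
 * dev_base[i] is the (assumed known) device address where buffer i starts
 * and buf_len[i] its length. Since each buffer is physically contiguous,
 * the region simply starts at dev_base[i] + offset.
 * Returns 0 on success, -1 if the index or offset is out of range.
 */
static int sketch_resolve_addr(const uint64_t *dev_base, const uint64_t *buf_len,
                               uint32_t num_buffers, struct sketch_ipa_addr addr,
                               uint64_t *out)
{
    assert(dev_base != NULL && buf_len != NULL && out != NULL);

    if (addr.buffer_index >= num_buffers)
        return -1;
    if (addr.offset >= buf_len[addr.buffer_index])
        return -1;

    *out = dev_base[addr.buffer_index] + addr.offset;
    return 0;
}
```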


#### Image Processing Accelerators overview

The Visconti5 SoC has the following image processing accelerators:

* AFFINE:  1 input image, 1 output image;                                               Affine transform, Homography transform, Polynomial lens distortion, LUT transform
* DNN:     N input feature vectors, N output feature vectors;                           Deep neural network operations
* PYRAMID: 3 input images, 3 * N output images;                                         Resize grayscale/color images with N different parameters
* DSPIF:   M input images, N output images;                                             Various operations on images
* HOX:     1 input image (multi-ROI), 1 input dictionary, 1 likelihood/feature vector;  Extended Histogram of Oriented Gradients based pattern matching
* HAMAT:   2 input feature vectors, 1 output coordinate vector;                         Hamming distance matching for stereo vision
* FLMAT:   3 input images, N input feature points, N output matched points;             Optical flow matching
* SMLDB:   1 input image, N input feature points, N output feature vectors;             Accelerated-KAZE feature descriptor accelerator
* STMAT:   2 input images, 1 output disparity image;                                    Stereo disparity

See [0], Fig 7.2.1 for a block diagram (of the prototype chip).


#### DNN accelerator overview

The DNN accelerator is a proprietary CNN/DCNN processing accelerator developed by Toshiba.
The Visconti5 SoC has 2 instances of the DNN accelerator hardware.
Users convert existing Caffe/ONNX models to Visconti-compatible models with an offline tool.
A converted model, called a "Configuration Binary", includes:
  * the instruction sequence for the given network
  * weight/bias information
  * DMA configuration from/to global memory (for input/output features)

The DNN accelerator can process either a single plane or multiple ROIs in a single call.

See [0], Fig 7.2.2 for a block diagram of the DNN accelerator.

CNN: Convolutional Neural Network
DCNN: Deep Convolutional Neural Network


#### Input / Output

Input image or feature: the base type is one of FP16, FP32, INT8, UINT8, or INT16
Output feature vector:  the base type is one of FP16, FP32, INT8, UINT8, or INT16
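
When sizing the input/output DMA-BUFs (the *_ALLOC_SIZE values in the usage sample), the element size of each base type matters. A minimal sketch, assuming a dense width x height x channels layout with no row padding; the enum and function names are hypothetical and do not come from the driver headers:

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL names for the base types listed above. */
enum sketch_dnn_type {
    SKETCH_DNN_FP16,
    SKETCH_DNN_FP32,
    SKETCH_DNN_INT8,
    SKETCH_DNN_UINT8,
    SKETCH_DNN_INT16,
};

/* Bytes per element of a base type; 0 for an unknown value. */
static size_t sketch_elem_size(enum sketch_dnn_type t)
{
    switch (t) {
    case SKETCH_DNN_INT8:
    case SKETCH_DNN_UINT8:
        return 1;
    case SKETCH_DNN_FP16:
    case SKETCH_DNN_INT16:
        return 2;
    case SKETCH_DNN_FP32:
        return 4;
    }
    return 0;
}

/*
 * Bytes needed for a dense width x height x channels feature map.
 * Assumes no row padding or alignment; the real layout is determined by
 * the configuration binary and may need more space.
 */
static size_t sketch_feature_bytes(enum sketch_dnn_type t, size_t width,
                                   size_t height, size_t channels)
{
    size_t elem = sketch_elem_size(t);

    assert(elem != 0);  /* unknown base type */
    return width * height * channels * elem;
}
```

For example, a 224 x 224 x 3 UINT8 input needs 150528 bytes under this assumption, before any rounding done by the allocator.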

Input, output, weight, and bias data can be placed in global memory and are loaded/stored by the DMA within the DNN accelerator.
Data in global memory can be specified as either:
  * a single address pointing to a single data block
  * a list of addresses pointing to multiple data blocks (i.e. ROIs)
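
For the multi-ROI case, an address list could be built as in the following sketch. It assumes, hypothetically, that all ROIs are packed back to back in one DMA-BUF at a fixed, aligned stride; `struct sketch_addr` only mirrors the two fields of struct drv_ipa_addr used in the usage sample, and the alignment value is an assumption, not a documented hardware requirement. The resulting list would presumably be passed as src_list/dst_list with list_num set to the number of ROIs.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the two fields of struct drv_ipa_addr used in the usage sample;
 * the field types are assumptions. */
struct sketch_addr {
    uint32_t buffer_index;
    uint32_t offset;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint32_t sketch_align_up(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Fill list[0..n-1] with the addresses of n ROIs packed back to back in
 * one DMA-BUF (buffer_index), starting at base_offset, each roi_bytes long
 * and spaced roi_bytes rounded up to align. Returns the number of bytes
 * the buffer must hold. Hypothetical layout, for illustration only.
 */
static uint32_t sketch_build_roi_list(struct sketch_addr *list, uint32_t n,
                                      uint32_t buffer_index, uint32_t base_offset,
                                      uint32_t roi_bytes, uint32_t align)
{
    uint32_t stride = sketch_align_up(roi_bytes, align);

    for (uint32_t i = 0; i < n; i++) {
        list[i].buffer_index = buffer_index;
        list[i].offset = base_offset + i * stride;
    }
    return base_offset + n * stride;
}
```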

The DNN accelerator driver accepts an instance of "struct drv_dnn_descriptor", which includes the addresses of the input/output features and a configuration binary.


#### Descriptor Builder in userland

The following APIs are provided to build a descriptor instance in userland:

    /* defined in drv_dnn_util.h */
    int32_t drv_DNN_config_descript_init(struct drv_dnn_descriptor *desc, struct drv_ipa_buffer_info *buffer,
                                         int32_t buffer_num);
    int32_t drv_DNN_config_exec_configuration(struct drv_dnn_descriptor *desc, const void *configuration_binary,
                                              struct drv_ipa_addr configuration_binary_addr,
                                              struct drv_ipa_addr *src_list, struct drv_ipa_addr *dst_list,
                                              int32_t list_num, struct drv_ipa_addr temporary_addr,
                                              int32_t temporary_size);
    int32_t drv_DNN_config_descript_finalize(struct drv_dnn_descriptor *desc);

struct drv_dnn_descriptor is defined in drivers/soc/visconti/uapi/dnn.h.
I think this header should be moved somewhere (e.g. under include/uapi/) where "make headers_install" collects it when building the kernel.


#### Usage sample (error handling omitted)

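    #include <fcntl.h>
    #include <poll.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/dma-heap.h>
    #include "drv_ipa.h"
    #include "drv_dnn.h"
    #include "drv_dnn_util.h"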

    int allocate_buffer(int fd_heap, int size) 
    {
        struct dma_heap_allocation_data heap_data_in={0};
        int ret;

        /* ROUNDUP_POW2() is not a standard macro; it is assumed to be defined by the application */
        heap_data_in.len = ROUNDUP_POW2(size);
        heap_data_in.fd_flags = O_RDWR | O_CLOEXEC;

        ret = ioctl(fd_heap, DMA_HEAP_IOCTL_ALLOC, &heap_data_in);
        if (ret <0)
            return -1;
        else
            return heap_data_in.fd;
    }

    void dnn_sample(int fd_dnn, int fd_conf, int fd_src, int fd_dst, int fd_temp)
    {
        int32_t ret;
        struct drv_ipa_buffer_info bufinfo[4] = {
            {.fd=fd_conf, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
            {.fd=fd_src,  .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
            {.fd=fd_dst,  .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
            {.fd=fd_temp, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
        };
        struct drv_ipa_addr conf_addr = {.buffer_index=0, .offset=0};
        struct drv_ipa_addr src_addr  = {.buffer_index=1, .offset=0};
        struct drv_ipa_addr dst_addr  = {.buffer_index=2, .offset=0};
        struct drv_ipa_addr temp_addr = {.buffer_index=3, .offset=0};
        struct drv_dnn_descriptor desc;

        struct drv_ipa_addr src_list[] = {src_addr};
        struct drv_ipa_addr dst_list[] = {dst_addr};

        uint8_t *config = (uint8_t*)mmap(NULL, DNN_CONF_BIN_SIZE, PROT_READ, MAP_SHARED, fd_conf, 0);

        drv_DNN_config_descript_init(&desc, bufinfo, 4);
        drv_DNN_config_exec_configuration(&desc, config, conf_addr, src_list, dst_list, 1, temp_addr, TEMP_BUF_SIZE);
        drv_DNN_config_descript_finalize(&desc);

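        ioctl(fd_dnn, IOC_IPA_START, &desc);

        /* wait up to 1000 ms for fd_dnn to become readable, i.e. for the operation to complete */
        {
            struct pollfd fds[] = {{.fd = fd_dnn, .events = POLLIN, .revents = 0}};
            poll(fds, 1, 1000);
        }

        munmap(config, DNN_CONF_BIN_SIZE);
    }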

    void sample(void)
    {
        int fd_dnn, fd_heap, fd_conf, fd_src, fd_dst, fd_temp;

        fd_dnn = open("/dev/dnn0", O_RDWR);
        fd_heap = open("/dev/dma_heap/linux,cma", O_RDWR);
        fd_conf = allocate_buffer(fd_heap, DNN_CONF_BIN_ALLOC_SIZE);
        fd_src  = allocate_buffer(fd_heap, INPUT_IMG_ALLOC_SIZE);
        fd_dst  = allocate_buffer(fd_heap, OUTPUT_IMG_ALLOC_SIZE);
        fd_temp = allocate_buffer(fd_heap, TEMP_BUF_ALLOC_SIZE);

        /* fill in input image and configuration here */

        dnn_sample(fd_dnn, fd_conf, fd_src, fd_dst, fd_temp);

        /* ... */
    }
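
The "fill in input image and configuration here" step could look like the following sketch, which copies a file (for example a configuration binary produced by the offline tool) into a buffer fd through mmap(). `load_file_into_buffer()` is a hypothetical helper, not part of drv_dnn_util.h. For a real DMA-BUF whose CPU mapping is cached, CPU access would normally be bracketed with DMA_BUF_IOCTL_SYNC (DMA_BUF_SYNC_START / DMA_BUF_SYNC_END); that is omitted here so the sketch also works on an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the file at `path` into the start of the
 * buffer behind fd_buf, which must be at least buf_size bytes long.
 * Returns the number of bytes copied, or -1 on error.
 */
static ssize_t load_file_into_buffer(const char *path, int fd_buf, size_t buf_size)
{
    assert(path != NULL);

    int fd_file = open(path, O_RDONLY | O_CLOEXEC);
    if (fd_file < 0)
        return -1;

    /* Refuse files that do not fit into the destination buffer. */
    struct stat st;
    if (fstat(fd_file, &st) < 0 || (size_t)st.st_size > buf_size) {
        close(fd_file);
        return -1;
    }

    unsigned char *dst = mmap(NULL, buf_size, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd_buf, 0);
    if (dst == MAP_FAILED) {
        close(fd_file);
        return -1;
    }

    /* read() may return short counts, so loop until the whole file is in. */
    size_t want = (size_t)st.st_size;
    size_t done = 0;
    while (done < want) {
        ssize_t n = read(fd_file, dst + done, want - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;      /* read error or unexpected EOF */
        done += (size_t)n;
    }

    munmap(dst, buf_size);
    close(fd_file);
    return done == want ? (ssize_t)done : -1;
}
```

In the sample above it would be called before dnn_sample(), e.g. with fd_conf and DNN_CONF_BIN_ALLOC_SIZE and a path of your choosing.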


#### Reference

* [0] https://toshiba.semicon-storage.com/content/dam/toshiba-ss-v2/master/en/company/technical-review/pdf/technical-review-18_e.pdf
  * Fig 7.2.1 shows the overall architecture of the prototype chip
  * Fig 7.2.2 shows the architecture of the DNN accelerator


Regards,
Yuji

> -----Original Message-----
> From: Hans Verkuil <hverkuil@...all.nl>
> Sent: Friday, May 20, 2022 7:03 PM
> To: ishikawa yuji(石川 悠司 ○ＲＤＣ□ＡＩＴＣ○ＥＡ開)
> <yuji2.ishikawa@...hiba.co.jp>; robh+dt@...nel.org;
> iwamatsu nobuhiro(岩松 信洋 □ＳＷＣ◯ＡＣＴ) <nobuhiro1.iwamatsu@...hiba.co.jp>;
> sumit.semwal@...aro.org; christian.koenig@....com
> Cc: linux-arm-kernel@...ts.infradead.org; linux-kernel@...r.kernel.org;
> linux-media@...r.kernel.org; dri-devel@...ts.freedesktop.org;
> linaro-mm-sig@...ts.linaro.org
> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> accelerator driver
> 
> Hi Yuji,
> 
> On 5/20/22 11:48, yuji2.ishikawa@...hiba.co.jp wrote:
> > Hi Hans,
> >
> > Thank you for your comment.
> > I agree that this submission lacks documents sharing basic idea of the
> > accelerators; what do they accept and what do they yield.
> > Where can I put a new document? Can I put it as a comment in a source? Can
> > I add a file under Documentation/misc-devices directory?
> 
> Start with explaining it by replying to this mail. Without knowing anything about
> the hardware, it is difficult to say what the best place is. Usually it is either the
> public API header, or somewhere in Documentation.
> 
> The first step is to have a better understanding of the Visconti image hardware
> and to see what the best subsystem would be to support that hardware.
> 
> Regards,
> 
> 	Hans
> 
> >
> > Thanks,
> > Yuji Ishikawa
> >
> >> -----Original Message-----
> >> From: Hans Verkuil <hverkuil@...all.nl>
> >> Sent: Thursday, May 12, 2022 8:15 PM
> >> To: ishikawa yuji(石川 悠司 ○ＲＤＣ□ＡＩＴＣ○ＥＡ開)
> >> <yuji2.ishikawa@...hiba.co.jp>; Rob Herring <robh+dt@...nel.org>;
> >> iwamatsu nobuhiro(岩松 信洋 □ＳＷＣ◯ＡＣＴ)
> >> <nobuhiro1.iwamatsu@...hiba.co.jp>; Sumit Semwal
> >> <sumit.semwal@...aro.org>; Christian König <christian.koenig@....com>
> >> Cc: linux-arm-kernel@...ts.infradead.org;
> >> linux-kernel@...r.kernel.org; linux-media@...r.kernel.org;
> >> dri-devel@...ts.freedesktop.org; linaro-mm-sig@...ts.linaro.org
> >> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> >> accelerator driver
> >>
> >> Hi Yuji,
> >>
> >> On 4/28/22 15:11, Yuji Ishikawa wrote:
> >>> This series is the DNN image processing accelerator driver for
> >>> Toshiba's ARM SoC, Visconti[0].
> >>> This provides DT binding documentation, device driver, MAINTAINER files.
> >>>
> >>> The second patch "soc: visconti: Add Toshiba Visconti image processing accelerator common source"
> >>> and the fourth patch "MAINTAINERS: ..." are the same as the ones in the
> >>> preceding post for affine driver.
> >>
> >> There appears to be no documentation whatsoever, unless I am missing
> >> something.
> >>
> >> How is the uAPI supposed to be used? What does it do? What formats
> >> does it accept or produce?
> >>
> >> If this processes images, then (as Laurent mentioned) this is more
> >> suitable as a V4L2 mem2mem driver.
> >>
> >> See
> >> https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dev-mem2mem.html
> >> and the many drivers in drivers/media that use it (git grep v4l2-mem2mem.h).
> >>
> >> But without any explanation whatsoever I have no idea what does or
> >> does not make sense.
> >>
> >> Regards,
> >>
> >> 	Hans
> >>
> >>>
> >>> Best regards,
> >>> Yuji
> >>>
> >>> [0]:
> >>> https://toshiba.semicon-storage.com/ap-en/semiconductor/product/image-recognition-processors-visconti.html
> >>>
> >>> Yuji Ishikawa (4):
> >>>   dt-bindings: soc: visconti: Add Toshiba Visconti DNN image processing
> >>>     accelerator bindings
> >>>   soc: visconti: Add Toshiba Visconti image processing accelerator
> >>>     common source
> >>>   soc: visconti: Add Toshiba Visconti DNN image processing accelerator
> >>>   MAINTAINERS: Add entries for Toshiba Visconti DNN image processing
> >>>     accelerator
> >>>
> >>>  .../soc/visconti/toshiba,visconti-dnn.yaml    |  54 ++
> >>>  MAINTAINERS                                   |   2 +
> >>>  drivers/soc/Kconfig                           |   1 +
> >>>  drivers/soc/Makefile                          |   1 +
> >>>  drivers/soc/visconti/Kconfig                  |   7 +
> >>>  drivers/soc/visconti/Makefile                 |   8 +
> >>>  drivers/soc/visconti/dnn/Makefile             |   6 +
> >>>  drivers/soc/visconti/dnn/dnn.c                | 533 ++++++++++++++++++
> >>>  drivers/soc/visconti/dnn/hwd_dnn.c            | 183 ++++++
> >>>  drivers/soc/visconti/dnn/hwd_dnn.h            |  68 +++
> >>>  drivers/soc/visconti/dnn/hwd_dnn_reg.h        | 228 ++++++++
> >>>  drivers/soc/visconti/ipa_common.c             |  55 ++
> >>>  drivers/soc/visconti/ipa_common.h             |  18 +
> >>>  drivers/soc/visconti/uapi/dnn.h               |  77 +++
> >>>  drivers/soc/visconti/uapi/ipa.h               |  88 +++
> >>>  15 files changed, 1329 insertions(+)
> >>>  create mode 100644 Documentation/devicetree/bindings/soc/visconti/toshiba,visconti-dnn.yaml
> >>>  create mode 100644 drivers/soc/visconti/Kconfig
> >>>  create mode 100644 drivers/soc/visconti/Makefile
> >>>  create mode 100644 drivers/soc/visconti/dnn/Makefile
> >>>  create mode 100644 drivers/soc/visconti/dnn/dnn.c
> >>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.c
> >>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.h
> >>>  create mode 100644 drivers/soc/visconti/dnn/hwd_dnn_reg.h
> >>>  create mode 100644 drivers/soc/visconti/ipa_common.c
> >>>  create mode 100644 drivers/soc/visconti/ipa_common.h
> >>>  create mode 100644 drivers/soc/visconti/uapi/dnn.h
> >>>  create mode 100644 drivers/soc/visconti/uapi/ipa.h
> >>>