lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 06 Feb 2014 16:53:59 +0100
From:	Lars-Peter Clausen <lars@...afoo.de>
To:	Srikanth Thokala <sthokal@...inx.com>
CC:	Vinod Koul <vinod.koul@...el.com>, dan.j.williams@...el.com,
	michal.simek@...inx.com, Grant Likely <grant.likely@...aro.org>,
	robh+dt@...nel.org, linux-arm-kernel@...ts.infradead.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	devicetree@...r.kernel.org, dmaengine@...r.kernel.org
Subject: Re: [PATCH v2] dma: Add Xilinx AXI Video Direct Memory Access Engine
 driver support

On 02/06/2014 02:34 PM, Srikanth Thokala wrote:
> On Wed, Feb 5, 2014 at 10:00 PM, Lars-Peter Clausen <lars@...afoo.de> wrote:
>> On 02/05/2014 05:25 PM, Srikanth Thokala wrote:
>>>
>>> On Fri, Jan 31, 2014 at 12:21 PM, Srikanth Thokala <sthokal@...inx.com>
>>> wrote:
>>>>
>>>> Hi Vinod,
>>>>
>>>> On Tue, Jan 28, 2014 at 8:43 AM, Vinod Koul <vinod.koul@...el.com> wrote:
>>>>>
>>>>> On Mon, Jan 27, 2014 at 06:42:36PM +0530, Srikanth Thokala wrote:
>>>>>>
>>>>>> Hi Lars/Vinod,
>>>>>>>>
>>>>>>>> The question here i think would be waht this device supports? Is the
>>>>>>>> hardware
>>>>>>>> capable of doing interleaved transfers, then would make sense.
>>>>>>>
>>>>>>>
>>>>>>> The hardware does 2D transfers. The parameters for a transfer are
>>>>>>> height,
>>>>>>> width and stride. That's only a subset of what interleaved transfers
>>>>>>> can be
>>>>>>> (xt->num_frames must be one for 2d transfers). But if I remember
>>>>>>> correctly
>>>>>>> there has been some discussion on this in the past and the result of
>>>>>>> that
>>>>>>> discussion was that using interleaved transfers for 2D transfers is
>>>>>>> preferred over adding a custom API for 2D transfers.
>>>>>>
>>>>>>
>>>>>> I went through the prep_interleaved_dma API and I see only one
>>>>>> descriptor
>>>>>> is prepared per API call (i.e. per frame).  As our IP supports upto 16
>>>>>> frame
>>>>>> buffers (can be more in future), isn't it less efficient compared to
>>>>>> the
>>>>>> prep_slave_sg where we get a single sg list and can prepare all the
>>>>>> descriptors
>>>>>> (of non-contiguous buffers) in one go?  Correct me, if am wrong and let
>>>>>> me
>>>>>> know your opinions.
>>>>>
>>>>> Well the descriptor maybe one, but that can represent multiple frames,
>>>>> for
>>>>> example 16 as in your case. Can you read up the documentation of how
>>>>> multiple
>>>>> frames are passed. Pls see include/linux/dmaengine.h
>>>>>
>>>>> /**
>>>>>    * Interleaved Transfer Request
>>>>>    * ----------------------------
>>>>>    * A chunk is collection of contiguous bytes to be transfered.
>>>>>    * The gap(in bytes) between two chunks is called inter-chunk-gap(ICG).
>>>>>    * ICGs may or maynot change between chunks.
>>>>>    * A FRAME is the smallest series of contiguous {chunk,icg} pairs,
>>>>>    *  that when repeated an integral number of times, specifies the
>>>>> transfer.
>>>>>    * A transfer template is specification of a Frame, the number of times
>>>>>    *  it is to be repeated and other per-transfer attributes.
>>>>>    *
>>>>>    * Practically, a client driver would have ready a template for each
>>>>>    *  type of transfer it is going to need during its lifetime and
>>>>>    *  set only 'src_start' and 'dst_start' before submitting the
>>>>> requests.
>>>>>    *
>>>>>    *
>>>>>    *  |      Frame-1        |       Frame-2       | ~ |
>>>>> Frame-'numf'  |
>>>>>    *  |====....==.===...=...|====....==.===...=...| ~
>>>>> |====....==.===...=...|
>>>>>    *
>>>>>    *    ==  Chunk size
>>>>>    *    ... ICG
>>>>>    */
>>>>
>>>>
>>>> Yes, it can handle multiple frames specified by 'numf' each of size
>>>> 'frame_size * sgl[0].size'.
>>>> But, I see it only works if all the frames' memory is contiguous and
>>>> in this case we
>>>> can just increment 'src_start' by the total frame size 'numf' number
>>>> of times to fill in
>>>> for each HW descriptor (each frame is one HW descriptor).  So, there
>>>> is no issue when the
>>>> memory is contiguous.  If the frames are non contiguous, we have to
>>>> call this API for each
>>>> frame (hence for each descriptor), as the src_start for each frame is
>>>> different.  Is it correct?
>>>>
>>>> FYI: This hardware has an inbuilt Scatter-Gather engine.
>>>>
>>>
>>> Ping?
>>
>>
>> If you want to submit multiple frames at once I think you should look at how
>> the current dmaengine API can be extended to allow that. And also provide an
>> explanation on how this is superior over submitting them one by one.
>
>
> Sure.  I would start with explaning the current implementation of this driver.
>
> Using prep_slave_sg(), we can define multiple segments in a
> async_tx_descriptor where each frame is defined by a segment (a sg
> list entry).  So, the slave device could DMA the data (of multiple
> frames) with a descriptor by calling tx_submit in a transaction i.e.,
>
> prep_slave_sg(16)  -> tx_submit(1) -> interrupt  (16 frames)
>
> Using interleaved_dma(), we could not divide into segments when we
> have scattered memory (for the reasons mentioned in above thread).
> This implies we are restricting the slave device to process frame by
> frame i.e.,
>
> interleaved_dma(1) -> tx_submit(1) -> interrupt -> interleaved_dma(2)
> -> tx_submit (2) -> interrupt -> ........ tx_submit(16) -> interrupt
>

The API allows you to create and submit multiple interleaved descriptors 
before you have to issue them.

interleaved_dma(1) -> tx_submit(1) -> interleaved_dma(2) -> tx_submit(2) -> 
... -> issue_pending() -> interrupt

> This implementation makes the hardware to wait until the next frame is
> submitted.
>
> To overcome this, I feel it would be a good option if we could extend
> interleaved_dma template to modify src_start/dest_start to be a
> pointer to an array of addresses.  Here, number of addresses will be
> defined by numf. The other option would be to include scatterlist in
> the interleaved template. This way we can handle scattered memory
> using this API.

Each "frame" in a interleaved transfer describes a single line in your video 
frame (size = width, icg = stride). numf is the number of lines per video 
frame. So the suggested change does not make that much sense. If you want to 
submit multiple video frames in one batch the best option is in my opinion 
to allow to pass an array of dma_interleaved_template structs instead of a 
single one.

- Lars

>
> Srikanth
>
>>
>> - Lars
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> --
> To unsubscribe from this list: send the line "unsubscribe dmaengine" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists