lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 20 Feb 2014 15:23:03 +0530
From:	Jassi Brar <jaswinder.singh@...aro.org>
To:	Srikanth Thokala <sthokal@...inx.com>
Cc:	"Williams, Dan J" <dan.j.williams@...el.com>,
	"Koul, Vinod" <vinod.koul@...el.com>,
	Michal Simek <michal.simek@...inx.com>,
	Grant Likely <grant.likely@...aro.org>,
	Rob Herring <robh+dt@...nel.org>, devicetree@...r.kernel.org,
	Levente Kurusa <levex@...ux.com>,
	Lars-Peter Clausen <lars@...afoo.de>,
	lkml <linux-kernel@...r.kernel.org>, dmaengine@...r.kernel.org,
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 1/3] dma: Support multiple interleaved frames with
 non-contiguous memory

On 20 February 2014 14:54, Srikanth Thokala <sthokal@...inx.com> wrote:
> On Wed, Feb 19, 2014 at 12:33 AM, Jassi Brar <jaswinder.singh@...aro.org> wrote:
>> On 18 February 2014 23:16, Srikanth Thokala <sthokal@...inx.com> wrote:
>>> On Tue, Feb 18, 2014 at 10:20 PM, Jassi Brar <jaswinder.singh@...aro.org> wrote:
>>>> On 18 February 2014 16:58, Srikanth Thokala <sthokal@...inx.com> wrote:
>>>>> On Mon, Feb 17, 2014 at 3:27 PM, Jassi Brar <jaswinder.singh@...aro.org> wrote:
>>>>>> On 15 February 2014 17:30, Srikanth Thokala <sthokal@...inx.com> wrote:
>>>>>>> The current implementation of interleaved DMA API support multiple
>>>>>>> frames only when the memory is contiguous by incrementing src_start/
>>>>>>> dst_start members of interleaved template.
>>>>>>>
>>>>>>> But, when the memory is non-contiguous it will restrict slave device
>>>>>>> to not submit multiple frames in a batch.  This patch handles this
>>>>>>> issue by allowing the slave device to send array of interleaved dma
>>>>>>> templates each having a different memory location.
>>>>>>>
>>>>>> How fragmented could be memory in your case? Is it inefficient to
>>>>>> submit separate transfers for each segment/frame?
>>>>>> It will help if you could give a typical example (chunk size and gap
>>>>>> in bytes) of what you worry about.
>>>>>
>>>>> With scatter-gather engine feature in the hardware, submitting separate
>>>>> transfers for each frame look inefficient. As an example, our DMA engine
>>>>> supports up to 16 video frames, with each frame (a typical video frame
>>>>> size) being contiguous in memory but frames are scattered into different
>>>>> locations. We could not definitely submit frame by frame as it would be
>>>>> software overhead (HW interrupting for each frame) resulting in video lags.
>>>>>
>>>> IIUIC, it is 30fps and one dma interrupt per frame ... it doesn't seem
>>>> inefficient at all. Even poor-latency audio would generate a higher
>>>> interrupt-rate. So the "inefficiency concern" doesn't seem valid to
>>>> me.
>>>>
>>>> Not to mean we shouldn't strive to reduce the interrupt-rate further.
>>>> Another option is to emulate the ring-buffer scheme of ALSA.... which
>>>> should be possible since for a session of video playback the frame
>>>> buffers' locations wouldn't change.
>>>>
>>>> Yet another option is to use the full potential of the
>>>> interleaved-xfer api as such. It seems you confuse a 'video frame'
>>>> with the interleaved-xfer api's 'frame'. They are different.
>>>>
>>>> Assuming your one video frame is F bytes long and Gk is the gap in
>>>> bytes between end of frame [k] and start of frame [k+1] and  Gi != Gj
>>>> for i!=j
>>>> In the context of interleaved-xfer api, you have just 1 Frame of 16
>>>> chunks. Each chunk is Fbytes and the inter-chunk-gap(ICG) is Gk  where
>>>> 0<=k<15
>>>> So for your use-case .....
>>>>   dma_interleaved_template.numf = 1   /* just 1 frame */
>>>>   dma_interleaved_template.frame_size = 16  /* containing 16 chunks */
>>>>    ...... //other parameters
>>>>
>>>> You have 3 options to choose from and all should work just as fine.
>>>> Otherwise please state your problem in real numbers (video-frames'
>>>> size, count & gap in bytes).
>>>
>>> Initially I interpreted interleaved template the same.  But, Lars corrected me
>>> in the subsequent discussion and let me put it here briefly,
>>>
>>> In the interleaved template, each frame represents a line of size denoted by
>>> chunk.size and the stride by icg.  'numf' represent number of frames i.e.
>>> number of lines.
>>>
>>> In video frame context,
>>> chunk.size -> hsize
>>> chunk.icg -> stride
>>> numf -> vsize
>>> and frame_size is always 1 as it will have only one chunk in a line.
>>>
>> But you said in your last post
>>   "with each frame (a typical video frame size) being contiguous in memory"
>>  ... which is not true from what you write above. Anyways, my first 2
>> suggestions still hold.
>
> Yes, each video frame is contiguous and they can be scattered.
>
I assume by contiguous frame you mean as in framebuffer?  Which is an
array of bytes.
If yes, then you should do as I suggest first,  frame_size=16  and numf=1.

If no, then it seems you are already doing the right thing.... the
ring-buffer scheme. Please share some stats how the current api is
causing you overhead because that is a very common case (many
controllers support LLI) and you have 467ms (@30fps with 16-frames
ring-buffer) to queue in before you see any frame drop.

Regards,
Jassi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ