lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 11 Aug 2014 12:11:21 +0200
From:	Thomas Hellstrom <>
To:	Mario Kleiner <>,
	Konrad Rzeszutek Wilk <>,
	Dave Airlie <>,
	Jerome Glisse <>
CC:	<>, <>,
	LKML <>,
	"" <>,
Subject: Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

On 08/10/2014 08:02 PM, Mario Kleiner wrote:
> On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:
>> On 08/10/2014 05:11 AM, Mario Kleiner wrote:
>>> Resent this time without HTML formatting which lkml doesn't like.
>>> Sorry.
>>> On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:
>>>> On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:
>>>>> On August 9, 2014 1:39:39 AM EDT, Thomas
>>>>> Hellstrom<>  wrote:
>>>>>> Hi.
>>>>> Hey Thomas!
>>>>>> IIRC I don't think the TTM DMA pool allocates coherent pages more
>>>>>> than
>>>>>> one page at a time, and _if that's true_ it's pretty unnecessary for
>>>>>> the
>>>>>> dma subsystem to route those allocations to CMA. Maybe Konrad could
>>>>>> shed
>>>>>> some light over this?
>>>>> It should allocate in batches and keep them in the TTM DMA pool for
>>>>> some time to be reused.
>>>>> The pages that it gets are in 4kb granularity though.
>>>> Then I feel inclined to say this is a DMA subsystem bug. Single page
>>>> allocations shouldn't get routed to CMA.
>>>> /Thomas
>>> Yes, seems you're both right. I read through the code a bit more and
>>> indeed the TTM DMA pool allocates only one page during each
>>> dma_alloc_coherent() call, so it doesn't need CMA memory. The current
>>> allocators don't check for single page CMA allocations and therefore
>>> try to get it from the CMA area anyway, instead of skipping to the
>>> much cheaper fallback.
>>> So the callers of dma_alloc_from_contiguous() could need that little
>>> optimization of skipping it if only one page is requested. For
>>> dma_generic_alloc_coherent
>>> <>
>>> andintel_alloc_coherent
>>> <> 
>>> this
>>> seems easy to do. Looking at the arm arch variants, e.g.,
>>> and
>>> i'm not sure if it is that easily done, as there aren't any fallbacks
>>> for such a case and the code looks to me as if that's at least
>>> somewhat intentional.
>>> As far as TTM goes, one quick one-line fix to prevent it from using
>>> the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
>>> above methods) would be to clear the __GFP_WAIT
>>> <>
>>> flag from the
>>> passed gfp_t flags. That would trigger the well working fallback.
>>> So, is
>>> __GFP_WAIT 
>>> <> 
>>> needed
>>> for those single page allocations that go through__ttm_dma_alloc_page
>>> <>?
>>> It would be nice to have such a simple, non-intrusive one-line patch
>>> that we still could get into 3.17 and then backported to older stable
>>> kernels to avoid the same desktop hangs there if CMA is enabled. It
>>> would be also nice for actual users of CMA to not use up lots of CMA
>>> space for gpu's which don't need it. I think DMA_CMA was introduced
>>> around 3.12.
>> I don't think that's a good idea. Omitting __GFP_WAIT would cause
>> unnecessary memory allocation errors on systems under stress.
>> I think this should be filed as a DMA subsystem kernel bug / regression
>> and an appropriate solution should be worked out together with the DMA
>> subsystem maintainers and then backported.
> Ok, so it is needed. I'll file a bug report.
>>> The other problem is that probably TTM does not reuse pages from the
>>> DMA pool. If i trace the __ttm_dma_alloc_page
>>> <>
>>> and
>>> __ttm_dma_free_page
>>> <>
>>> calls for
>>> those single page allocs/frees, then over a 20 second interval of
>>> tracing and switching tabs in firefox, scrolling things around etc. i
>>> find about as many alloc's as i find free's, e.g., 1607 allocs vs.
>>> 1648 frees.
>> This is because historically the pools have been designed to keep only
>> pages with nonstandard caching attributes since changing page caching
>> attributes have been very slow but the kernel page allocators have been
>> reasonably fast.
>> /Thomas
> Ok. A bit more ftraceing showed my hang problem case goes through the
> "if (is_cached)" paths, so the pool doesn't recycle anything and i see
> it bouncing up and down by 4 pages all the time.
> But for the non-cached case, which i don't hit with my problem, could
> one of you look at line 954...
> ... and tell me why that unconditional npages = count; assignment
> makes sense? It seems to essentially disable all recycling for the dma
> pool whenever the pool isn't filled up to/beyond its maximum with free
> pages? When the pool is filled up, lots of stuff is recycled, but when
> it is already somewhat below capacity, it gets "punished" by not
> getting refilled? I'd just like to understand the logic behind that line.
> thanks,
> -mario

I'll happily forward that question to Konrad who wrote the code (or it
may even stem from the ordinary page pool code which IIRC has Dave
Airlie / Jerome Glisse as authors)

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

Powered by blists - more mailing lists