Date:	Mon, 29 Jun 2015 12:30:15 +0100
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Catalin Marinas <catalin.marinas@....com>
Cc:	Arnd Bergmann <arnd@...db.de>, Lorenzo Nava <lorenx4@...il.com>,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Mon, Jun 29, 2015 at 12:05:33PM +0100, Catalin Marinas wrote:
> On Mon, Jun 29, 2015 at 11:46:04AM +0100, Russell King - ARM Linux wrote:
> > The reason we always clear the buffer is that we can't be sure that a
> > driver will not map a buffer allocated by dma_alloc_coherent() into
> > userspace without it first being initialised.  There have been drivers
> > which do this (ALSA in particular.)  I haven't checked whether this
> > instance still does this, but it used to - and the problem is once
> > one instance exists, it gets copied...
> 
> You are right, the memset'ing is probably still necessary to patch
> potential security holes.  The cache flushing is not needed for coherent
> buffers (sometimes this may be more expensive than the memset itself,
> though lost in the noise if only done once).
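
To make the hole concrete, the pattern which leaks looks roughly like
this - a hypothetical driver mmap path, names invented, not any specific
in-tree driver:

/* hypothetical driver: exposes its coherent DMA buffer to userspace */
static int foo_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct foo_dev *foo = file->private_data;

	/*
	 * foo->cpu_addr came from dma_alloc_coherent().  If the
	 * allocator hadn't zeroed it, whatever stale kernel data
	 * those pages held would now be readable from userspace.
	 */
	return dma_mmap_coherent(foo->dev, vma, foo->cpu_addr,
				 foo->dma_handle,
				 vma->vm_end - vma->vm_start);
}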

Well, CMA itself is _not_ a fast allocator.  The code is highly inefficient
- multiple loops within loops within loops.

cma_alloc() walks over the range by the alignment size, trying to allocate
each block in sequence.

The worst case is a CMA region of n MB, an allocation with 4k alignment,
and all CMA memory pinned.  cma_alloc() will attempt to allocate the
first 4k page, calling alloc_contig_range() on that range.  That tries to
reclaim the pages, and while it may manage to reclaim some, it will retry
five times before failing.  We then increment the starting address by 4k
and repeat, until we reach the end of the CMA region.  So the outer loop
tries nMB/4kB times to do the allocation - for a 256MB CMA region, that's
64k calls to alloc_contig_range(), and 5 * 64k = 327680 calls into the
page reclaim functions.
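
Stripped right down, the loop looks something like this (a paraphrase
from memory, not the literal mm/cma.c code; find_free_region() is a
stand-in for the bitmap search):

/* paraphrased sketch of cma_alloc(), not the literal code */
for (start = 0; ; start = bitmap_no + mask + 1) {
	bitmap_no = find_free_region(cma, start, count, mask);
	if (bitmap_no >= cma->count)
		return NULL;			/* exhausted the region */

	pfn = cma->base_pfn + bitmap_no;
	if (alloc_contig_range(pfn, pfn + count, MIGRATE_CMA) == 0)
		return pfn_to_page(pfn);	/* success */

	/*
	 * alloc_contig_range() has already retried page migration
	 * several times internally before giving up.  All we do is
	 * step to the next aligned position and start over - hence
	 * the nMB/4kB outer iterations in the worst case.
	 */
}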

I've recently seen a single allocation call for an 8MB buffer (a 1080p
frame) from a 256MB CMA region with scattered pinned pages take about
two seconds to complete.

One of the problems is that CMA has no knowledge of other users of the
pages, and has no feedback from alloc_contig_range() about which pages
are pinned, so cma_alloc() can't make a sensible guess at the next block
to try.
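
For illustration only - this interface does not exist - feedback could
look like:

/*
 * Hypothetical: if alloc_contig_range() reported the pfn it failed
 * to isolate, cma_alloc() could resume the search beyond the pinned
 * page instead of rediscovering it one alignment step at a time.
 */
ret = alloc_contig_range_report(pfn, pfn + count, MIGRATE_CMA,
				&failed_pfn);
if (ret)
	start = ALIGN(failed_pfn + 1 - cma->base_pfn, mask + 1);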

Zeroing and cache flushing are the _least_ of the problems in that code
path.

(Pages can be pinned by ext4 - see the LWN articles on the subject of
pinned pages and CMA - and also by drivers that hold on to __GFP_MOVABLE
pages.)

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.