Message-ID: <20150629113014.GC7557@n2100.arm.linux.org.uk>
Date:	Mon, 29 Jun 2015 12:30:15 +0100
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Catalin Marinas <catalin.marinas@....com>
Cc:	Arnd Bergmann <arnd@...db.de>, Lorenzo Nava <lorenx4@...il.com>,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Mon, Jun 29, 2015 at 12:05:33PM +0100, Catalin Marinas wrote:
> On Mon, Jun 29, 2015 at 11:46:04AM +0100, Russell King - ARM Linux wrote:
> > The reason we always clear the buffer is that we can't be sure that a
> > driver will not map a buffer allocated by dma_alloc_coherent() into
> > userspace without it first being initialised.  There have been drivers
> > which do this (ALSA in particular.)  I haven't checked whether this
> > instance still does this, but it used to - and the problem is once
> > one instance exists, it gets copied...
> 
> You are right, the memset'ing is probably still necessary to patch
> potential security holes. The cache flushing is not for coherent buffers
> (sometimes this may be more expensive than the memset itself, though
> lost in the noise if only done once).

Well, CMA itself is _not_ a fast allocator.  The code is highly inefficient:
multiple loops within loops within loops.

cma_alloc() walks over the range by the alignment size, trying to allocate
each block in sequence.

The worst case is: CMA region size nMB, you're trying to allocate with a
4k alignment, CMA memory is all pinned.  It will attempt to allocate the
first 4k page, calling alloc_contig_range() on the range.  That tries to
reclaim the pages, and while it may be able to reclaim some, it will try
five times to do so before failing.  We'll then increment the starting
address by 4k and repeat, until we get to the end of CMA.  So, the outer
loop tries nMB/4kB times to do the allocation - for a 256MB CMA region,
that's 64k calls to alloc_contig_range(), and 327680 calls to the page
reclaim functions.
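
To make the arithmetic above concrete, here is a toy model of that worst
case.  It is only a sketch: it does not call or reproduce the real
cma_alloc() or alloc_contig_range(), the names cma_worst_case_model() and
RECLAIM_RETRIES are made up for illustration, and it assumes every attempt
fails after exactly five reclaim tries, as described above.

```c
#include <stdint.h>

/* Assumption taken from the description above: each failing range is
 * retried five times before giving up.  Hypothetical name. */
#define RECLAIM_RETRIES 5

struct cma_worst_case {
	uint64_t range_calls;   /* modelled calls to alloc_contig_range() */
	uint64_t reclaim_calls; /* modelled calls to the reclaim functions */
};

/* Toy model only: walk the region in alignment-sized steps and count one
 * failed range allocation, with its reclaim retries, at every step. */
struct cma_worst_case cma_worst_case_model(uint64_t region_bytes,
					   uint64_t align_bytes)
{
	struct cma_worst_case wc = { 0, 0 };
	uint64_t start;

	for (start = 0; start + align_bytes <= region_bytes;
	     start += align_bytes) {
		wc.range_calls++;                    /* one attempt per step */
		wc.reclaim_calls += RECLAIM_RETRIES; /* all retries fail */
	}
	return wc;
}
```

Calling cma_worst_case_model(256ULL << 20, 4096) gives 65536 range calls
and 327680 reclaim calls, matching the figures above.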

I've recently seen a single allocation call for 8MB (1080p frame) from
256MB of CMA with scattered pinned pages taking about two seconds to
complete.

One of the problems is that CMA has no knowledge of other users of the
pages, and has no feedback from alloc_contig_range() about which pages
are pinned, so cma_alloc() can't make a sensible guess at the next block
to try.

Zeroing and cache flushing are the _least_ of the problems in that code
path.

(Pages can be pinned by ext4fs - see the LWN articles on the subject of
pinned pages and CMA - and also by drivers that hold on to GFP_MOVABLE
pages.)

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.