Message-ID: <87h6iaf7di.wl-tiwai@suse.de>
Date: Thu, 15 Feb 2024 09:40:09 +0100
From: Takashi Iwai <tiwai@...e.de>
To: Hillf Danton <hdanton@...a.com>
Cc: Karthikeyan Ramasubramanian <kramasub@...omium.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Takashi Iwai <tiwai@...e.de>,
	Sven van Ashbrook <svenva@...omium.org>,
	Brian Geffon <bgeffon@...gle.com>,
	linux-sound@...r.kernel.org
Subject: Re: [PATCH v1] ALSA: memalloc: Fix indefinite hang in non-iommu case

On Thu, 15 Feb 2024 04:45:27 +0100,
Hillf Danton wrote:
> 
> On Wed, 14 Feb 2024 17:07:25 -0700 Karthikeyan Ramasubramanian <kramasub@...omium.org>
> > Before 9d8e536 ("ALSA: memalloc: Try dma_alloc_noncontiguous() at first")
> > the alsa non-contiguous allocator always called the alsa fallback
> > allocator in the non-iommu case. This allocated non-contig memory
> > consisting of progressively smaller contiguous chunks. Allocation was
> > fast due to the OR-ing in of __GFP_NORETRY.
> > 
> > After 9d8e536 ("ALSA: memalloc: Try dma_alloc_noncontiguous() at first")
> > the code tries the dma non-contig allocator first, then falls back to
> > the alsa fallback allocator. In the non-iommu case, the former supports
> > only a single contiguous chunk.
> > 
> > We have observed experimentally that under heavy memory fragmentation,
> > allocating a large-ish contiguous chunk with __GFP_RETRY_MAYFAIL
> > triggers an indefinite hang in the dma non-contig allocator. This has
> > high-impact, as an occurrence will trigger a device reboot, resulting in
> > loss of user state.
> > 
> > Fix the non-iommu path by letting dma_alloc_noncontiguous() fail quickly
> > so it does not get stuck looking for that elusive large contiguous chunk,
> > in which case we will fall back to the alsa fallback allocator.
> 
> The faster dma_alloc_noncontiguous() fails the more likely the paperover
> in 9d8e536d36e7 fails to work, so this is another case of bandaid instead
> of mitigating heavy fragmentation at the first place.

Yes, the main problem is the indefinite hang from
dma_alloc_noncontiguous().

So, is the behavior more or less the same even if you pass
__GFP_RETRY_MAYFAIL to dma_alloc_noncontiguous()?  Or is this flag
already implicitly set somewhere in the middle?  It shouldn't hang
indefinitely, but other impacts on the system, such as the OOM killer
kicking in, may be seen.

As of now, I'm inclined to take the suggested workaround.  It'll work
in most cases.  The original issue worked around by commit
9d8e536d36e7 still remains, and we need to address it differently.


thanks,

Takashi
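
For readers of the archive, the allocation order discussed in this
thread (one contiguous attempt that fails quickly, then a fallback that
gathers progressively smaller contiguous chunks) can be sketched as a
toy userspace model. This is not kernel code and not the patch itself:
free memory is modeled as a list of free-run sizes in pages, and the
names take_contiguous, alloc_chunked and alloc_pages are hypothetical
helpers invented for this illustration.

```python
def take_contiguous(free_runs, pages):
    """Carve `pages` out of the first free run that is large enough.

    Returns True on success. On failure it returns False at once,
    with no retry or reclaim (the "fail quickly" behavior).
    """
    for i, run in enumerate(free_runs):
        if run >= pages:
            free_runs[i] = run - pages
            return True
    return False


def alloc_chunked(free_runs, total_pages):
    """Toy fallback: collect total_pages as progressively smaller chunks.

    Returns the list of chunk sizes, or None if memory runs out.
    """
    runs = list(free_runs)  # work on a copy so a failure changes nothing
    chunks = []
    remaining = total_pages
    chunk = total_pages
    while remaining > 0:
        chunk = min(chunk, remaining)
        if take_contiguous(runs, chunk):
            chunks.append(chunk)
            remaining -= chunk
        elif chunk == 1:
            return None  # not even a single page is left
        else:
            chunk //= 2  # try a smaller contiguous chunk
    free_runs[:] = runs  # commit the allocation
    return chunks


def alloc_pages(free_runs, total_pages):
    """Try one contiguous chunk first, then fall back to chunked mode."""
    if take_contiguous(free_runs, total_pages):
        return [total_pages]
    return alloc_chunked(free_runs, total_pages)
```

With fragmented free runs such as [4, 2, 8, 1], a 12-page request
cannot be served as one chunk, so the model falls through to the
chunked path and returns several smaller chunks that sum to 12. With a
single free run of 16 pages, the first attempt succeeds and one chunk
is returned.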
