[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9df969a8-08b1-2b5a-3a86-9a1918f1949b@arm.com>
Date: Tue, 11 Oct 2022 13:15:57 +0100
From: Robin Murphy <robin.murphy@....com>
To: Jerry Snitselaar <jsnitsel@...hat.com>,
Christoph Hellwig <hch@....de>
Cc: Joerg Roedel <joro@...tes.org>, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: Issue seen since commit f5ff79fddf0e ("dma-mapping: remove
CONFIG_DMA_REMAP")
On 2022-10-10 19:57, Jerry Snitselaar wrote:
> I've been looking at an odd issue that shows up with commit
> f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP"). What is being
> seen is the bnx2fc driver calling dma_free_coherent(), and eventually
> hits the BUG_ON() in vunmap(). bnx2fc_free_session_resc() does a
> spin_lock_bh() around the dma_free_coherent() calls, and looking at
> preempt.h that will trigger in_interrupt() to return positive, so that
> makes sense. The really odd part is this only happens with the
> shutdown of the kernel after a system install. Reboots after that do not
> hit the BUG_ON() in vunmap().
Most likely a difference in IOMMU config/parameters between the
installer and the installed kernel - if the latter is defaulting to
passthrough then it won't be remapping (assuming the device is coherent).
> I still need to grab a system and try to see what it is doing on the
> subsequent shutdowns, because it seems to me that any time
> bnx2fc_free_session_resc() is called it will end up there, unless the
> allocs are not coming from vmalloc() in the later boots. Between the
> comments in dma_free_attrs(), and preempt.h, dma_free_coherent()
> shouldn't be called under a spin_lock_bh(), yes? I think the comments
> in dma_free_attrs() might be out of date with commit f5ff79fddf0e
> ("dma-mapping: remove CONFIG_DMA_REMAP") in place since now it is more
> general that you can land in vunmap(). Also, should that WARN_ON() in
> dma_free_attrs() trigger as well for the BH disabled case?
>
> It was also reproduced with a 6.0-rc5 kernel build[1].
Looking at the history of that comment I guess I was just trying to
capture the most common case to explain the original motivation for
having the WARN_ON(). It was never meant to imply that that's the *only*
reason, especially since iommu-dma was already well established by that
point. That warning has been present on x86 in one form or another for
15 years, so I guess the real issue at hand is the difference between
irqs_disabled() and in_interrupt()?
As far as that particular driver goes it looks rather questionable
anyway - it seems like a terrible design flaw if a race between
consuming things and freeing them can exist at all, but then it looks
like bnx2fc_arm_cq() might actually program the hardware to *reuse* a CQ
which may already be waiting to be freed as soon as the lock is
dropped... that can't be good :/
Thanks,
Robin.
Powered by blists - more mailing lists