Message-ID: <3255edfa-4465-204b-4751-8d40c8fb1382@arm.com>
Date: Tue, 23 Jul 2019 14:19:55 +0100
From: Robin Murphy <robin.murphy@....com>
To: Jon Hunter <jonathanh@...dia.com>,
Jose Abreu <Jose.Abreu@...opsys.com>,
Lars Persson <lists@...h.nu>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>
Cc: Joao Pinto <Joao.Pinto@...opsys.com>,
Alexandre Torgue <alexandre.torgue@...com>,
Maxime Ripard <maxime.ripard@...tlin.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-stm32@...md-mailman.stormreply.com"
<linux-stm32@...md-mailman.stormreply.com>,
Chen-Yu Tsai <wens@...e.org>,
Maxime Coquelin <mcoquelin.stm32@...il.com>,
linux-tegra <linux-tegra@...r.kernel.org>,
Giuseppe Cavallaro <peppe.cavallaro@...com>,
"David S . Miller" <davem@...emloft.net>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page
Pool

On 23/07/2019 13:09, Jon Hunter wrote:
>
> On 23/07/2019 11:29, Robin Murphy wrote:
>> On 23/07/2019 11:07, Jose Abreu wrote:
>>> From: Jon Hunter <jonathanh@...dia.com>
>>> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
>>>
>>>> This appears to be a winner: disabling the SMMU for the ethernet
>>>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
>>>> made this work! So yes, it appears to be related to the SMMU being
>>>> enabled. We had to enable the SMMU for ethernet recently due to commit
>>>> 954a03be033c7cef80ddc232e7cbdb17df735663.
>>>
>>> Finally :)
>>>
>>> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
>>>
>>> + There are few reasons to allow unmatched stream bypass, and
>>> + even fewer good ones. If saying YES here breaks your board
>>> + you should work on fixing your board.
>>>
>>> So, how can we fix this? Is your ethernet DT node marked as
>>> "dma-coherent;"?
>>
>> The first thing to try would be booting the failing setup with
>> "iommu.passthrough=1" (or using CONFIG_IOMMU_DEFAULT_PASSTHROUGH) - if
>> that makes things seem OK, then the problem is likely related to address
>> translation; if not, then it's probably time to start looking at nasties
>> like coherency and ordering, although in principle I wouldn't expect the
>> SMMU to have too much impact there.
>
> Setting "iommu.passthrough=1" works for me. However, I am not sure where
> to go from here, so any ideas you have would be great.

OK, so that really implies it's something to do with the addresses. From
a quick skim of the patch, I'm wondering if it's possible for buf->addr
and buf->page->dma_addr to get out of sync at any point. The nature of
the IOVA allocator makes it quite likely that a stale DMA address will
have been reused for a new mapping, so putting the wrong address in a
descriptor may well mean the DMA still hits a valid translation, but one
that now points to a different page.

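To make the suspected failure mode concrete, here is a userspace toy model in C. Everything in it is hypothetical: `toy_map()` is a deliberately simple lowest-free-slot allocator, not the kernel's IOVA allocator, and `struct toy_rx_buf` only loosely mirrors the buf->addr / buf->page->dma_addr pair mentioned above. It shows a cached descriptor address going stale after a refill, after which the recycled IOVA translates cleanly to a different page, so nothing faults.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (hypothetical, not kernel code): an IOVA space of NSLOTS
 * page-sized slots. Mapping always takes the lowest free slot, so a
 * freshly freed IOVA is handed straight back out, exaggerating reuse.
 */
#define NSLOTS    4
#define IOVA_BASE 0x80000000u
#define PAGE_SZ   0x1000u
#define NO_PAGE   (-1)

static int slot_page[NSLOTS] = { NO_PAGE, NO_PAGE, NO_PAGE, NO_PAGE };

/* Map page_id into the lowest free slot; returns its IOVA, or 0 if full. */
static uint32_t toy_map(int page_id)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (slot_page[i] == NO_PAGE) {
			slot_page[i] = page_id;
			return IOVA_BASE + (uint32_t)i * PAGE_SZ;
		}
	}
	return 0;
}

static void toy_unmap(uint32_t iova)
{
	slot_page[(iova - IOVA_BASE) / PAGE_SZ] = NO_PAGE;
}

/* Page the "device" actually hits at iova, or NO_PAGE (i.e. a fault). */
static int toy_translate(uint32_t iova)
{
	if (iova < IOVA_BASE || iova >= IOVA_BASE + NSLOTS * PAGE_SZ)
		return NO_PAGE;
	return slot_page[(iova - IOVA_BASE) / PAGE_SZ];
}

/* Hypothetical stand-ins for buf->page->dma_addr and the cached buf->addr. */
struct toy_page   { int id; uint32_t dma_addr; };
struct toy_rx_buf { struct toy_page *page; uint32_t addr; };

/*
 * Refill buf with a new page but "forget" to update buf->addr, then let
 * the old page's IOVA be recycled. Returns the page id the device would
 * really write to via the stale descriptor address.
 */
static int toy_stale_addr_demo(struct toy_rx_buf *buf)
{
	static struct toy_page a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	for (int i = 0; i < NSLOTS; i++)
		slot_page[i] = NO_PAGE;	/* start from an empty IOVA space */

	a.dma_addr = toy_map(a.id);	/* slot 0 */
	b.dma_addr = toy_map(b.id);	/* slot 1 */
	buf->page = &a;
	buf->addr = a.dma_addr;		/* still in sync */

	buf->page = &b;			/* BUG: buf->addr still holds a's IOVA */
	toy_unmap(a.dma_addr);		/* page a is recycled... */
	c.dma_addr = toy_map(c.id);	/* ...and page c reuses slot 0 */

	return toy_translate(buf->addr);
}
```

In this model the buffer believes it owns page 2, but DMA through the stale descriptor address lands in page 3 with a perfectly valid translation. The remedy the model suggests is to take the descriptor address from the page's current DMA address at refill time rather than from a separately cached copy, though whether that is what actually goes wrong in the patch is not established here.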
>> Do you know if the SMMU interrupts are working correctly? If not, it's
>> possible that an incorrect address or mapping direction could lead to
>> the DMA transaction just being silently terminated without any fault
>> indication, which generally presents as inexplicable weirdness (I've
>> certainly seen that on another platform with the mix of an unsupported
>> interrupt controller and an 'imperfect' ethernet driver).
>
> If I simply remove the iommu node for the ethernet controller, then I
> see lots of ...
>
> [ 6.296121] arm-smmu 12000000.iommu: Unexpected global fault, this could be serious
> [ 6.296125] arm-smmu 12000000.iommu: GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000014, GFSYNR2 0x00000000
>
> So I assume that this is triggering the SMMU interrupt correctly.
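For reference, the two register values in the log above can be decoded mechanically. This sketch assumes the Arm SMMUv2 register layout, in which sGFSR bit 1 is USF (Unidentified Stream Fault) and the low 16 bits of sGFSYNR1 carry the Stream ID; the helper and struct names are hypothetical. On that reading, GFSR 0x00000002 with GFSYNR1 0x00000014 would be a transaction from Stream ID 0x14 that matched no stream-mapping entry, which is what one would expect once the ethernet controller's iommu node is removed and unmatched stream bypass is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the Arm SMMUv2 architecture: sGFSR bit 1 = USF. */
#define SGFSR_USF            (1u << 1)
/* Assumed: sGFSYNR1[15:0] holds the Stream ID of the faulting transaction. */
#define SGFSYNR1_STREAMID(x) ((uint16_t)((x) & 0xffffu))

struct gfault_info {
	bool unidentified_stream;	/* no stream-match entry for this ID */
	uint16_t stream_id;
};

/* Decode the GFSR/GFSYNR1 pair printed by the driver (hypothetical helper). */
static struct gfault_info decode_global_fault(uint32_t gfsr, uint32_t gfsynr1)
{
	struct gfault_info info = {
		.unidentified_stream = (gfsr & SGFSR_USF) != 0,
		.stream_id = SGFSYNR1_STREAMID(gfsynr1),
	};
	return info;
}
```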

According to tegra186.dtsi it appears you're using the MMU-500 combined
interrupt, so if global faults are being delivered then context faults
*should* be too, but I'd be inclined to try a quick hack of the relevant
stmmac_desc_ops::set_addr callback to write some bogus unmapped address,
just to make sure arm_smmu_context_fault() then screams as expected and
we're not missing anything else.

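The suggested sanity check can be modelled in userspace as below. This is not stmmac code: `struct toy_desc`, `toy_set_addr()` and the `toy_force_bogus_addr` knob are hypothetical, the callback shape is only loosely modelled on a set_addr(desc, addr) hook, and 0xdead0000 is simply assumed to lie outside every mapping. The point is the expected outcome: with the knob set, every descriptor carries an address that cannot translate, so a working fault path should report a context fault for each such DMA, and silence would mean faults are being lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: a 64-bit address split across two words. */
struct toy_desc { uint32_t des0, des1; };

/* Hypothetical debug knob: when set, ignore the real address. */
static bool toy_force_bogus_addr;

/* Assumed to lie outside every IOVA mapping the driver creates. */
#define TOY_BOGUS_IOVA 0xdead0000ull

/* Shape loosely modelled on a set_addr(desc, addr) callback. */
static void toy_set_addr(struct toy_desc *p, uint64_t addr)
{
	if (toy_force_bogus_addr)
		addr = TOY_BOGUS_IOVA;	/* should provoke a context fault */
	p->des0 = (uint32_t)addr;
	p->des1 = (uint32_t)(addr >> 32);
}

static uint64_t toy_get_addr(const struct toy_desc *p)
{
	return ((uint64_t)p->des1 << 32) | p->des0;
}

/* Toy translation check: [base, base + size) is the only mapped window. */
static bool toy_would_fault(uint64_t iova, uint64_t base, uint64_t size)
{
	return iova < base || iova >= base + size;
}
```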
Robin.