Message-ID: <ab14f31f-2045-b1be-d31f-2a81b8527dac@nvidia.com>
Date: Tue, 23 Jul 2019 11:01:24 +0100
From: Jon Hunter <jonathanh@...dia.com>
To: Jose Abreu <Jose.Abreu@...opsys.com>, Lars Persson <lists@...h.nu>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-stm32@...md-mailman.stormreply.com"
<linux-stm32@...md-mailman.stormreply.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Joao Pinto <Joao.Pinto@...opsys.com>,
"David S . Miller" <davem@...emloft.net>,
Giuseppe Cavallaro <peppe.cavallaro@...com>,
Alexandre Torgue <alexandre.torgue@...com>,
Maxime Coquelin <mcoquelin.stm32@...il.com>,
Maxime Ripard <maxime.ripard@...tlin.com>,
Chen-Yu Tsai <wens@...e.org>,
linux-tegra <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool

On 23/07/2019 09:14, Jose Abreu wrote:
> From: Jose Abreu <joabreu@...opsys.com>
> Date: Jul/22/2019, 15:04:49 (UTC+00:00)
>
>> From: Jon Hunter <jonathanh@...dia.com>
>> Date: Jul/22/2019, 13:05:38 (UTC+00:00)
>>
>>>
>>> On 22/07/2019 12:39, Jose Abreu wrote:
>>>> From: Lars Persson <lists@...h.nu>
>>>> Date: Jul/22/2019, 12:11:50 (UTC+00:00)
>>>>
>>>>> On Mon, Jul 22, 2019 at 12:18 PM Ilias Apalodimas
>>>>> <ilias.apalodimas@...aro.org> wrote:
>>>>>>
>>>>>> On Thu, Jul 18, 2019 at 07:48:04AM +0000, Jose Abreu wrote:
>>>>>>> From: Jon Hunter <jonathanh@...dia.com>
>>>>>>> Date: Jul/17/2019, 19:58:53 (UTC+00:00)
>>>>>>>
>>>>>>>> Let me know if you have any thoughts.
>>>>>>>
>>>>>>> Can you try attached patch ?
>>>>>>>
>>>>>>
>>>>>> The log says someone calls panic(), right?
>>>>>> Can we try and figure out where that happens during the stmmac init phase?
>>>>>>
>>>>>
>>>>> The reason for the panic is hidden in this one line of the kernel logs:
>>>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>>>>
>>>>> The init process is killed by SIGSEGV (signal 11 = 0xb).
>>>>>
>>>>> I would suggest you look for data corruption bugs in the RX path. If
>>>>> the code is fetched from the NFS mount then a corrupt RX buffer can
>>>>> trigger a crash in userspace.
>>>>>
>>>>> /Lars
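
To spell out that decoding: a minimal sketch, assuming the exitcode in the
panic line follows the usual wait(2) status layout (low 7 bits = terminating
signal, bit 7 = core-dump flag, next 8 bits = exit status). The helper name
decode_exit_code is made up for illustration.

```python
import signal


def decode_exit_code(code: int) -> dict:
    """Split a wait(2)-style status word into its fields (assumed layout)."""
    signum = code & 0x7F               # low 7 bits: terminating signal, 0 if none
    core_dumped = bool(code & 0x80)    # bit 7: core-dump flag
    exit_status = (code >> 8) & 0xFF   # bits 8-15: status passed to exit()
    try:
        name = signal.Signals(signum).name if signum else None
    except ValueError:
        name = None                    # signal number unknown on this platform
    return {
        "signal": signum,
        "signal_name": name,
        "core_dumped": core_dumped,
        "exit_status": exit_status,
    }


if __name__ == "__main__":
    # Value taken from the panic line quoted above.
    print(decode_exit_code(0x0000000B))
```
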
>>>>
>>>>
>>>> Jon, I'm not familiar with ARM. Are the buffer addresses being allocated
>>>> in a coherent region ? Can you try attached patch which adds full memory
>>>> barrier before the sync ?
>>>
>>> TBH I am not sure about the buffer addresses either. The attached patch
>>> did not help. Same problem persists.
>>
>> OK. I'm just guessing now at this stage but can you disable SMP ?

I tried limiting the number of CPUs to one by setting 'maxcpus=0' on the
kernel command line. However, this did not help.

>> We have to narrow down if this is coherency issue but you said that
>> booting without NFS and then mounting manually the share works ... So,
>> can you share logs with same debug prints in this condition in order to
>> compare ?
>
> Jon, I have one ARM-based board and I can't reproduce your issue there,
> but I noticed that my buffer addresses are being mapped using SWIOTLB.
> Can you disable IOMMU support on your setup and let me know if the
> problem persists ?

This appears to be a winner: disabling the SMMU for the ethernet
controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
made it work! So yes, this appears to be related to the SMMU being enabled.
We had to enable the SMMU for ethernet recently because of that same
commit.
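
For anyone wanting to double-check whether a given device is actually
attached to an IOMMU/SMMU, here is a rough sketch that looks for the
iommu_group symlink in the device's sysfs directory. The function name is
made up for illustration, and the sysfs path of the ethernet controller is
board-specific, so it is passed in rather than assumed.

```python
import os
import sys
from typing import Optional


def iommu_group_of(device_dir: str) -> Optional[str]:
    """Return the IOMMU group name for a sysfs device directory, or None.

    The kernel adds an 'iommu_group' symlink to the device directory when
    the device has been placed in an IOMMU group.
    """
    link = os.path.join(device_dir, "iommu_group")
    if not os.path.islink(link):
        return None
    # The link target ends in the group id, e.g. .../iommu_groups/<id>
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Usage: script.py /sys/bus/platform/devices/<your-ethernet-device>
    if len(sys.argv) != 2:
        sys.exit("usage: pass the sysfs directory of the device to inspect")
    group = iommu_group_of(sys.argv[1])
    if group is None:
        print("no iommu_group link: device does not appear to be behind an IOMMU")
    else:
        print("device is in IOMMU group", group)
```
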
Cheers
Jon
--
nvpublic