Message-ID: <8e92b562-fa8f-0a2b-d8da-525ee52fc2d4@nvidia.com>
Date:   Thu, 25 Mar 2021 08:00:56 +0000
From:   Jon Hunter <jonathanh@...dia.com>
To:     Joakim Zhang <qiangqing.zhang@....com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: Regression v5.12-rc3: net: stmmac: re-init rx buffers when mac
 resume back


On 25/03/2021 07:53, Joakim Zhang wrote:
> 
>> -----Original Message-----
>> From: Jon Hunter <jonathanh@...dia.com>
>> Sent: 24 March 2021 20:39
>> To: Joakim Zhang <qiangqing.zhang@....com>
>> Cc: netdev@...r.kernel.org; Linux Kernel Mailing List
>> <linux-kernel@...r.kernel.org>; linux-tegra <linux-tegra@...r.kernel.org>;
>> Jakub Kicinski <kuba@...nel.org>
>> Subject: Re: Regression v5.12-rc3: net: stmmac: re-init rx buffers when mac
>> resume back
>>
>>
>>
>> On 24/03/2021 12:20, Joakim Zhang wrote:
>>
>> ...
>>
>>> Sorry for this breakage at your side.
>>>
>>> You mean one of your boards? Do other boards with STMMAC work fine?
>>
>> We have two devices with the STMMAC and one works OK and the other fails.
>> They are different generations of device, so there could be some
>> architectural differences that cause this to be seen on only one device.
> It's really strange, but I also don't know what architectural differences could affect this. Sorry.


Maybe caching somewhere? In other words, could there be any cache
flushing that we are missing here?

>>> We do daily tests with NFS to mount rootfs, and no issue was found. I added this
>>> patch in the resume path, and on error check, this should not break suspend.
>>> I even did an overnight stress test, and no issue was found.
>>>
>>> Could you please do more testing to see where the issue happens?
>>
>> The issue occurs 100% of the time on the failing board and always on the first
>> resume from suspend. Is there any more debug I can enable to track down
>> what the problem is?
>>
> 
> As the commit message describes, the patch aims to re-initialize the rx buffer addresses. Since the addresses are not fixed, I can only
> recycle and then re-allocate all of them. The page pool is allocated once, when the net device is opened.
> 
> Could you please check whether it fails in some function, such as page_pool_dev_alloc_pages()?


Yes that was the first thing I tried, but no obvious failures from
allocating the pools.
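
For reference, the recycle-then-re-allocate step described above can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions, not the stmmac driver code: every name in it
(toy_pool, toy_rx_ring, toy_reinit_rx_buffers, RING_SIZE, POOL_SIZE) is
hypothetical, and the pool is a plain array rather than the kernel page
pool. It shows how a re-init routine could report which ring slot failed
to get a buffer, which is the kind of check being asked about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8   /* hypothetical RX ring length */
#define POOL_SIZE 16  /* hypothetical number of pages in the pool */

/* Toy stand-in for a page pool: a stack of free page indices. */
struct toy_pool {
    int free_idx[POOL_SIZE];
    int nfree;
};

/* Toy RX ring: each slot records its page and the address that the
 * descriptor would hand to the DMA engine. */
struct toy_rx_ring {
    int page[RING_SIZE];            /* -1 means no buffer attached */
    uintptr_t desc_addr[RING_SIZE]; /* 0 means descriptor not set up */
};

/* Backing storage for the toy pages (not real DMA memory). */
static unsigned char toy_pages[POOL_SIZE][64];

void toy_pool_init(struct toy_pool *p)
{
    p->nfree = 0;
    for (int i = 0; i < POOL_SIZE; i++)
        p->free_idx[p->nfree++] = i;
}

/* Returns a page index, or -1 when the pool is exhausted. */
int toy_pool_alloc(struct toy_pool *p)
{
    return p->nfree > 0 ? p->free_idx[--p->nfree] : -1;
}

void toy_pool_recycle(struct toy_pool *p, int page)
{
    assert(p->nfree < POOL_SIZE);
    p->free_idx[p->nfree++] = page;
}

void toy_ring_init(struct toy_rx_ring *r)
{
    for (int i = 0; i < RING_SIZE; i++) {
        r->page[i] = -1;
        r->desc_addr[i] = 0;
    }
}

/* Recycle every attached buffer, then re-allocate one per slot and
 * rewrite the descriptor address. Returns 0 on success, or -(slot + 1)
 * for the first slot whose allocation failed. */
int toy_reinit_rx_buffers(struct toy_pool *p, struct toy_rx_ring *r)
{
    for (int i = 0; i < RING_SIZE; i++) {
        if (r->page[i] >= 0) {
            toy_pool_recycle(p, r->page[i]);
            r->page[i] = -1;
        }
    }

    for (int i = 0; i < RING_SIZE; i++) {
        int page = toy_pool_alloc(p);

        if (page < 0)
            return -(i + 1);
        r->page[i] = page;
        r->desc_addr[i] = (uintptr_t)toy_pages[page];
    }
    return 0;
}
```

In this model a clean return only shows that every slot received a
buffer. It says nothing about whether the device then sees the rewritten
descriptors, so a check like this would not catch a caching or
synchronization problem.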

Are you certain that the problem you are seeing, that is being fixed by
this change, is generic to all devices? The commit message states that
'descriptor write back by DMA could exhibit unusual behavior', is this a
known issue in the STMMAC controller? If so does this impact all
versions and what is the actual problem?

Jon

-- 
nvpublic
