Message-ID: <0e206e6b-3d0c-de27-dedb-48c30e02649c@gmail.com>
Date:   Tue, 9 Oct 2018 22:36:54 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Chris Clayton <chris2553@...glemail.com>,
        "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Azat Khuzhin <a3at.mail@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

On 09.10.2018 16:40, Chris Clayton wrote:
> Thanks to Maciej and Heiner for their replies.
> 
> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>> On 07.10.2018 21:36, Chris Clayton wrote:
>>> Hi again,
>>>
>>> I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
>>> regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
>>> browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
>>> in the browser. Pinging one of my ISP's name servers doesn't fail quite so quickly but the reported time increases from
>>> 14-15ms to more than 1000ms.
>>
>> You can try comparing chip registers (ethtool -d eth0) in the working
>> state (before a suspend) and in the broken state (after a resume).
>> Maybe there will be some obvious difference.
>>
>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>
> Maciej suggested comparing the output from lspci -vv for the ethernet device. They are identical.
> 
> Both Maciej and Heiner suggested comparing the output from "ethtool -d" pre and post suspend. Again, they are identical.
> Heiner specifically suggested looking at the RxConfig. The value of that is 0x0002870e both pre and post suspend.
> 
Hmm, this is very odd, especially given that in your original report you
state that removing the call to rtl_init_rxcfg() from rtl_hw_start() fixes
the issue. rtl_init_rxcfg() touches only the RxConfig register, and the
register values appear to be identical before and after resume. So how can
the chip behave differently?

My best guess so far is that some chip quirk causes it to accept writes to
register RxConfig, but then to misinterpret or ignore the written value.
Yours is the only such report so far (affecting RTL8411), and we don't know
whether other chip versions are affected too.

One option would be to call rtl_init_rxcfg() only for chip versions <= 06,
because those are the ones we know need this call.
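
To make the version-gating idea concrete, here is a rough sketch. It is not
the actual r8169 code: struct fake_priv, rtl_init_rxcfg_stub(),
hw_start_sketch() and LAST_VER_NEEDING_RXCFG_INIT are stand-in names made up
for illustration, and the chip version is modelled as a plain ordered integer,
which is only an assumption based on the "<= 06" comparison.

```c
#include <stdbool.h>

/* Assumption: chip versions are ordered, so "<= 06" is a simple compare. */
#define LAST_VER_NEEDING_RXCFG_INIT 6

/* Hypothetical stand-in for the driver's private data. */
struct fake_priv {
	int mac_version;     /* chip version number (illustrative) */
	bool rxcfg_written;  /* set when the stub below has been called */
};

/*
 * Stub: the real rtl_init_rxcfg() writes the RxConfig register.
 * Here we only record that the call happened.
 */
static void rtl_init_rxcfg_stub(struct fake_priv *tp)
{
	tp->rxcfg_written = true;
}

/* Sketch of a hw start path that skips the RxConfig init on newer chips. */
static void hw_start_sketch(struct fake_priv *tp)
{
	if (tp->mac_version <= LAST_VER_NEEDING_RXCFG_INIT)
		rtl_init_rxcfg_stub(tp);

	/* remaining hw start steps omitted */
}
```

With this gate, a chip with version 5 or 6 still gets the RxConfig init, while
any higher version number skips it. Whether skipping it is safe for all newer
chip versions is exactly the open question.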


> I've attached files I redirected the outputs to.
> 
> Please don't hesitate to ask for any other information needed to solve this problem. In the meantime, I've now got
> scripts that stop the network during suspend and restart it during resume. (Those scripts were removed whilst I gathered
> the diagnostics shown in the attachments.)
> 
> Chris
> 
>>> Chris
>>
>> Maciej
>>
