Message-ID: <20200925115241.3709caf6@ezekiel.suse.cz>
Date:   Fri, 25 Sep 2020 11:52:41 +0200
From:   Petr Tesarik <ptesarik@...e.cz>
To:     Heiner Kallweit <hkallweit1@...il.com>
Cc:     Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        netdev@...r.kernel.org
Subject: Re: RTL8402 stops working after hibernate/resume

On Fri, 25 Sep 2020 11:44:09 +0200
Heiner Kallweit <hkallweit1@...il.com> wrote:

> On 25.09.2020 10:54, Petr Tesarik wrote:
> > On Fri, 25 Sep 2020 09:30:37 +0200
> > Petr Tesarik <ptesarik@...e.cz> wrote:
> >   
> >> On Thu, 24 Sep 2020 22:12:24 +0200
> >> Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>  
> >>> On 24.09.2020 21:14, Petr Tesarik wrote:    
> >>>> On Wed, 23 Sep 2020 11:57:41 +0200
> >>>> Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>     
> >>>>> On 03.09.2020 10:41, Petr Tesarik wrote:    
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> this issue was on the back-burner for some time, but I've got some
> >>>>>> interesting news now.
> >>>>>>
> >>>>>> On Sat, 18 Jul 2020 14:07:50 +0200
> >>>>>> Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>>>       
> >>>>>>> [...]
> >>>>>>> Maybe the following gives us an idea:
> >>>>>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
> >>>>>>> and check for differences.      
> >>>>>>
> >>>>>> The register dump did not reveal anything of interest - the only
> >>>>>> differences were in the physical addresses after a device reopen.
> >>>>>>
> >>>>>> However, knowing that reloading the driver can fix the issue, I copied
> >>>>>> the initialization sequence from init_one() to rtl8169_resume() and
> >>>>>> gave it a try. That works!
> >>>>>>
> >>>>>> Then I started removing the initialization calls one by one. This
> >>>>>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> >>>>>> RxConfig register. In other words, this is the difference between
> >>>>>> 5.8.4 and my working version:
> >>>>>>
> >>>>>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> >>>>>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> >>>>>> @@ -4925,6 +4925,9 @@
> >>>>>>  
> >>>>>>  	clk_prepare_enable(tp->clk);
> >>>>>>  
> >>>>>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> >>>>>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> >>>>>> +
> >>>>>>  	if (netif_running(tp->dev))
> >>>>>>  		__rtl8169_resume(tp);
> >>>>>>  
> >>>>>> This is quite surprising, at least when the device is managed by
> >>>>>> NetworkManager, because then it is closed on wakeup, and the open
> >>>>>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
> >>>>>> issue, or incorrect order of register writes.
> >>>>>>       
> >>>>> Thanks for the analysis. If you manually bring down and up the
> >>>>> interface, do you see the same issue?    
> >>>>
> >>>> I'm not quite sure what you mean, but if the interface is configured
> >>>> (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
> >>>> then 'ip link set eth0 up', and the interface is fully functional.
> >>>>     
> >>>>> What is the value of RxConfig when entering the resume function?    
> >>>>
> >>>> I added a dev_info() to rtl8169_resume(). First with NetworkManager
> >>>> active (i.e. interface down on suspend):
> >>>>
> >>>> [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> >>>>
> >>>> Then I re-tried with NetworkManager stopped (i.e. interface up on
> >>>> suspend). Same result:
> >>>>
> >>>> [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> >>>>
> >>>> I hope that's what you were asking for...
> >>>>
> >>>> Petr T
> >>>>     
> >>>
> >>> rtl8169_resume() has been changed in 5.9, therefore the patch doesn't
> >>> apply cleanly on older kernel versions. Can you test the following
> >>> on a 5.9-rc version or linux-next?    
> >>
> >> I tried installing 5.9-rc6, but it freezes hard at boot, last message is:
> >>
> >> [   14.916259] libphy: r8169: probed
> >>  
> 
> This doesn't necessarily mean that the r8169 driver crashes the system.
> Other things could run in parallel. It freezes w/o any message?

The system freezes hard. I have already encountered a similar freeze
with the alternative r8169 driver, so it's quite likely related.

> >> At this point, I suspect you're right that the BIOS is seriously buggy.
> >> Let me check if ASUSTek has released any update for this model.  
> > 
>[...]
> > Does it make sense to bisect the change that broke the driver for me,
> > or should I rather dispose of this waste^Wlaptop in an environmentally
> > friendly manner? I mean, would you eventually accept a workaround for
> > a few machines with a broken BIOS?
> >   
> If the workaround is small and there's little chance to break other
> stuff: then usually yes.
> If you can spend the effort to bisect the issue, this would be appreciated.

OK, then I'm going to give it a try.

Stay tuned,
Petr T
