netdev - Re: Testing of r8169 workaround removal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFbqK8=yOnhNRJH0sdXK4i5TGev5yZcFc+W_6K1VY3kQZ64Eww@mail.gmail.com>
Date:   Mon, 29 Apr 2019 04:55:12 +0100
From:   Neil MacLeod <neil@...cleod.com>
To:     Phil Reid <preid@...ctromag.com.au>
Cc:     Heiner Kallweit <hkallweit1@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Testing of r8169 workaround removal

Phil

Many thanks - you've nailed it!

Reverting the workaround from Heiner, and also
fc8f36de77111bf925d19f347c2113, resulted in 5.0.6 syncing at 10Mbps
after resuming from S3 instead of the 1000Mbps it syncs at boot.

Hopefully this is helpful in understanding why the workaround is no
longer required.

Thanks all
Neil

On Mon, 29 Apr 2019 at 04:20, Phil Reid <preid@...ctromag.com.au> wrote:
>
> On 29/04/2019 6:05 am, Neil MacLeod wrote:
> > Hi Heiner
> >
> > 5.0.6 is the first kernel that does NOT require the workaround.
> >
> > In 5.0.6 the only obvious r8169 change (to my untrained eyes) is:
> >
> > https://github.com/torvalds/linux/commit/4951fc65d9153deded3d066ab371a61977c96e8a
> >
> > but reverting this change in addition to the workaround makes no
> > difference, the resulting kernel still resumes at 1000Mbps so I'm not
> > sure what other change in .5.0.6 might be responsible for this changed
> > behaviour. If you can think of anything I'll give it a try!
> >
> > Regards
> > Neil
>
> The symptom sounds very similar to a problem I had with 1G link only linking at 10M.
>
> Perhaps have a look at:
> net: phy: don't clear BMCR in genphy_soft_reset
>
> https://www.spinics.net/lists/netdev/msg559627.html
>
> Which looks to have been added in 5.0.6
> commit  fc8f36de77111bf925d19f347c21134542941a3c
>
>
> >
> > PS. A while ago (5 Dec 2018 to be precise!) I emailed you about the
> > ASPM issue which it looks like you may have fixed in 5.1-rc5[1].
> > Unfortunately I don't have this issue myself, and I've been trying to
> > get feedback from the bug reporter[2,3] "Matt Devo" without much
> > success but will confirm to you if/when he replies.
> >
> > 1, https://bugzilla.kernel.org/show_bug.cgi?id=202945
> > 2. https://forum.kodi.tv/showthread.php?tid=298462&pid=2845944#pid2845944
> > 3. https://forum.kodi.tv/showthread.php?tid=343069&pid=2850123#pid2850123
> >
> > On Sun, 28 Apr 2019 at 19:43, Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>
> >> Interesting, thanks for your efforts! I submitted the patch removing
> >> the workaround because it seems now (at least since 5.1-rc1) we're fine.
> >>
> >> Heiner
> >>
> >> On 28.04.2019 20:40, Neil MacLeod wrote:
> >>> Hi Heiner
> >>>
> >>> I'd already kicked off a 5.0.2 build without the workaround and I've
> >>> tested that now, and it resumes at 10Mbps, so it may still be worth
> >>> identifying the exact 5.0.y version when it was fixed just in case
> >>> that provides some understanding of how it was fixed... I'll test the
> >>> remaining kernels between 5.0.3 and 5.0.10 as that's not much extra
> >>> work and let you know what I find!
> >>>
> >>> Regards
> >>> Neil
> >>>
> >>> On Sun, 28 Apr 2019 at 18:39, Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>
> >>>> Hi Neil,
> >>>>
> >>>> thanks for reporting back. Interesting, then the root cause of the
> >>>> issue seems to have been in a different corner. On my hardware
> >>>> I'm not able to reproduce the issue. It's not that relevant with which
> >>>> exact version the issue vanished. Based on your results I'll just
> >>>> remove the workaround on net-next (adding your Tested-by).
> >>>>
> >>>> Heiner
> >>>>
> >>>>
> >>>> On 28.04.2019 19:30, Neil MacLeod wrote:
> >>>>> Hi Heiner
> >>>>>
> >>>>> Do you know if this is already fixed in 5.1-rc6 (Linus Torvalds tree),
> >>>>> as in order to test your request I thought I would reproduce the issue
> >>>>> with plain 5.1-rc6 with the workaround removed, however without the
> >>>>> workaround 5.1-rc6 is resuming correctly at 1000Mbps.
> >>>>>
> >>>>> I went back to 4.19-rc4 (which we know is brroken) and I can reproduce
> >>>>> the issue with the PC (Revo 3700) resuming at 10Mbps, but with 5.1-rc6
> >>>>> I can no longer reproduce the issue when the workaround is removed.
> >>>>>
> >>>>> I also tested 5.0.10 without the workaround, and again 5.0.10 is
> >>>>> resuming correctly at 1000Mbps.
> >>>>>
> >>>>> I finally tested 4.19.23 without the workaround (the last iteration of
> >>>>> this kernel I published) and this does NOT resume correctly at
> >>>>> 1000Mbps (it resumes at 10Mbps).
> >>>>>
> >>>>> I'll test a few more iterations of 5.0.y to see if I can identify when
> >>>>> it was "fixed" but if you have any suggestions when it might have been
> >>>>> fixed I can try to confirm this that - currently it's somewhere
> >>>>> between 4.19.24 and 5.0.10!
> >>>>>
> >>>>> Regards
> >>>>> Neil
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, 28 Apr 2019 at 14:33, Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>>>
> >>>>>> Hi Neil,
> >>>>>>
> >>>>>> you once reported the original issue resulting in this workaround.
> >>>>>> This workaround shouldn't be needed any longer, but I have no affected HW
> >>>>>> to test on. Do you have the option to apply the patch below to latest
> >>>>>> net-next and test link speed after resume from suspend?
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> >>>>>> That would be much appreciated.
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>> ----------------------------------------------------------------
> >>>>>>
> >>>>>> After 8c90b795e90f ("net: phy: improve genphy_soft_reset") this
> >>>>>> workaround shouldn't be needed any longer. However I don't have
> >>>>>> affected hardware so I can't test it.
> >>>>>>
> >>>>>> This was the bug report leading to the workaround:
> >>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=201081
> >>>>>>
> >>>>>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
> >>>>>> ---
> >>>>>>   drivers/net/ethernet/realtek/r8169.c | 8 --------
> >>>>>>   1 file changed, 8 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> >>>>>> index 383242df0..d4ec08e37 100644
> >>>>>> --- a/drivers/net/ethernet/realtek/r8169.c
> >>>>>> +++ b/drivers/net/ethernet/realtek/r8169.c
> >>>>>> @@ -4083,14 +4083,6 @@ static void rtl8169_init_phy(struct net_device *dev, struct rtl8169_private *tp)
> >>>>>>          phy_speed_up(tp->phydev);
> >>>>>>
> >>>>>>          genphy_soft_reset(tp->phydev);
> >>>>>> -
> >>>>>> -       /* It was reported that several chips end up with 10MBit/Half on a
> >>>>>> -        * 1GBit link after resuming from S3. For whatever reason the PHY on
> >>>>>> -        * these chips doesn't properly start a renegotiation when soft-reset.
> >>>>>> -        * Explicitly requesting a renegotiation fixes this.
> >>>>>> -        */
> >>>>>> -       if (tp->phydev->autoneg == AUTONEG_ENABLE)
> >>>>>> -               phy_restart_aneg(tp->phydev);
> >>>>>>   }
> >>>>>>
> >>>>>>   static void rtl_rar_set(struct rtl8169_private *tp, u8 *addr)
> >>>>>> --
> >>>>>> 2.21.0
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>
>
> --
> Regards
> Phil Reid
>
> ElectroMagnetic Imaging Technology Pty Ltd
> Development of Geophysical Instrumentation & Software
> www.electromag.com.au
>
> 3 The Avenue, Midland WA 6056, AUSTRALIA
> Ph: +61 8 9250 8100
> Fax: +61 8 9250 7100
> Email: preid@...ctromag.com.au