linux-kernel - Re: [PATCH net] r8169: correct the reset timing of RTL8125 for link-change event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMqyJG0FcY0hymX6xyZwiWbD8zdsYwWG7GMu2zcL9-bMkq-pMw@mail.gmail.com>
Date: Mon, 9 Sep 2024 13:25:07 +0800
From: En-Wei WU <en-wei.wu@...onical.com>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: nic_swsd@...ltek.com, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, kuan-ying.lee@...onical.com, 
	kai.heng.feng@...onical.com, me@...y.org
Subject: Re: [PATCH net] r8169: correct the reset timing of RTL8125 for
 link-change event

Hi Heiner,

Thank you for the quick response.

On Sat, 7 Sept 2024 at 05:17, Heiner Kallweit <hkallweit1@...il.com> wrote:
>
> On 06.09.2024 10:35, En-Wei Wu wrote:
> > The commit 621735f59064 ("r8169: fix rare issue with broken rx after
> > link-down on RTL8125") set a reset work for RTL8125 in
> > r8169_phylink_handler() to avoid the MAC from locking up, this
> > makes the connection broken after unplugging then re-plugging the
> > Ethernet cable.
> >
> > This is because the commit mistakenly put the reset work in the
> > link-down path rather than the link-up path (The commit message says
> > it should be put in the link-up path).
> >
> That's not what the commit message is saying. It says vendor driver
> r8125 does it in the link-up path.
> I moved it intentionally to the link-down path, because traffic may
> be flowing already after link-up.
>
> > Moving the reset work from the link-down path to the link-up path fixes
> > the issue. Also, remove the unnecessary enum member.
> >
> The user who reported the issue at that time confirmed that the original
> change fixed the issue for him.
> Can you explain, from the NICs perspective, what exactly the difference
> is when doing the reset after link-up?
> Including an explanation how the original change suppresses the link-up
> interrupt. And why that's not the case when doing the reset after link-up.

The host-plug test under original change does have the link-up
interrupt and r8169_phylink_handler() called. There is not much clue
why calling reset in link-down path doesn't work but in link-up does.

After several new tests, I found that with the original change, the
link won't break if I unplug and then plug the cable within about 3
seconds. On the other hand, the connections always break if I re-plug
the cable after a few seconds.

With this new patch (reset in link-up path), both of the tests work
without any error.

>
> I simply want to be convinced enough that your change doesn't break
> behavior for other users.
>
> > Fixes: 621735f59064 ("r8169: fix rare issue with broken rx after link-down on RTL8125")
> > Signed-off-by: En-Wei Wu <en-wei.wu@...onical.com>
> > ---
> >  drivers/net/ethernet/realtek/r8169_main.c | 11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 3507c2e28110..632e661fc74b 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -590,7 +590,6 @@ struct rtl8169_tc_offsets {
> >  enum rtl_flag {
> >       RTL_FLAG_TASK_ENABLED = 0,
> >       RTL_FLAG_TASK_RESET_PENDING,
> > -     RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE,
> >       RTL_FLAG_TASK_TX_TIMEOUT,
> >       RTL_FLAG_MAX
> >  };
> > @@ -4698,8 +4697,6 @@ static void rtl_task(struct work_struct *work)
> >  reset:
> >               rtl_reset_work(tp);
> >               netif_wake_queue(tp->dev);
> > -     } else if (test_and_clear_bit(RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE, tp->wk.flags)) {
> > -             rtl_reset_work(tp);
> >       }
> >  out_unlock:
> >       rtnl_unlock();
> > @@ -4729,11 +4726,13 @@ static void r8169_phylink_handler(struct net_device *ndev)
> >       if (netif_carrier_ok(ndev)) {
> >               rtl_link_chg_patch(tp);
> >               pm_request_resume(d);
> > -             netif_wake_queue(tp->dev);
> > -     } else {
> > +
> >               /* In few cases rx is broken after link-down otherwise */
> >               if (rtl_is_8125(tp))
> > -                     rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE);
> > +                     rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
> > +             else
> > +                     netif_wake_queue(tp->dev);
>
> This call to netif_wake_queue() isn't needed any longer, it was introduced with
> the original change only.
>
> > +     } else {
> >               pm_runtime_idle(d);
> >       }
> >
>

CC. Martin Kjær Jørgensen  <me@...y.org>, could you kindly test if
this new patch works on your environment? Thanks!

En-Wei,
Best regards.