lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKHuaMsUq1Os9+eEpbovphiauchggjsei3ki8gggiPQtA@mail.gmail.com>
Date: Tue, 14 May 2024 16:27:07 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: Heiner Kallweit <hkallweit1@...il.com>, Paolo Abeni <pabeni@...hat.com>, 
	Jakub Kicinski <kuba@...nel.org>, David Miller <davem@...emloft.net>, 
	Realtek linux nic maintainers <nic_swsd@...ltek.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, 
	Ken Milmore <ken.milmore@...il.com>
Subject: Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled NAPI

On Tue, May 14, 2024 at 1:18 PM Alexander Lobakin
<aleksander.lobakin@...el.com> wrote:
>
> From: Eric Dumazet <edumazet@...gle.com>
> Date: Tue, 14 May 2024 13:05:55 +0200
>
> > On Tue, May 14, 2024 at 12:53 PM Alexander Lobakin
> > <aleksander.lobakin@...el.com> wrote:
> >>
> >> From: Eric Dumazet <edumazet@...gle.com>
> >> Date: Tue, 14 May 2024 11:45:05 +0200
> >>
> >>> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>
> >>>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
> >>>> default value of 20000 and napi_defer_hard_irqs is set to 0.
> >>>> In this scenario device interrupts aren't disabled, what seems to
> >>>> trigger some silicon bug under heavy load. I was able to reproduce this
> >>>> behavior on RTL8168h.
> >>>> Disabling device interrupts if NAPI is scheduled from a place other than
> >>>> the driver's interrupt handler is a necessity in r8169, for other
> >>>> drivers it may still be a performance optimization.
> >>>>
> >>>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
> >>>> Reported-by: Ken Milmore <ken.milmore@...il.com>
> >>>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
> >>>> ---
> >>>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
> >>>>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> >>>> index e5ea827a2..01f0ca53d 100644
> >>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
> >>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> >>>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> >>>>  {
> >>>>         struct rtl8169_private *tp = dev_instance;
> >>>>         u32 status = rtl_get_events(tp);
> >>>> +       int ret;
> >>>>
> >>>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
> >>>>                 return IRQ_NONE;
> >>>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> >>>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
> >>>>         }
> >>>>
> >>>> -       if (napi_schedule_prep(&tp->napi)) {
> >>>> +       ret = __napi_schedule_prep(&tp->napi);
> >>>> +       if (ret >= 0)
> >>>>                 rtl_irq_disable(tp);
> >>>> +       if (ret > 0)
> >>>>                 __napi_schedule(&tp->napi);
> >>>> -       }
> >>>>  out:
> >>>>         rtl_ack_events(tp, status);
> >>>>
> >>>
> >>> I do not understand this patch.
> >>>
> >>> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> >>> but this should not happen under normal operations ?
> >>
> >> Without this patch, napi_schedule_prep() returns false if it's either
> >> scheduled already OR it's disabled. Drivers disable interrupts only if
> >> it returns true, which means they don't do that if it's already scheduled.
> >> With this patch, __napi_schedule_prep() returns -1 if it's disabled and
> >> 0 if it was already scheduled. Which means we can disable interrupts
> >> when the result is >= 0, i.e. regardless if it was scheduled before the
> >> call or within the call.
> >>
> >> IIUC, this addresses such situations:
> >>
> >> napi_schedule()         // we disabled interrupts
> >> napi_poll()             // we polled < budget frames
> >> napi_complete_done()    // reenable the interrupts, no repoll
> >>   hrtimer_start()       // GRO flush is queued
> >>     napi_schedule()
> >>       napi_poll()       // GRO flush, BUT interrupts are enabled
> >>
> >> On r8169, this seems to cause issues. On other drivers, it seems to be
> >> okay, but with this new helper, you can save some cycles.
> >>
> >> Reviewed-by: Alexander Lobakin <aleksander.lobakin@...el.com>
> >
> > Rephrasing the changelog is not really helping.
> >
> > Consider myself as a network maintainer, not as a casual patch reviewer.
>
> And?
>
> >
> > "This seems to cause issues" is rather weak.
>
> It has "Reported-by", so it really causes issues.

And ?

Revert ?

>
> >
> > I would simply revert the faulty commit, because the interrupts are
> > going to be disabled no matter what.
> >
> > Old logic was very simple and rock solid. A revert is a clear stable candidate.
> >
> > rtl_irq_disable(tp);
> > napi_schedule(&tp->napi);
> >
> > If this is still broken, we might have similar issues in old/legacy drivers.
>
> I might agree that we could just revert the mentioned commit for stable,
> but for the next net-next, avoid unnecessary
> scheduling/enabling/disabling interrupts makes sense, not only for
> "old/legacy" drivers.
> "Very simple and rock solid" is not an argument for avoiding improvements.

I explained that I failed to see the 'so called' improvement there.

You explained nothing really, just that you like some approach that I
think is for net-next.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ