[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8f548f8-6d16-4a30-9408-80e4212afe9c@intel.com>
Date: Tue, 14 May 2024 12:52:18 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Heiner Kallweit <hkallweit1@...il.com>, Paolo Abeni <pabeni@...hat.com>,
Jakub Kicinski <kuba@...nel.org>, David Miller <davem@...emloft.net>, Realtek
linux nic maintainers <nic_swsd@...ltek.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, Ken Milmore <ken.milmore@...il.com>
Subject: Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled
NAPI
From: Eric Dumazet <edumazet@...gle.com>
Date: Tue, 14 May 2024 11:45:05 +0200
> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@...il.com> wrote:
>>
>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>> In this scenario device interrupts aren't disabled, what seems to
>> trigger some silicon bug under heavy load. I was able to reproduce this
>> behavior on RTL8168h.
>> Disabling device interrupts if NAPI is scheduled from a place other than
>> the driver's interrupt handler is a necessity in r8169, for other
>> drivers it may still be a performance optimization.
>>
>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>> Reported-by: Ken Milmore <ken.milmore@...il.com>
>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
>> ---
>> drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index e5ea827a2..01f0ca53d 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>> {
>> struct rtl8169_private *tp = dev_instance;
>> u32 status = rtl_get_events(tp);
>> + int ret;
>>
>> if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>> return IRQ_NONE;
>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>> rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>> }
>>
>> - if (napi_schedule_prep(&tp->napi)) {
>> + ret = __napi_schedule_prep(&tp->napi);
>> + if (ret >= 0)
>> rtl_irq_disable(tp);
>> + if (ret > 0)
>> __napi_schedule(&tp->napi);
>> - }
>> out:
>> rtl_ack_events(tp, status);
>>
>
> I do not understand this patch.
>
> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> but this should not happen under normal operations ?
Without this patch, napi_schedule_prep() returns false if it's either
scheduled already OR it's disabled. Drivers disable interrupts only if
it returns true, which means they don't do that if it's already scheduled.
With this patch, __napi_schedule_prep() returns -1 if it's disabled and
0 if it was already scheduled. Which means we can disable interrupts
when the result is >= 0, i.e. regardless if it was scheduled before the
call or within the call.
IIUC, this addresses such situations:
napi_schedule() // we disabled interrupts
napi_poll() // we polled < budget frames
napi_complete_done() // reenable the interrupts, no repoll
hrtimer_start() // GRO flush is queued
napi_schedule()
napi_poll() // GRO flush, BUT interrupts are enabled
On r8169, this seems to cause issues. On other drivers, it seems to be
okay, but with this new helper, you can save some cycles.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@...el.com>
>
> A simple revert would avoid adding yet another NAPI helper.
Thanks,
Olek
Powered by blists - more mailing lists