lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8f548f8-6d16-4a30-9408-80e4212afe9c@intel.com>
Date: Tue, 14 May 2024 12:52:18 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Heiner Kallweit <hkallweit1@...il.com>, Paolo Abeni <pabeni@...hat.com>,
	Jakub Kicinski <kuba@...nel.org>, David Miller <davem@...emloft.net>, Realtek
 linux nic maintainers <nic_swsd@...ltek.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, Ken Milmore <ken.milmore@...il.com>
Subject: Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled
 NAPI

From: Eric Dumazet <edumazet@...gle.com>
Date: Tue, 14 May 2024 11:45:05 +0200

> On Tue, May 14, 2024 at 8:52 AM Heiner Kallweit <hkallweit1@...il.com> wrote:
>>
>> Ken reported that RTL8125b can lock up if gro_flush_timeout has the
>> default value of 20000 and napi_defer_hard_irqs is set to 0.
>> In this scenario device interrupts aren't disabled, what seems to
>> trigger some silicon bug under heavy load. I was able to reproduce this
>> behavior on RTL8168h.
>> Disabling device interrupts if NAPI is scheduled from a place other than
>> the driver's interrupt handler is a necessity in r8169, for other
>> drivers it may still be a performance optimization.
>>
>> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
>> Reported-by: Ken Milmore <ken.milmore@...il.com>
>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
>> ---
>>  drivers/net/ethernet/realtek/r8169_main.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index e5ea827a2..01f0ca53d 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -4639,6 +4639,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>  {
>>         struct rtl8169_private *tp = dev_instance;
>>         u32 status = rtl_get_events(tp);
>> +       int ret;
>>
>>         if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
>>                 return IRQ_NONE;
>> @@ -4657,10 +4658,11 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>>                 rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>>         }
>>
>> -       if (napi_schedule_prep(&tp->napi)) {
>> +       ret = __napi_schedule_prep(&tp->napi);
>> +       if (ret >= 0)
>>                 rtl_irq_disable(tp);
>> +       if (ret > 0)
>>                 __napi_schedule(&tp->napi);
>> -       }
>>  out:
>>         rtl_ack_events(tp, status);
>>
> 
> I do not understand this patch.
> 
> __napi_schedule_prep() would only return -1 if NAPIF_STATE_DISABLE was set,
> but this should not happen under normal operations ?

Without this patch, napi_schedule_prep() returns false if it's either
scheduled already OR it's disabled. Drivers disable interrupts only if
it returns true, which means they don't do that if it's already scheduled.
With this patch, __napi_schedule_prep() returns -1 if it's disabled and
0 if it was already scheduled. Which means we can disable interrupts
when the result is >= 0, i.e. regardless if it was scheduled before the
call or within the call.

IIUC, this addresses such situations:

napi_schedule()		// we disabled interrupts
napi_poll()		// we polled < budget frames
napi_complete_done()	// reenable the interrupts, no repoll
  hrtimer_start()	// GRO flush is queued
    napi_schedule()
      napi_poll()	// GRO flush, BUT interrupts are enabled

On r8169, this seems to cause issues. On other drivers, it seems to be
okay, but with this new helper, you can save some cycles.

Reviewed-by: Alexander Lobakin <aleksander.lobakin@...el.com>

> 
> A simple revert would avoid adding yet another NAPI helper.

Thanks,
Olek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ