Message-ID: <20240514094908.61593793@kernel.org>
Date: Tue, 14 May 2024 09:49:08 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, David Miller
 <davem@...emloft.net>, Realtek linux nic maintainers
 <nic_swsd@...ltek.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 Ken Milmore <ken.milmore@...il.com>
Subject: Re: [PATCH net 2/2] r8169: disable interrupts also for
 GRO-scheduled NAPI

On Tue, 14 May 2024 18:35:46 +0200 Heiner Kallweit wrote:
> > I thought the bug is because of a race with disable.  
> 
> No, the second napi_poll() in this scenario is executed with device
> interrupts enabled, which triggers a (supposedly) hw bug under heavy
> load. So the fix is to disable device interrupts also in the case
> that NAPI is already scheduled when entering the interrupt handler.
> 
> > But there's already a synchronize_net() after disable, so NAPI poll
> > must fully exit before we mask in rtl8169_cleanup().
> > 
> > If the bug is the double-enable you describe, the fix just makes
> > the race window smaller. But I don't think that's the bug.
> > 
> > BTW, why are events only acked in rtl8169_interrupt() and not
> > rtl8169_poll()?
> 
> You mean clearing the rx/tx-related interrupt status bits only
> after napi_complete_done(), as an alternative to disabling
> device interrupts?

Before: basically, ack them at the start of the poll function.
If gro_timeout / IRQ suppression is not enabled, it won't make
much of a difference. It probably also won't make much difference
with iperf.
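
A minimal user-space sketch of the ordering suggested above, assuming a write-1-to-clear interrupt status register: the ISR only masks device interrupts and schedules NAPI, and the poll function acks the rx/tx status bits before draining the ring. Everything in it (`struct mock_nic`, `EV_RX`, `EV_TX`, `mock_isr()`, `mock_poll()`, ...) is hypothetical and does not reflect the real r8169 register layout or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bits (illustrative, not r8169's real layout). */
#define EV_RX 0x1u
#define EV_TX 0x2u

/* Hypothetical user-space stand-in for a NIC's interrupt registers. */
struct mock_nic {
	uint16_t intr_status;  /* pending events, write-1-to-clear */
	uint16_t intr_mask;    /* events allowed to raise a HW IRQ */
	unsigned rx_queued;    /* packets waiting in the RX ring */
	bool napi_scheduled;
};

/* A packet lands in the ring and sets the RX status bit. */
static void mock_rx_packet(struct mock_nic *nic)
{
	nic->rx_queued++;
	nic->intr_status |= EV_RX;
}

/* Write-1-to-clear ack of the given status bits. */
static void mock_ack(struct mock_nic *nic, uint16_t bits)
{
	assert((bits & ~(EV_RX | EV_TX)) == 0); /* only known event bits */
	nic->intr_status &= (uint16_t)~bits;
}

/* The device asserts its IRQ only for pending events that are unmasked. */
static bool mock_irq_line(const struct mock_nic *nic)
{
	return (nic->intr_status & nic->intr_mask) != 0;
}

/* ISR: mask device interrupts and schedule NAPI, but do NOT ack here. */
static void mock_isr(struct mock_nic *nic)
{
	nic->intr_mask = 0;
	nic->napi_scheduled = true;
}

/* NAPI poll: ack first, then drain up to budget. Returns work done. */
static unsigned mock_poll(struct mock_nic *nic, unsigned budget)
{
	unsigned done = 0;

	/* Ack before processing: an event arriving after this point sets
	 * its status bit again, so it cannot be lost. */
	mock_ack(nic, EV_RX | EV_TX);

	while (done < budget && nic->rx_queued > 0) {
		nic->rx_queued--;
		done++;
	}
	if (done < budget) {
		/* Ring drained: complete NAPI and unmask device interrupts. */
		nic->napi_scheduled = false;
		nic->intr_mask = EV_RX | EV_TX;
	}
	return done;
}
```

In this model, a packet that lands after the ack sets its status bit again, so it is either drained by the same poll run or raises a fresh IRQ once the mask is restored; that is the sense in which acking early cannot lose events.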

But normally traffic is bursty, so with gro_timeout we can see
something like:

    packets: x x  x  x x   <  no more packets  >
IRQ pending: xxx  xxxxxxxxxxxxxxxxxxxxxx
        ISR:    []                      []
    IRQ ack:    x                       x
       NAPI:     [=====] < timeout > [=] [=] < timeout > [=]

Acking at the beginning of NAPI poll can't make us miss events,
but we'd clear the pending IRQ on the "deferred" NAPI run, avoiding
an extra HW IRQ and 2 NAPI calls:

    packets: x x  x  x x   <  no more packets  >
IRQ pending: xxxx xxxxxxxxxxxxxxxxxxx
        ISR:    []                   
    IRQ ack:     x                   x
       NAPI:     [=====] < timeout > [=]
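
A toy model, not driver code, that replays the two timelines above and counts HW IRQs and NAPI runs. It assumes the tail of the burst arrives after the first ack while the first NAPI run is draining the ring, that an IRQ-driven run always defers completion once to a timer-driven run (a stand-in for gro_timeout-style deferral), and that unmasking with a status bit still set fires the IRQ immediately. `simulate()` and `struct sim_counts` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters collected by the toy model. */
struct sim_counts {
	unsigned hw_irqs;
	unsigned napi_runs;
};

/*
 * Toy model of the two timelines (hypothetical, deliberately simplified):
 * - the first packet of a burst raises the HW IRQ; the rest of the burst
 *   arrives while the first NAPI run is draining the ring;
 * - an IRQ-driven NAPI run defers completion once, and a timer-driven
 *   run follows after the timeout;
 * - the timer-driven run finds no work, completes and unmasks the IRQ;
 * - unmasking while a status bit is still pending fires the IRQ at once.
 */
static struct sim_counts simulate(bool ack_in_poll)
{
	struct sim_counts c = { 0, 0 };
	bool pending = true;          /* status bit from the first packet */
	bool burst_tail_left = true;  /* packets still to arrive in run 1 */
	bool irq_fires = true;        /* IRQ unmasked, so the first one fires */

	while (irq_fires) {
		/* ISR: mask the device IRQ and schedule NAPI. */
		c.hw_irqs++;
		if (!ack_in_poll)
			pending = false;      /* variant A: ack in the ISR */

		/* IRQ-driven NAPI run. */
		c.napi_runs++;
		if (ack_in_poll)
			pending = false;      /* variant B: ack at poll start */
		if (burst_tail_left) {
			/* Tail packets land after the ack: this run drains
			 * them, but their status bit stays set. */
			burst_tail_left = false;
			pending = true;
		}

		/* Completion deferred; the timer-driven "deferred" run. */
		c.napi_runs++;
		if (ack_in_poll)
			pending = false;      /* variant B acks again here */

		/* No work left: complete and unmask. */
		irq_fires = pending;
	}

	/* Every HW IRQ in this model leads to exactly two NAPI runs. */
	assert(c.napi_runs == 2 * c.hw_irqs);
	return c;
}
```

Under these assumptions, `simulate(false)` (ack in the ISR) yields 2 HW IRQs and 4 NAPI runs, while `simulate(true)` (ack at the start of poll) yields 1 and 2, i.e. the one extra HW IRQ and two extra NAPI calls visible in the diagrams.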
