Message-ID: <cdaf9e9a-881c-4324-a886-0ed38e2de72e@gmail.com>
Date: Tue, 14 May 2024 19:09:21 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>,
 Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 David Miller <davem@...emloft.net>,
 Realtek linux nic maintainers <nic_swsd@...ltek.com>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 Ken Milmore <ken.milmore@...il.com>
Subject: Re: [PATCH net 2/2] r8169: disable interrupts also for GRO-scheduled
 NAPI

On 14.05.2024 18:49, Jakub Kicinski wrote:
> On Tue, 14 May 2024 18:35:46 +0200 Heiner Kallweit wrote:
>>> I thought the bug is because of a race with disable.  
>>
>> No, the second napi_poll() in this scenario is executed with device
>> interrupts enabled, which triggers a (supposed) hw bug under heavy
>> load. So the fix is to disable device interrupts also in the case
>> that NAPI is already scheduled when entering the interrupt handler.
>>
>>> But there's already a synchronize_net() after disable, so NAPI poll
>>> must fully exit before we mask in rtl8169_cleanup().
>>>
>>> If the bug is the double-enable you describe, the fix is just making
>>> the race window smaller. But I don't think that's the bug.
>>>
>>> BTW why are events only acked in rtl8169_interrupt() and not
>>> rtl8169_poll()?   
>>
>> You mean clearing the rx/tx-related interrupt status bits only
>> after napi_complete_done(), as an alternative to disabling
>> device interrupts?
> 
> Before, basically ack them at the start of a poll function.
> If gro_timeout / IRQ suppression is not enabled, it won't make
> much of a difference. It probably also won't make much of a
> difference with iperf.
> 
> But normally traffic is bursty, so with gro_timeout we can see
> something like:
> 
>     packets: x x  x  x x   <  no more packets  >
> IRQ pending: xxx  xxxxxxxxxxxxxxxxxxxxxx
>         ISR:    []                      []
>     IRQ ack:    x                       x
>        NAPI:     [=====] < timeout > [=] [=] < timeout > [=]
> 
> Acking at the beginning of NAPI poll can't make us miss events,
> but we'd clear the pending IRQ on the "deferred" NAPI run, avoiding
> an extra HW IRQ and 2 NAPI calls:
> 
>     packets: x x  x  x x   <  no more packets  >
> IRQ pending: xxxx xxxxxxxxxxxxxxxxxxx
>         ISR:    []                   
>     IRQ ack:     x                   x
>        NAPI:     [=====] < timeout > [=]

Thanks for the explanation. What is the benefit of acking interrupts
at the beginning of NAPI poll, compared to acking them after
napi_complete_done()?
If the budget is exhausted and we know we'll be polled again, why
ack the interrupts in between?
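
For concreteness, a minimal sketch of the two placements being
compared (hypothetical and heavily simplified, not the actual r8169
code; rtl_get_events(), rtl_ack_events(), rtl_irq_enable(), rtl_rx()
and rtl_tx() are stand-ins for the driver's helpers):

#include <linux/netdevice.h>

/* Variant A: ack at the beginning of the poll function. This also
 * clears events that arrived while a deferred (GRO-scheduled) run
 * was still pending.
 */
static int rtl8169_poll_ack_first(struct napi_struct *napi, int budget)
{
	struct rtl8169_private *tp =
		container_of(napi, struct rtl8169_private, napi);
	int work_done;

	rtl_ack_events(tp, rtl_get_events(tp));

	work_done = rtl_rx(tp, budget);
	rtl_tx(tp);

	if (work_done < budget && napi_complete_done(napi, work_done))
		rtl_irq_enable(tp);

	return work_done;
}

/* Variant B: ack only after napi_complete_done(), i.e. once we know
 * no further poll is scheduled.
 */
static int rtl8169_poll_ack_last(struct napi_struct *napi, int budget)
{
	struct rtl8169_private *tp =
		container_of(napi, struct rtl8169_private, napi);
	int work_done;

	work_done = rtl_rx(tp, budget);
	rtl_tx(tp);

	if (work_done < budget && napi_complete_done(napi, work_done)) {
		rtl_ack_events(tp, rtl_get_events(tp));
		rtl_irq_enable(tp);
	}

	return work_done;
}

(gro_flush_timeout and napi_defer_hard_irqs are the per-device knobs
under /sys/class/net/<dev>/.)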
I just tested both variants with the defaults of
gro_flush_timeout=20000 and napi_defer_hard_irqs=1, and iperf3 --bidir.
The difference is massive. When acking after napi_complete_done()
I see only a few hundred interrupts; acking at the beginning of
NAPI poll, it's a few hundred thousand interrupts.
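
For context, the ISR-side change this patch is about (disabling
device interrupts also when NAPI is already scheduled) amounts to
roughly the following; again a hypothetical, simplified sketch with
the same stand-in helpers, not the actual patch:

#include <linux/interrupt.h>

static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
{
	struct rtl8169_private *tp = dev_instance;
	u32 status = rtl_get_events(tp);

	if (!(status & tp->irq_mask))
		return IRQ_NONE;

	rtl_ack_events(tp, status);

	/* Disable device interrupts unconditionally. Previously this
	 * happened only when napi_schedule_prep() succeeded, so a
	 * GRO-scheduled NAPI run could execute with device interrupts
	 * still enabled, which is what triggers the bug under load.
	 */
	rtl_irq_disable(tp);
	napi_schedule(&tp->napi);

	return IRQ_HANDLED;
}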

