Open Source and information security mailing list archives
 
Message-ID: <c71a960f-16d3-41f0-9899-0040116b30ee@gmail.com>
Date: Sun, 12 May 2024 20:49:26 +0100
From: Ken Milmore <ken.milmore@...il.com>
To: Heiner Kallweit <hkallweit1@...il.com>, netdev@...r.kernel.org
Cc: nic_swsd@...ltek.com
Subject: Re: r8169: transmit queue timeouts and IRQ masking



On 11/05/2024 17:31, Heiner Kallweit wrote:
> On 11.05.2024 00:29, Ken Milmore wrote:
>>
>> Reading this worries me though:
>>
>> https://docs.kernel.org/networking/napi.html
>> "napi_disable() and subsequent calls to the poll method only wait for the ownership of the instance to be released, not for the poll method to exit.
>> This means that drivers should avoid accessing any data structures after calling napi_complete_done()."
>>
> According to kernel doc napi_disable() waits.
> 
> /**
>  *	napi_disable - prevent NAPI from scheduling
>  *	@n: NAPI context
>  *
>  * Stop NAPI from being scheduled on this context.
>  * Waits till any outstanding processing completes.
>  */
> 
>> Which seems to imply that the IRQ enable following napi_complete_done() is unguarded, and might race with the disable on an incoming poll.
>> Is that a possibility?
> 
> Same document states in section "Scheduling and IRQ masking":
> IRQ should only be unmasked after a successful call to napi_complete_done()
> So I think we should be fine.
> 

Nevertheless, it would be good if we could get away without the flag.

I had started out with the assumption that an interrupt acknowledgement coinciding with some part of the work being done in rtl8169_poll() might be the cause of the problem.
So it seemed natural to try guarding the whole block by disabling interrupts at the beginning.
But this seems to work just as well:

diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
index 6e34177..353ce99 100644
--- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
+++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
@@ -4659,8 +4659,10 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 
 	work_done = rtl_rx(dev, tp, budget);
 
-	if (work_done < budget && napi_complete_done(napi, work_done))
+	if (work_done < budget && napi_complete_done(napi, work_done)) {
+		rtl_irq_disable(tp);
 		rtl_irq_enable(tp);
+	}
 
 	return work_done;
 }

On this basis, I assume the problem may actually involve some subtlety with the behaviour of the interrupt mask and status registers.

In addition, I'm not sure it is such a good idea to do away with disabling interrupts from within rtl8169_interrupt().
This causes a modest but noticeable increase in the IRQ rate, which I measured at around 3 to 7% depending on whether the load is Tx- or Rx-heavy, and also on the settings of gro_flush_timeout and napi_defer_hard_irqs.

e.g.
Tx only test with iperf3, gro_flush_timeout=20000, napi_defer_hard_irqs=1:
Averaged 32343 vs 30165 interrupts per second, an increase of about 7%.

Bidirectional test with gro_flush_timeout=0, napi_defer_hard_irqs=0:
Averaged 82118 vs 79689 interrupts per second, an increase of about 3%.
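For reference, the rates above are per-second averages taken from periodic snapshots of /proc/interrupts. A minimal Python sketch of that style of measurement (the helper names and the exact sampling interval are mine, not necessarily what was used here):

```python
def irq_count(snapshot: str, name: str) -> int:
    """Sum the per-CPU counts on the /proc/interrupts line whose
    description contains `name` (e.g. the NIC's interface name)."""
    for line in snapshot.splitlines():
        if name in line:
            fields = line.split()
            # fields[0] is the "NN:" IRQ number, followed by one
            # count column per CPU, then the controller/device text.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError(f"no interrupt line matching {name!r}")

def irq_rate(before: str, after: str, name: str, interval_s: float) -> float:
    """Interrupts per second between two /proc/interrupts snapshots."""
    return (irq_count(after, name) - irq_count(before, name)) / interval_s
```

Sampling twice with a known interval and averaging a few such deltas gives the per-second figures quoted above.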

Given that these NICs are already fairly heavy on interrupt rate, it seems a shame to make them even worse!

All in all, I preferred the solution where all the interrupt disabling is done in rtl8169_interrupt(), notwithstanding that it may require a change to the interface of napi_schedule_prep().
