Open Source and information security mailing list archives
 
Message-ID: <c71a960f-16d3-41f0-9899-0040116b30ee@gmail.com>
Date: Sun, 12 May 2024 20:49:26 +0100
From: Ken Milmore <ken.milmore@...il.com>
To: Heiner Kallweit <hkallweit1@...il.com>, netdev@...r.kernel.org
Cc: nic_swsd@...ltek.com
Subject: Re: r8169: transmit queue timeouts and IRQ masking



On 11/05/2024 17:31, Heiner Kallweit wrote:
> On 11.05.2024 00:29, Ken Milmore wrote:
>>
>> Reading this worries me though:
>>
>> https://docs.kernel.org/networking/napi.html
>> "napi_disable() and subsequent calls to the poll method only wait for the ownership of the instance to be released, not for the poll method to exit.
>> This means that drivers should avoid accessing any data structures after calling napi_complete_done()."
>>
> According to kernel doc napi_disable() waits.
> 
> /**
>  *	napi_disable - prevent NAPI from scheduling
>  *	@n: NAPI context
>  *
>  * Stop NAPI from being scheduled on this context.
>  * Waits till any outstanding processing completes.
>  */
> 
>> Which seems to imply that the IRQ enable following napi_complete_done() is unguarded, and might race with the disable on an incoming poll.
>> Is that a possibility?
> 
> Same document states in section "Scheduling and IRQ masking":
> IRQ should only be unmasked after a successful call to napi_complete_done()
> So I think we should be fine.
> 

Nevertheless, it would be good if we could get away without the flag.

I had started out with the assumption that an interrupt acknowledgement coinciding with some part of the work being done in rtl8169_poll() might be the cause of the problem.
So it seemed natural to try guarding the whole block by disabling interrupts at the beginning.
But this seems to work just as well:

diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
index 6e34177..353ce99 100644
--- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
+++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
@@ -4659,8 +4659,10 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 
 	work_done = rtl_rx(dev, tp, budget);
 
-	if (work_done < budget && napi_complete_done(napi, work_done))
+	if (work_done < budget && napi_complete_done(napi, work_done)) {
+		rtl_irq_disable(tp);
 		rtl_irq_enable(tp);
+	}
 
 	return work_done;
 }

On this basis, I assume the problem may actually involve some subtlety with the behaviour of the interrupt mask and status registers.

In addition, I'm not sure it is such a good idea to do away with disabling interrupts from within rtl8169_interrupt().
This causes a modest but noticeable increase in the IRQ rate, which I measured at around 3 to 7% depending on whether the load is Tx- or Rx-heavy, and also on the settings of gro_flush_timeout and napi_defer_hard_irqs.

e.g.
Tx only test with iperf3, gro_flush_timeout=20000, napi_defer_hard_irqs=1:
Averaged 32343 vs 30165 interrupts per second, an increase of about 7%.

Bidirectional test with gro_flush_timeout=0, napi_defer_hard_irqs=0:
Averaged 82118 vs 79689 interrupts per second, an increase of about 3%.
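For reference, the rates above are per-second averages taken from periodic snapshots of /proc/interrupts. A minimal Python sketch of that style of measurement (the helper names and the exact sampling interval are mine, not necessarily what was used here):

```python
def irq_count(snapshot: str, name: str) -> int:
    """Sum the per-CPU counts on the /proc/interrupts line whose
    description contains `name` (e.g. the NIC's interface name)."""
    for line in snapshot.splitlines():
        if name in line:
            fields = line.split()
            # fields[0] is the "NN:" IRQ number, followed by one
            # count column per CPU, then the controller/device text.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError(f"no interrupt line matching {name!r}")

def irq_rate(before: str, after: str, name: str, interval_s: float) -> float:
    """Interrupts per second between two /proc/interrupts snapshots."""
    return (irq_count(after, name) - irq_count(before, name)) / interval_s
```

Sampling twice with a known interval and averaging a few such deltas gives the per-second figures quoted above.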

Given that these NICs are already fairly heavy on interrupt rate, it seems a shame to make them even worse!

All in all, I preferred the solution where all the interrupt disabling is done in rtl8169_interrupt(), notwithstanding that it may require a change to the interface of napi_schedule_prep().
