lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <35fa38fc-9d1e-4a22-86dd-a4c9147d7f70@gmail.com>
Date: Mon, 13 May 2024 00:08:01 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Ken Milmore <ken.milmore@...il.com>, netdev@...r.kernel.org
Cc: nic_swsd@...ltek.com
Subject: Re: r8169: transmit queue timeouts and IRQ masking

On 12.05.2024 21:49, Ken Milmore wrote:
> 
> 
> On 11/05/2024 17:31, Heiner Kallweit wrote:
>> On 11.05.2024 00:29, Ken Milmore wrote:
>>>
>>> Reading this worries me though:
>>>
>>> https://docs.kernel.org/networking/napi.html
>>> "napi_disable() and subsequent calls to the poll method only wait for the ownership of the instance to be released, not for the poll method to exit.
>>> This means that drivers should avoid accessing any data structures after calling napi_complete_done()."
>>>
>> According to kernel doc napi_disable() waits.
>>
>> /**
>>  *	napi_disable - prevent NAPI from scheduling
>>  *	@n: NAPI context
>>  *
>>  * Stop NAPI from being scheduled on this context.
>>  * Waits till any outstanding processing completes.
>>  */
>>
>>> Which seems to imply that the IRQ enable following napi_complete_done() is unguarded, and might race with the disable on an incoming poll.
>>> Is that a possibility?
>>
>> Same documents states in section "Scheduling and IRQ masking":
>> IRQ should only be unmasked after a successful call to napi_complete_done()
>> So I think we should be fine.
>>
> 
> Nevertheless, it would be good if we could get away without the flag.
> 
> I had started out with the assumption that an interrupt acknowledgement coinciding with some part of the work being done in rtl8169_poll() might be the cause of the problem.
> So it seemed natural to try guarding the whole block by disabling interrupts at the beginning.
> But this seems to work just as well:
> 
> diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
> index 6e34177..353ce99 100644
> --- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c
> +++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4659,8 +4659,10 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
>  
>  	work_done = rtl_rx(dev, tp, budget);
>  
> -	if (work_done < budget && napi_complete_done(napi, work_done))
> +	if (work_done < budget && napi_complete_done(napi, work_done)) {
> +		rtl_irq_disable(tp);
>  		rtl_irq_enable(tp);
> +	}
>  
>  	return work_done;
>  }
> 
> On this basis, I assume the problem may actually involve some subtlety with the behaviour of the interrupt mask and status registers.
> 
In the register dump in your original report the interrupt mask is set.
So it seems rtl_irq_enable() was executed. I don't have an explanation
why a previous rtl_irq_disable() makes a difference.
Interesting would be whether it has to be a write to the interrupt mask
register, or whether a write to any register is sufficient.

> In addition, I'm not sure it is such a good idea to do away with disabling interrupts from within rtl8169_interrupt().
> This causes a modest, but noticeable increase in IRQ rate which I measured at around 3 to 7%, depending on whether the load is Tx or Rx heavy and also on the setting of gro_flush_timeout and napi_defer_hard_irqs.
> 
> e.g.
> Tx only test with iperf3, gro_flush_timeout=20000, napi_defer_hard_irqs=1:
> Averaged 32343 vs 30165 interrupts per second, an increase of about 7%.
> 
> Bidirectional test with with gro_flush_timeout=0, napi_defer_hard_irqs=0:
> Averaged 82118 vs 79689 interrupts per second, an increase of about 3%.
> 
> Given that these NICs are already fairly heavy on interrupt rate, it seems a shame to make them even worse!
> 
> All in all I preferred the solution where we do all the interrupt disabling in rtl8169_interrupt(), notwithstanding that it may require a change to the interface of napi_schedule_prep().

I agree.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ