netdev - Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT

Message-ID: <kwmabr7bujzxkr425do5mtxwulpsnj3iaj7ek2knv4hfyoxev5@zhzqitfu4qo4>
Date: Thu, 20 Feb 2025 08:35:17 -0300
From: Wander Lairson Costa <wander@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, davem@...emloft.net, 
	kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com, andrew+netdev@...n.ch, 
	netdev@...r.kernel.org, rostedt@...dmis.org, clrkwllms@...nel.org, jgarzik@...hat.com, 
	yuma@...hat.com, linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling
 for PREEMPT_RT

On Wed, Feb 19, 2025 at 05:35:54PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-18 11:50:55 [-0300], Wander Lairson Costa wrote:
> > These logs are for the test case of booting the kernel with nr_cpus=1:
> > 
> >      kworker/0:0-8       [000] d..2.  2120.708145: process_one_work <-worker_thread
> >      kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reset_task <-process_one_work
> 
> This looks like someone broke the function tracer because the preemption
> level should be 0 here, not 1. So we would have to subtract one… This
> does remind me of something else…
> 

That fooled me for quite a while. That's why I claimed preemption was
disabled at the beginning.
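
For anyone reading the flag columns in these traces, here is a minimal
sketch of how the five-character field (e.g. "b..13") can be decoded. It
assumes the usual ftrace column order of irqs-off/BH-disabled,
need-resched, hardirq/softirq, preempt-depth and migrate-disable; the
helper name decode_flags() is hypothetical, and only the flag characters
I am sure of are mapped. It reports what the tracer printed and does not
correct for any off-by-one in the tracer itself.

```python
# Sketch: decode the 5-character ftrace latency field, e.g. "b..13".
# Assumed column order: irqs-off/BH-disabled, need-resched,
# hardirq/softirq, preempt-depth, migrate-disable.

IRQ_STATE = {
    "d": "IRQs disabled",
    "D": "IRQs and BH disabled",
    "b": "BH disabled",
    ".": "IRQs and BH enabled",
}

CONTEXT = {
    "Z": "NMI inside hardirq",
    "z": "NMI",
    "H": "hardirq inside softirq",
    "h": "hardirq",
    "s": "softirq",
    ".": "task context",
}


def _hex_or_zero(ch):
    """The counters are printed as one hex digit, or '.' when zero."""
    return 0 if ch == "." else int(ch, 16)


def decode_flags(field):
    """Return a dict describing one 5-character flag field (hypothetical helper)."""
    if len(field) != 5:
        raise ValueError("expected 5 flag characters, got %r" % field)
    irq, resched, ctx, depth, migrate = field
    return {
        "irqs": IRQ_STATE.get(irq, "unknown (%s)" % irq),
        "need_resched": resched,  # left undecoded on purpose
        "context": CONTEXT.get(ctx, "unknown (%s)" % ctx),
        "preempt_depth": _hex_or_zero(depth),
        "migrate_disable": _hex_or_zero(migrate),
    }


if __name__ == "__main__":
    for sample in ("d..2.", "...1.", "b..13", "D.h.3"):
        print(sample, decode_flags(sample))
```

Because the counters are single hex digits, an 'f' in one of the last
two columns would read as 15.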

> …
> >      kworker/0:0-8       [000] b..13  2120.718620: e1000_reset_hw_vf <-igbvf_reset
> …
> >      kworker/0:0-8       [000] D.h.3  2120.718626: irq_handler_entry: irq=63 name=ens14f0
> ^ the interrupt.
> …
> >      kworker/0:0-8       [000] b..13  2120.719133: e1000_check_for_ack_vf <-e1000_write_posted_mbx
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
> >   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
> >   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other
> 
> The threaded interrupt is postponed due to the BH-off section. I am
> working on lifting that restriction. Therefore it gets on CPU right
> after kworker's bh-enable.
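
To put a number on that postponement, here is a small sketch that pulls
the timestamps out of two of the quoted trace lines and computes the gap
between the worker's last traced call and the first call in the threaded
handler. The regular expression and the parse() helper are illustrative
assumptions about the line layout, not part of any tracing tool, and the
result only describes the excerpt quoted above (lines may have been
trimmed from it).

```python
import re
from decimal import Decimal

# Assumed layout: "... [cpu] flags  seconds.micros: function ..."
LINE = re.compile(r"\[(\d+)\]\s+(\S{5})\s+(\d+\.\d+):\s+(\S+)")


def parse(line):
    """Return (cpu, flags, timestamp, function) for one trace line."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("unrecognised trace line: %r" % line)
    cpu, flags, ts, func = m.groups()
    # Decimal keeps the microsecond digits exact.
    return int(cpu), flags, Decimal(ts), func


last_worker = (
    "     kworker/0:0-8       [000] b..13  2120.719133: "
    "e1000_check_for_ack_vf <-e1000_write_posted_mbx"
)
first_irq_thread = (
    "  irq/63-ens14f0-1112    [000] b..12  2121.730652: "
    "igb_msix_other <-irq_thread_fn"
)

gap = parse(first_irq_thread)[2] - parse(last_worker)[2]

if __name__ == "__main__":
    print("gap between the two quoted lines: %s s" % gap)
```

In the quoted excerpt, the threaded handler first appears roughly one
second after the worker's last traced call.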
> 
> …
> > The threaded interrupt handler is called right after (during?)
> > spin_unlock_bh(). I wonder what the 'f' means in the preempt-count
> > field there.
> 
> The hardware interrupt handler gets there while worker is in the wait
> loop. The threaded interrupt handler gets postponed until after the last
> spin_unlock_bh(). The BH part is the important part.
> With that log, I expect the same hold-off part with threaded interrupts
> and the same BH-off synchronisation.
> 
> > I am currently working on something else that has a higher priority, so
> > I don't have time right now to go deeper on that. But feel free to ask
> > me for any test or trace you may need.
> 
> I would need to check if it is safe to explicitly request the threaded
> handler but this is what I would suggest. It works around the issue for
> threaded interrupts and PREEMPT_RT as its user.
> You confirmed that it works, right?
> 

Do you mean that earlier test removing IRQF_COND_ONESHOT? If so, yes.

> Sebastian
>