[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <givxfwonfer5kgowuxuz4bezxkhri4ottnmlmh3duhan3viznb@ic5sscp2twit>
Date: Fri, 17 Jan 2025 10:19:44 -0300
From: Wander Lairson Costa <wander@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>,
Jeff Garzik <jgarzik@...hat.com>, Auke Kok <auke-jan.h.kok@...el.com>,
"moderated list:INTEL ETHERNET DRIVERS" <intel-wired-lan@...ts.osuosl.org>, "open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT" <linux-rt-devel@...ts.linux.dev>
Subject: Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for
PREEMPT_RT
On Thu, Jan 09, 2025 at 06:45:12PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > > If the issue is indeed the use of threaded interrupts then the fix
> > > should not be limited to be PREEMPT_RT only.
> > >
> > Although I was not aware of this scenario, the patch should work for it as well,
> > as I am forcing it to run in interrupt context. I will test it to confirm.
I tested with the stock kernel with threadirqs and the problem does show up.
Applying the patches the issue is gone.
>
> If I remember correctly there were "ifdef preempt_rt" things in it.
That exists only to handle the case in which part in which the ISR needs to
partially run in thread context (because the piece of code calling kmalloc),
so I need an sleeping lock for that. For non-PREEMPT_RT, we don't have this
constrain.
>
> > > > > - What causes the failure? I see you reworked into two parts to behave
> > > > > similar to what happens without threaded interrupts. There is still no
> > > > > explanation for it. Is there a timing limit or was there another
> > > > > register operation which removed the mailbox message?
> > > > >
> > > >
> > > > I explained the root cause of the issue in the last commit. Maybe I should
> > > > have added the explanation to the cover letter as well. Anyway, here is a
> > > > partial verbatim copy of it:
> > > >
> > > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > > ip link up command intermittently fails for the igbvf interfaces when
> > > > using the PREEMPT_RT variant. Investigation revealed that
> > > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > > from e1000_poll_for_ack.
> > >
> > > That ACK would have come if it would poll longer?
No, the poll happens while preemption is disabled.
> > >
> > No, the service wouldn't be serviced while polling.
s/service/interrupt/. Since we can't sleep at this context, there is
no way to wait for an event.
>
> Hmm.
>
> > > > The underlying issue arises from the fact that IRQs are threaded by
> > > > default under PREEMPT_RT. While the exact hardware details are not
> > > > available, it appears that the IRQ handled by igb_msix_other must
> > > > be processed before e1000_poll_for_ack times out. However,
> > > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > > to a scenario where the IRQ is serviced only after the failure of
> > > > e1000_write_posted_mbx."
> > >
> > > Where is this disabled preemption coming from? This should be one of the
> > > ops.write_posted() calls, right? I've been looking around and don't see
> > > anything obvious.
> >
> > I don't remember if I found the answer by looking at the code or by
> > looking at the ftrace flags.
> > I am currently on sick leave with covid. I can check it when I come back.
>
> Don't worry, get better first. I'm kind of off myself. I'm not sure if I
> have the hardware needed to setup so I can look at it…
>
The reason of why you didn't find is because the interrupt in the igb
driver is triggered inside the igbvf code. igbvf_reset() calls
spin_lock_bh() [1], although in the cases I found it was already called
with preemption disabled from process_one_work() (workqueue) and netlink_sendmsg().
Here is an ftrace log for the failure case:
kworker/-86 0...1 85.381866: function: igbvf_reset
kworker/-86 0...2 85.381866: function: e1000_reset_hw_vf
kworker/-86 0...2 85.381867: function: e1000_check_for_rst_vf
kworker/-86 0...2 85.381868: function: e1000_write_posted_mbx
kworker/-86 0...2 85.381868: function: e1000_write_mbx_vf
kworker/-86 0...2 85.381870: function: e1000_check_for_ack_vf // repeats for 2000 lines
...
kworker/-86 0.N.2 86.393782: function: e1000_read_posted_mbx
kworker/-86 0.N.2 86.398606: function: e1000_init_hw_vf
kworker/-86 0.N.2 86.398606: function: e1000_rar_set_vf
kworker/-86 0.N.2 86.398606: function: e1000_write_posted_mbx
irq/65-e-1287 0d..1 86.398609: function: igb_msix_other
irq/65-e-1287 0d..1 86.398609: function: igb_rd32
irq/65-e-1287 0d..2 86.398610: function: igb_check_for_rst
irq/65-e-1287 0d..2 86.398610: function: igb_check_for_rst_pf
irq/65-e-1287 0d..2 86.398610: function: igb_rd32
irq/65-e-1287 0d..2 86.398611: function: igb_check_for_msg
irq/65-e-1287 0d..2 86.398611: function: igb_check_for_msg_pf
irq/65-e-1287 0d..2 86.398611: function: igb_rd32
irq/65-e-1287 0d..2 86.398612: function: igb_rcv_msg_from_vf
irq/65-e-1287 0d..2 86.398612: function: igb_read_mbx
irq/65-e-1287 0d..2 86.398612: function: igb_read_mbx_pf
irq/65-e-1287 0d..2 86.398612: function: igb_obtain_mbx_lock_pf
irq/65-e-1287 0d..2 86.398612: function: igb_rd32
Notice the interrupt handler only executes after e1000_write_posted()
returns. And here it is for the sucessful case:
ip-5603 0...1 1884.710747: function: igbvf_reset
ip-5603 0...2 1884.710754: function: e1000_reset_hw_vf
ip-5603 0...2 1884.710755: function: e1000_check_for_rst_vf
ip-5603 0...2 1884.710756: function: e1000_write_posted_mbx
ip-5603 0...2 1884.710756: function: e1000_write_mbx_vf
ip-5603 0...2 1884.710758: function: e1000_check_for_ack_vf
ip-5603 0d.h2 1884.710760: function: igb_msix_other
ip-5603 0d.h2 1884.710760: function: igb_rd32
ip-5603 0d.h3 1884.710761: function: igb_check_for_rst
ip-5603 0d.h3 1884.710761: function: igb_check_for_rst_pf
ip-5603 0d.h3 1884.710761: function: igb_rd32
ip-5603 0d.h3 1884.710762: function: igb_check_for_msg
ip-5603 0d.h3 1884.710762: function: igb_check_for_msg_pf
ip-5603 0d.h3 1884.710762: function: igb_rd32
ip-5603 0d.h3 1884.710763: function: igb_rcv_msg_from_vf
ip-5603 0d.h3 1884.710763: function: igb_read_mbx
ip-5603 0d.h3 1884.710763: function: igb_read_mbx_pf
ip-5603 0d.h3 1884.710763: function: igb_obtain_mbx_lock_pf
ip-5603 0d.h3 1884.710763: function: igb_rd32
The ISR executes immediately fater e1000_write_mbx_vf().
[1] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/net/ethernet/intel/igbvf/netdev.c#L1522
> Sebastian
>
Powered by blists - more mailing lists