Message-ID: <20250109174512.At7ZERjU@linutronix.de>
Date: Thu, 9 Jan 2025 18:45:12 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Wander Lairson Costa <wander@...hat.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Jeff Garzik <jgarzik@...hat.com>,
Auke Kok <auke-jan.h.kok@...el.com>,
"moderated list:INTEL ETHERNET DRIVERS" <intel-wired-lan@...ts.osuosl.org>,
"open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT" <linux-rt-devel@...ts.linux.dev>
Subject: Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > If the issue is indeed the use of threaded interrupts then the fix
> > should not be limited to be PREEMPT_RT only.
> >
> Although I was not aware of this scenario, the patch should work for it as well,
> as I am forcing it to run in interrupt context. I will test it to confirm.
If I remember correctly there were "ifdef preempt_rt" things in it.
> > > > - What causes the failure? I see you reworked into two parts to behave
> > > > similar to what happens without threaded interrupts. There is still no
> > > > explanation for it. Is there a timing limit or was there another
> > > > register operation which removed the mailbox message?
> > > >
> > >
> > > I explained the root cause of the issue in the last commit. Maybe I should
> > > have added the explanation to the cover letter as well. Anyway, here is a
> > > partial verbatim copy of it:
> > >
> > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > ip link up command intermittently fails for the igbvf interfaces when
> > > using the PREEMPT_RT variant. Investigation revealed that
> > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > from e1000_poll_for_ack.
> >
> > That ACK would have come if it would poll longer?
> >
> No, the interrupt wouldn't be serviced while polling.
Hmm.
> > > The underlying issue arises from the fact that IRQs are threaded by
> > > default under PREEMPT_RT. While the exact hardware details are not
> > > available, it appears that the IRQ handled by igb_msix_other must
> > > be processed before e1000_poll_for_ack times out. However,
> > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > to a scenario where the IRQ is serviced only after the failure of
> > > e1000_write_posted_mbx."
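The scenario described in the quoted commit message can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions, not driver code: struct model, handler(), tick() and write_posted() are hypothetical names, the handler thread is assumed to need the same CPU as the poller, and the real hardware behaviour is not known from this thread.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model; every name here is illustrative. */
struct model {
	bool irq_threaded;     /* handler runs in a thread (PREEMPT_RT default) */
	bool preempt_disabled; /* the polling context cannot be preempted */
	bool irq_pending;      /* the device has raised its interrupt */
	bool ack;              /* set by the handler, awaited by the poller */
};

/* Stand-in for the handler work that makes the ACK visible. */
static void handler(struct model *m)
{
	m->irq_pending = false;
	m->ack = true;
}

/* One time step: decide whether a pending handler gets to run. */
static void tick(struct model *m)
{
	if (!m->irq_pending)
		return;
	/*
	 * A hard-IRQ handler interrupts the poller even with preemption
	 * disabled. A threaded handler needs a task switch, which cannot
	 * happen on this CPU while preemption is disabled.
	 */
	if (!m->irq_threaded || !m->preempt_disabled)
		handler(m);
}

/* Stand-in for a posted mailbox write that polls for an ACK. */
static int write_posted(struct model *m, int max_polls)
{
	m->irq_pending = true; /* message sent, device raises the IRQ */
	for (int i = 0; i < max_polls; i++) {
		tick(m);
		if (m->ack)
			return 0;
	}
	return -1; /* timed out without an ACK */
}
```

In this model, write_posted() only fails when irq_threaded and preempt_disabled are both set, no matter how large max_polls is. Clearing preempt_disabled afterwards and calling tick() once sets the ACK, which corresponds to the IRQ being serviced only after the write has already failed.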
> >
> > Where is this disabled preemption coming from? This should be one of the
> > ops.write_posted() calls, right? I've been looking around and don't see
> > anything obvious.
>
> I don't remember if I found the answer by looking at the code or by
> looking at the ftrace flags.
> I am currently on sick leave with covid. I can check it when I come back.
Don't worry, get better first. I'm kind of off myself. I'm not sure if I
have the hardware needed to set this up so I can look at it…
Sebastian