Message-ID: <20250109174512.At7ZERjU@linutronix.de>
Date: Thu, 9 Jan 2025 18:45:12 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Wander Lairson Costa <wander@...hat.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Clark Williams <clrkwllms@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Jeff Garzik <jgarzik@...hat.com>,
	Auke Kok <auke-jan.h.kok@...el.com>,
	"moderated list:INTEL ETHERNET DRIVERS" <intel-wired-lan@...ts.osuosl.org>,
	"open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>,
	"open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT" <linux-rt-devel@...ts.linux.dev>
Subject: Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for
 PREEMPT_RT

On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > If the issue is indeed the use of threaded interrupts then the fix
> > should not be limited to be PREEMPT_RT only.
> >
> Although I was not aware of this scenario, the patch should work for it as well,
> as I am forcing it to run in interrupt context. I will test it to confirm.

If I remember correctly, the patch had "ifdef preempt_rt" conditionals in it.

> > > > - What causes the failure? I see you reworked into two parts to behave
> > > >   similar to what happens without threaded interrupts. There is still no
> > > >   explanation for it. Is there a timing limit or was there another
> > > >   register operation which removed the mailbox message?
> > > >
> > >
> > > I explained the root cause of the issue in the last commit. Maybe I should
> > > have added the explanation to the cover letter as well.  Anyway, here is a
> > > partial verbatim copy of it:
> > >
> > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > ip link up command intermittently fails for the igbvf interfaces when
> > > using the PREEMPT_RT variant. Investigation revealed that
> > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > from e1000_poll_for_ack.
> >
> > That ACK would have come if it would poll longer?
> >
> No, the IRQ wouldn't be serviced while polling.

Hmm. 

> > > The underlying issue arises from the fact that IRQs are threaded by
> > > default under PREEMPT_RT. While the exact hardware details are not
> > > available, it appears that the IRQ handled by igb_msix_other must
> > > be processed before e1000_poll_for_ack times out. However,
> > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > to a scenario where the IRQ is serviced only after the failure of
> > > e1000_write_posted_mbx."
> >
> > Where is this disabled preemption coming from? This should be one of the
> > ops.write_posted() calls, right? I've been looking around and don't see
> > anything obvious.
> 
> I don't remember if I found the answer by looking at the code or by
> looking at the ftrace flags.
> I am currently on sick leave with covid. I can check it when I come back.

Don't worry, get better first. I'm kind of off myself. I'm not sure if I
have the hardware needed to set this up so I can look at it…
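
For reference, the failure mode claimed in the quoted commit message can be
modelled in plain user-space C. This is only a sketch under assumptions: the
struct and all model_* names are hypothetical and are not the igb/igbvf code,
and it takes the "preemption disabled while polling" claim at face value,
which is exactly the part that is still unconfirmed in this thread.

```c
#include <stdbool.h>

/* Hypothetical model of the mailbox handshake, not driver code. */
struct mbx_model {
	bool irq_pending;	/* a message was posted, interrupt raised */
	bool ack;		/* set once the handler processed the message */
	bool threaded;		/* true: handler is deferred to an IRQ thread */
};

/* Stand-in for the work the interrupt handler does: consume and ACK. */
void model_irq_handler(struct mbx_model *m)
{
	if (m->irq_pending) {
		m->irq_pending = false;
		m->ack = true;
	}
}

/* Posting a message clears any old ACK and raises the interrupt. */
void model_post(struct mbx_model *m)
{
	m->ack = false;
	m->irq_pending = true;
}

/* Bounded poll for the ACK. Returns 0 on success, -1 on timeout. */
int model_poll_for_ack(struct mbx_model *m, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		/*
		 * Assumption: a hard-IRQ handler can interrupt the poller,
		 * while a threaded handler cannot be scheduled as long as
		 * the poller runs with preemption disabled.
		 */
		if (!m->threaded)
			model_irq_handler(m);
		if (m->ack)
			return 0;
	}
	return -1;
}

/* Post, then poll. The deferred handler only runs after the poll ended. */
int model_write_posted(struct mbx_model *m, int max_polls)
{
	int ret;

	model_post(m);
	ret = model_poll_for_ack(m, max_polls);

	/* "Preemption enabled again": the IRQ thread finally gets to run. */
	if (m->threaded)
		model_irq_handler(m);

	return ret;
}
```

In this model, model_write_posted() succeeds when threaded is false and
times out when it is true, no matter how large max_polls is. The ACK shows up
only after the failure has already been returned, which matches the quoted
description and would explain why polling longer does not help.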

Sebastian
