lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250212151108.jI8qODdD@linutronix.de>
Date: Wed, 12 Feb 2025 16:11:08 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Wander Lairson Costa <wander@...hat.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, davem@...emloft.net,
	kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com,
	andrew+netdev@...n.ch, netdev@...r.kernel.org, rostedt@...dmis.org,
	clrkwllms@...nel.org, jgarzik@...hat.com, yuma@...hat.com,
	linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling
 for PREEMPT_RT

On 2025-02-12 08:56:47 [-0300], Wander Lairson Costa wrote:
> > What disables preemption? On PREEMPT_RT the spin_lock() does not disable
> > preemption. You shouldn't spin that long. When was interrupt scheduled.
> > _Why_ is the interrupt delayed that long.
> > 
> When I was using trace-cmd report -l, it omitted some fields, one of
> them is preempt-lazy-depth (which was something new to me), and it seems
> this is what affects interrupts. It comes from here [1]. I had the logs,
> but the machine went under maintenance  before I could save them. Once
> it comes back, I can grab them and post here.
> 
> [1] https://elixir.bootlin.com/linux/v6.13.2/source/drivers/net/ethernet/intel/igbvf/netdev.c#L1522

If you do send patches against mainline please test against mainline. As
of today we have preempt-disable and migrate-disable depth. We don't
do lazy-depth anymore, we just have a bit now (which is [lLbB]).
The referenced line will only disable migration, not preemption.

It is important to understand what exactly is going on.

> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index d368b753a4675..6fe37b8001c36 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -912,7 +912,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
> >  	struct net_device *netdev = adapter->netdev;
> >  	int i, err = 0, vector = 0, free_vector = 0;
> >  
> > -	err = request_irq(adapter->msix_entries[vector].vector,
> > +	err = request_threaded_irq(adapter->msix_entries[vector].vector, NULL,
> >  			  igb_msix_other, 0, netdev->name, adapter);
> >  	if (err)
> >  		goto err_out;
> > 
> > just to see if it solves the problem?
> > 
> I have two test cases:
> 
> 1) Boot the machine with nr_cpus=1. The driver reports "PF still
> resetting" message continuously. This issue is gone.

good.

> 2) Run the following script:
> 
>     ipaddr_vlan=3
>     nic_test=ens14f0
>     vf=${nic_test}v0 # The main testing steps:
>     while true; do
>         ip link set ${nic_test} mtu 1500
>         ip link set ${vf} mtu 1500
>         ip link set $vf up
>         # 3. set vlan and ip for VF
>         ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
>         ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
>         ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
>         # 4. check the link state for VF and PF
>         ip link show ${nic_test}
>         if ! ip link show $vf | grep 'state UP'; then
>             echo 'Error found'
>             break
>         fi
>         ip link set $vf down
>     done
> 
> This one eventually fails. It is the first time that one works and the
> other fails. So far, it has been all or nothing. I didn't have time yet to
> investigate why this happens.

"eventually fails". Does this mean it passes the first few iterations
but then it times out? In that case it might be something else

I managed to find a "Intel Corporation I350 Gigabit Network Connection
(rev 01)" and I end up in a warning if I start the script (without the
while true):

|8021q: adding VLAN 0 to HW filter on device eno0v0
|igbvf 0000:08:10.0: Vlan id 0 is not added
|igb 0000:07:00.0: Setting VLAN 3, QOS 0x0 on VF 0
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|------------[ cut here ]------------
|WARNING: CPU: 25 PID: 3013 at drivers/net/ethernet/intel/igbvf/netdev.c:1777 igbvf_close+0x111/0x120
…
|CPU: 25 UID: 0 PID: 3013 Comm: ip Not tainted 6.14.0-rc1-rt1+ #186 PREEMPT_RT+LAZY 39474a76e7562bb76173f4b98cf194301d39bf7f
|igbvf 0000:08:10.0: Link is Up 1000 Mbps Full Duplex
|---[ end trace 0000000000000000 ]---
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|8021q: adding VLAN 0 to HW filter on device eno0v0
|igb 0000:07:00.0: VF 0 attempted to override administratively set VLAN tag
|Reload the VF driver to resume operations
|igbvf 0000:08:10.0: Link is Up 1000 Mbps Full Duplex
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter

and the state is down ('Error found' is printed). But if I do it
manually, line by line, then it all passes without the warning and the
state of the VF device is up.

Sebastian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ