Message-ID: <mtmm2bwn3lrsmsx3evzemzjvaddmzfvnk6g37yr3fmzb77bpyu@ffto5sq7nvfw>
Date: Tue, 18 Feb 2025 11:50:55 -0300
From: Wander Lairson Costa <wander@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, davem@...emloft.net, 
	kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com, andrew+netdev@...n.ch, 
	netdev@...r.kernel.org, rostedt@...dmis.org, clrkwllms@...nel.org, jgarzik@...hat.com, 
	yuma@...hat.com, linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling
 for PREEMPT_RT

On Wed, Feb 12, 2025 at 04:29:25PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-12 12:21:04 [-0300], Wander Lairson Costa wrote:
> > > "eventually fails". Does this mean it passes the first few iterations
> > > but then it times out? In that case it might be something else
> > >
> > Yes. Indeed, it might be due to something else. I will investigate further
> > when I get the machine back.
> 
> Okay. Then I will consider this series as not going to be applied. I have an
> idea of what is happening, and I will wait until you get back.
> 

Sorry it took so long. After a day of fighting with the machine, I was able to
boot an upstream kernel on it and generate the logs.

These logs are from the test case that boots the kernel with nr_cpus=1:

     kworker/0:0-8       [000] d..2.  2120.708145: process_one_work <-worker_thread
     kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reset_task <-process_one_work
     kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reinit_locked <-process_one_work
     kworker/0:0-8       [000] ...1.  2120.708145: igbvf_down <-igbvf_reinit_locked
     kworker/0:0-8       [000] ...1.  2120.718619: igbvf_update_stats <-igbvf_down
     kworker/0:0-8       [000] ...1.  2120.718619: igbvf_reset <-igbvf_down
     kworker/0:0-8       [000] b..13  2120.718620: e1000_reset_hw_vf <-igbvf_reset
     kworker/0:0-8       [000] b..13  2120.718620: e1000_check_for_rst_vf <-e1000_reset_hw_vf
     kworker/0:0-8       [000] b..13  2120.718621: e1000_write_posted_mbx <-e1000_reset_hw_vf
     kworker/0:0-8       [000] b..13  2120.718621: e1000_write_mbx_vf <-e1000_write_posted_mbx
     kworker/0:0-8       [000] b..13  2120.718624: e1000_check_for_ack_vf <-e1000_write_posted_mbx
     kworker/0:0-8       [000] D.h.3  2120.718626: irq_handler_entry: irq=63 name=ens14f0
     kworker/0:0-8       [000] b..13  2120.719133: e1000_check_for_ack_vf <-e1000_write_posted_mbx
	[...] repeats e1000_check_for_ack_vf for 2000 lines
     kworker/0:0-8       [000] b..13  2120.719634: e1000_check_for_ack_vf <-e1000_write_posted_mbx
     kworker/0:0-8       [000] b..13  2121.730639: e1000_read_posted_mbx <-e1000_reset_hw_vf
     kworker/0:0-8       [000] b..13  2121.730643: e1000_init_hw_vf <-igbvf_reset
     kworker/0:0-8       [000] b..13  2121.730643: e1000_rar_set_vf <-e1000_init_hw_vf
     kworker/0:0-8       [000] b..13  2121.730643: e1000_write_posted_mbx <-e1000_rar_set_vf
     kworker/0:0-8       [000] D.Zf2  2121.730645: igbvf_reset_L14: (igbvf_reset+0x62/0x120 [igbvf])
     kworker/0:0-8       [000] .N...  2121.730649: igbvf_reset_L16: (igbvf_reset+0x7b/0x120 [igbvf])
  irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
  irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
  irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
  irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other

I created two custom probes inside igbvf_reset:

$ perf probe -m /lib/modules/6.14.0-rc3+/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko -L igbvf_reset
<igbvf_reset@...me/test/kernel-ark/drivers/net/ethernet/intel/igbvf/netdev.c:0>
      0  static void igbvf_reset(struct igbvf_adapter *adapter)
         {
      2         struct e1000_mac_info *mac = &adapter->hw.mac;
                struct net_device *netdev = adapter->netdev;
                struct e1000_hw *hw = &adapter->hw;
         
                spin_lock_bh(&hw->mbx_lock);
         
                /* Allow time for pending master requests to run */
      9         if (mac->ops.reset_hw(hw))
     10                 dev_info(&adapter->pdev->dev, "PF still resetting\n");
         
     12         mac->ops.init_hw(hw);
         
     14         spin_unlock_bh(&hw->mbx_lock);
         
     16         if (is_valid_ether_addr(adapter->hw.mac.addr)) {
     17                 eth_hw_addr_set(netdev, adapter->hw.mac.addr);
     18                 memcpy(netdev->perm_addr, adapter->hw.mac.addr,
                               netdev->addr_len);
                }
         
     22         adapter->last_reset = jiffies;
         }
         
         int igbvf_up(struct igbvf_adapter *adapter)

$ perf probe -m /lib/modules/6.14.0-rc3+/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko igbvf_reset:14
Added new event:
  probe:igbvf_reset_L14 (on igbvf_reset:14 in igbvf)

You can now use it in all perf tools, such as:

        perf record -e probe:igbvf_reset_L14 -aR sleep 1

$ perf probe -m /lib/modules/6.14.0-rc3+/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko igbvf_reset:16
Added new event:
  probe:igbvf_reset_L16 (on igbvf_reset:16 in igbvf)

They sit right before (line 14) and right after (line 16) the
spin_unlock_bh() call, to observe what happens when the lock is released.

This is the trace-cmd command line I ran:

$ trace-cmd start -p function -l 'e1000*' -l 'igb*' -l process_one_work -e irq:irq_handler_entry -e probe
  plugin 'function'

The threaded interrupt handler (irq/63-ens14f0) is called right after
(during?) spin_unlock_bh(). I also wonder what the 'f' means in the
preempt-depth column of the igbvf_reset_L14 event (D.Zf2).

I am currently working on something else with a higher priority, so I
don't have time right now to dig deeper into this. But feel free to ask
me for any test or trace you may need.

> Sebastian
> 

