Message-ID: <Zx-1BhZlXRQCImex@LQ3V64L9R2>
Date: Mon, 28 Oct 2024 09:00:06 -0700
From: Joe Damato <jdamato@...tly.com>
To: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Keller, Jacob E" <jacob.e.keller@...el.com>,
	"kurt@...utronix.de" <kurt@...utronix.de>,
	"Gomes, Vinicius" <vinicius.gomes@...el.com>,
	"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
	"Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Jesper Dangaard Brouer <hawk@...nel.org>,
	John Fastabend <john.fastabend@...il.com>,
	"moderated list:INTEL ETHERNET DRIVERS" <intel-wired-lan@...ts.osuosl.org>,
	open list <linux-kernel@...r.kernel.org>,
	"open list:XDP (eXpress Data Path)" <bpf@...r.kernel.org>,
	stanislaw.gruszka@...ux.intel.com
Subject: Re: [Intel-wired-lan] [iwl-next v4 2/2] igc: Link queues to NAPI
 instances

On Mon, Oct 28, 2024 at 08:50:38AM -0700, Joe Damato wrote:
> On Sun, Oct 27, 2024 at 11:49:33AM +0200, Lifshits, Vitaly wrote:
> > 
> > On 10/23/2024 12:52 AM, Joe Damato wrote:
> > > Link queues to NAPI instances via netdev-genl API so that users can
> > > query this information with netlink. Handle a few cases in the driver:
> > >    1. Link/unlink the NAPIs when XDP is enabled/disabled
> > >    2. Handle IGC_FLAG_QUEUE_PAIRS enabled and disabled
> > > 
> > > Example output when IGC_FLAG_QUEUE_PAIRS is enabled:
> > > 
> > > $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> > >                           --dump queue-get --json='{"ifindex": 2}'
> > > 
> > > [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> > >   {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
> > >   {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
> > >   {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
> > >   {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
> > >   {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'},
> > >   {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
> > >   {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
> > > 
> > > Since IGC_FLAG_QUEUE_PAIRS is enabled, you'll note that the same NAPI ID
> > > is present for both rx and tx queues at the same index, for example
> > > index 0:
> > > 
> > > {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> > > {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
> > > 
> > > To test IGC_FLAG_QUEUE_PAIRS disabled, a test system was booted using
> > > the grub command line option "maxcpus=2" to force
> > > igc_set_interrupt_capability to disable IGC_FLAG_QUEUE_PAIRS.
> > > 
> > > Example output when IGC_FLAG_QUEUE_PAIRS is disabled:
> > > 
> > > $ lscpu | grep "On-line CPU"
> > > On-line CPU(s) list:      0,2
> > > 
> > > $ ethtool -l enp86s0  | tail -5
> > > Current hardware settings:
> > > RX:		n/a
> > > TX:		n/a
> > > Other:		1
> > > Combined:	2
> > > 
> > > $ cat /proc/interrupts  | grep enp
> > >   144: [...] enp86s0
> > >   145: [...] enp86s0-rx-0
> > >   146: [...] enp86s0-rx-1
> > >   147: [...] enp86s0-tx-0
> > >   148: [...] enp86s0-tx-1
> > > 
> > > 1 "other" IRQ, and 2 IRQs for each of RX and TX, so we expect netlink to
> > > report 4 IRQs with unique NAPI IDs:
> > > 
> > > $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> > >                           --dump napi-get --json='{"ifindex": 2}'
> > > [{'id': 8196, 'ifindex': 2, 'irq': 148},
> > >   {'id': 8195, 'ifindex': 2, 'irq': 147},
> > >   {'id': 8194, 'ifindex': 2, 'irq': 146},
> > >   {'id': 8193, 'ifindex': 2, 'irq': 145}]
> > > 
> > > Now we examine which queues these NAPIs are associated with, expecting
> > > that since IGC_FLAG_QUEUE_PAIRS is disabled each RX and TX queue will
> > > have its own NAPI instance:
> > > 
> > > $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> > >                           --dump queue-get --json='{"ifindex": 2}'
> > > [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> > >   {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
> > >   {'id': 0, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
> > >   {'id': 1, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
> > > 
> > > Signed-off-by: Joe Damato <jdamato@...tly.com>
> > > Acked-by: Vinicius Costa Gomes <vinicius.gomes@...el.com>
> > > ---
> > >   v4:
> > >     - Add rtnl_lock/rtnl_unlock in two paths: igc_resume and
> > >       igc_io_error_detected. The code added to the latter is inspired by
> > >       a similar implementation in ixgbe's ixgbe_io_error_detected.
> > > 
> > >   v3:
> > >     - Replace igc_unset_queue_napi with igc_set_queue_napi(adapter, i,
> > >       NULL), as suggested by Vinicius Costa Gomes
> > >     - Simplify implementation of igc_set_queue_napi as suggested by Kurt
> > >       Kanzenbach, with a tweak to use ring->queue_index
> > > 
> > >   v2:
> > >     - Update commit message to include tests for IGC_FLAG_QUEUE_PAIRS
> > >       disabled
> > >     - Refactored code to move napi queue mapping and unmapping to helper
> > >       functions igc_set_queue_napi and igc_unset_queue_napi
> > >     - Adjust the code to handle IGC_FLAG_QUEUE_PAIRS disabled
> > >     - Call helpers to map/unmap queues to NAPIs in igc_up, __igc_open,
> > >       igc_xdp_enable_pool, and igc_xdp_disable_pool
> > > 
> > >   drivers/net/ethernet/intel/igc/igc.h      |  2 ++
> > >   drivers/net/ethernet/intel/igc/igc_main.c | 41 ++++++++++++++++++++---
> > >   drivers/net/ethernet/intel/igc/igc_xdp.c  |  2 ++
> > >   3 files changed, 40 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> > > index eac0f966e0e4..b8111ad9a9a8 100644
> > > --- a/drivers/net/ethernet/intel/igc/igc.h
> > > +++ b/drivers/net/ethernet/intel/igc/igc.h
> > > @@ -337,6 +337,8 @@ struct igc_adapter {
> > >   	struct igc_led_classdev *leds;
> > >   };
> > > +void igc_set_queue_napi(struct igc_adapter *adapter, int q_idx,
> > > +			struct napi_struct *napi);
> > >   void igc_up(struct igc_adapter *adapter);
> > >   void igc_down(struct igc_adapter *adapter);
> > >   int igc_open(struct net_device *netdev);
> > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> > > index 7964bbedb16c..04aa216ef612 100644
> > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > > @@ -4948,6 +4948,22 @@ static int igc_sw_init(struct igc_adapter *adapter)
> > >   	return 0;
> > >   }
> > > +void igc_set_queue_napi(struct igc_adapter *adapter, int vector,
> > > +			struct napi_struct *napi)
> > > +{
> > > +	struct igc_q_vector *q_vector = adapter->q_vector[vector];
> > > +
> > > +	if (q_vector->rx.ring)
> > > +		netif_queue_set_napi(adapter->netdev,
> > > +				     q_vector->rx.ring->queue_index,
> > > +				     NETDEV_QUEUE_TYPE_RX, napi);
> > > +
> > > +	if (q_vector->tx.ring)
> > > +		netif_queue_set_napi(adapter->netdev,
> > > +				     q_vector->tx.ring->queue_index,
> > > +				     NETDEV_QUEUE_TYPE_TX, napi);
> > > +}
> > > +
> > >   /**
> > >    * igc_up - Open the interface and prepare it to handle traffic
> > >    * @adapter: board private structure
> > > @@ -4955,6 +4971,7 @@ static int igc_sw_init(struct igc_adapter *adapter)
> > >   void igc_up(struct igc_adapter *adapter)
> > >   {
> > >   	struct igc_hw *hw = &adapter->hw;
> > > +	struct napi_struct *napi;
> > >   	int i = 0;
> > >   	/* hardware has been reset, we need to reload some things */
> > > @@ -4962,8 +4979,11 @@ void igc_up(struct igc_adapter *adapter)
> > >   	clear_bit(__IGC_DOWN, &adapter->state);
> > > -	for (i = 0; i < adapter->num_q_vectors; i++)
> > > -		napi_enable(&adapter->q_vector[i]->napi);
> > > +	for (i = 0; i < adapter->num_q_vectors; i++) {
> > > +		napi = &adapter->q_vector[i]->napi;
> > > +		napi_enable(napi);
> > > +		igc_set_queue_napi(adapter, i, napi);
> > > +	}
> > >   	if (adapter->msix_entries)
> > >   		igc_configure_msix(adapter);
> > > @@ -5192,6 +5212,7 @@ void igc_down(struct igc_adapter *adapter)
> > >   	for (i = 0; i < adapter->num_q_vectors; i++) {
> > >   		if (adapter->q_vector[i]) {
> > >   			napi_synchronize(&adapter->q_vector[i]->napi);
> > > +			igc_set_queue_napi(adapter, i, NULL);
> > >   			napi_disable(&adapter->q_vector[i]->napi);
> > >   		}
> > >   	}
> > > @@ -6021,6 +6042,7 @@ static int __igc_open(struct net_device *netdev, bool resuming)
> > >   	struct igc_adapter *adapter = netdev_priv(netdev);
> > >   	struct pci_dev *pdev = adapter->pdev;
> > >   	struct igc_hw *hw = &adapter->hw;
> > > +	struct napi_struct *napi;
> > >   	int err = 0;
> > >   	int i = 0;
> > > @@ -6056,8 +6078,11 @@ static int __igc_open(struct net_device *netdev, bool resuming)
> > >   	clear_bit(__IGC_DOWN, &adapter->state);
> > > -	for (i = 0; i < adapter->num_q_vectors; i++)
> > > -		napi_enable(&adapter->q_vector[i]->napi);
> > > +	for (i = 0; i < adapter->num_q_vectors; i++) {
> > > +		napi = &adapter->q_vector[i]->napi;
> > > +		napi_enable(napi);
> > > +		igc_set_queue_napi(adapter, i, napi);
> > > +	}
> > >   	/* Clear any pending interrupts. */
> > >   	rd32(IGC_ICR);
> > > @@ -7385,7 +7410,9 @@ static int igc_resume(struct device *dev)
> > >   	wr32(IGC_WUS, ~0);
> > >   	if (netif_running(netdev)) {
> > > +		rtnl_lock();
> > 
> > This change will bring back the deadlock issue that was fixed in commit:
> > 6f31d6b: "igc: Refactor runtime power management flow".
> 
> OK, thanks for letting me know.
> 
> I think I better understand what the issue is. It seems that:
> 
> - igc_resume can be called with rtnl held via ethtool (which I
>   didn't know), which calls __igc_open
> - __igc_open re-enables NAPIs and re-links queues to NAPI IDs (which
>   requires rtnl)
> 
> so, it seems like the rtnl_lock() I've added to igc_resume is
> unnecessary.
> 
> I suppose I don't know all of the paths where the pm functions can
> be called -- are there others where RTNL is _not_ already held?
> 
> I looked at e1000e and it seems that driver does not re-enable NAPIs
> in its resume path and thus does not suffer from the same issue as
> igc.
> 
> So my questions are:
> 
>   1. Are there other contexts where igc_resume is called where
>      RTNL is not held?
> 
>   2. If the answer is that RTNL is always held when igc_resume is
>      called, then I can send a v5 that removes the
>      rtnl_lock/rtnl_unlock. What do you think?

I see, so it looks like there are two resume paths:
   - resume
   - runtime_resume

The bug I am reintroducing is that the runtime_resume path already
holds RTNL when it reaches my added call to rtnl_lock, so the lock
would be taken a second time.

OK.

Does resume also hold rtnl before the driver's igc_resume is called?
I am asking because I don't know much about how PM works.

If resume does not hold RTNL (but runtime resume does, as the bug
you pointed out shows), it seems like a wrapper can be added to tell
the code whether rtnl should be held or not based on which resume is
happening.
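
To make the wrapper idea concrete, here is a rough userspace model of
what I have in mind. It is not igc or kernel code: every model_* name
is hypothetical, RTNL is modeled as a plain counter, and the split
below simply assumes that system resume is entered without RTNL while
runtime resume is entered with it already held. The first half of that
assumption is exactly the open question.

```c
/* Userspace model only: none of this is igc or kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for RTNL: a non-recursive lock tracked as a depth count. */
static int model_rtnl_depth;

/* Counts how many times the queue/NAPI linking step has run. */
static int model_links_done;

static void model_rtnl_lock(void)
{
	/* Taking the lock while it is already held is the deadlock case. */
	assert(model_rtnl_depth == 0);
	model_rtnl_depth++;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_depth == 1);
	model_rtnl_depth--;
}

/* Stand-in for linking queues to NAPIs, which needs RTNL to be held. */
static void model_link_queues(void)
{
	assert(model_rtnl_depth == 1);
	model_links_done++;
}

/* Shared body: the caller says whether this path must take the lock. */
static void model_resume_common(bool take_rtnl)
{
	if (take_rtnl)
		model_rtnl_lock();

	model_link_queues();

	if (take_rtnl)
		model_rtnl_unlock();
}

/* Hypothetical system-resume entry; ASSUMES the caller does not hold RTNL. */
static void model_resume(void)
{
	model_resume_common(true);
}

/* Hypothetical runtime-resume entry; ASSUMES the caller already holds RTNL. */
static void model_runtime_resume(void)
{
	model_resume_common(false);
}
```

In this model, calling model_runtime_resume() with the lock already
taken goes through without a second acquisition, while model_resume()
takes and drops the lock itself. If it turns out that resume also
holds RTNL, the bool and both wrappers go away and the lock calls can
simply be dropped.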

Does anyone know whether resume (not runtime_resume) already holds
RTNL? I'll try to take a look and see, but I am not very familiar
with PM.
