Message-ID: <f02044c0-1d90-49f8-8a2d-00ec84fba27a@intel.com>
Date: Tue, 29 Oct 2024 11:49:03 +0200
From: "Lifshits, Vitaly" <vitaly.lifshits@...el.com>
To: Joe Damato <jdamato@...tly.com>, <netdev@...r.kernel.org>
CC: <jacob.e.keller@...el.com>, <kurt@...utronix.de>,
<vinicius.gomes@...el.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
"Przemek Kitszel" <przemyslaw.kitszel@...el.com>, Andrew Lunn
<andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, "Alexei Starovoitov" <ast@...nel.org>, Daniel Borkmann
<daniel@...earbox.net>, "Jesper Dangaard Brouer" <hawk@...nel.org>, John
Fastabend <john.fastabend@...il.com>, "moderated list:INTEL ETHERNET DRIVERS"
<intel-wired-lan@...ts.osuosl.org>, open list <linux-kernel@...r.kernel.org>,
"open list:XDP (eXpress Data Path)" <bpf@...r.kernel.org>
Subject: Re: [PATCH iwl-next v5 2/2] igc: Link queues to NAPI instances
On 10/28/2024 9:52 PM, Joe Damato wrote:
> Link queues to NAPI instances via netdev-genl API so that users can
> query this information with netlink. Handle a few cases in the driver:
> 1. Link/unlink the NAPIs when XDP is enabled/disabled
> 2. Handle IGC_FLAG_QUEUE_PAIRS enabled and disabled
>
> Example output when IGC_FLAG_QUEUE_PAIRS is enabled:
>
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> --dump queue-get --json='{"ifindex": 2}'
>
> [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
> {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
> {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
> {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
> {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'},
> {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
> {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
>
> Since IGC_FLAG_QUEUE_PAIRS is enabled, you'll note that the same NAPI ID
> is present for both rx and tx queues at the same index, for example
> index 0:
>
> {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'},
>
> To test IGC_FLAG_QUEUE_PAIRS disabled, a test system was booted using
> the grub command line option "maxcpus=2" to force
> igc_set_interrupt_capability to disable IGC_FLAG_QUEUE_PAIRS.
>
> Example output when IGC_FLAG_QUEUE_PAIRS is disabled:
>
> $ lscpu | grep "On-line CPU"
> On-line CPU(s) list: 0,2
>
> $ ethtool -l enp86s0 | tail -5
> Current hardware settings:
> RX: n/a
> TX: n/a
> Other: 1
> Combined: 2
>
> $ cat /proc/interrupts | grep enp
> 144: [...] enp86s0
> 145: [...] enp86s0-rx-0
> 146: [...] enp86s0-rx-1
> 147: [...] enp86s0-tx-0
> 148: [...] enp86s0-tx-1
>
> 1 "other" IRQ, and 2 IRQs for each of RX and Tx, so we expect netlink to
> report 4 IRQs with unique NAPI IDs:
>
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> --dump napi-get --json='{"ifindex": 2}'
> [{'id': 8196, 'ifindex': 2, 'irq': 148},
> {'id': 8195, 'ifindex': 2, 'irq': 147},
> {'id': 8194, 'ifindex': 2, 'irq': 146},
> {'id': 8193, 'ifindex': 2, 'irq': 145}]
>
> Now we examine which queues these NAPIs are associated with, expecting
> that since IGC_FLAG_QUEUE_PAIRS is disabled each RX and TX queue will
> have its own NAPI instance:
>
> $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> --dump queue-get --json='{"ifindex": 2}'
> [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
> {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
> {'id': 0, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'},
> {'id': 1, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}]
>
> Signed-off-by: Joe Damato <jdamato@...tly.com>
> ---
> v5:
> - Rename igc_resume to __igc_do_resume and pass in a boolean
> "need_rtnl" to signal whether or not rtnl should be held before
> calling __igc_open. Call this new function from igc_runtime_resume
> and igc_resume passing in false (for igc_runtime_resume) and true
> (igc_resume), respectively. This is done to avoid reintroducing a
> bug fixed in commit: 6f31d6b: "igc: Refactor runtime power
> management flow" where rtnl is held in runtime_resume causing a
> deadlock.
>
> v4:
> - Add rtnl_lock/rtnl_unlock in two paths: igc_resume and
> igc_io_error_detected. The code added to the latter is inspired by
> a similar implementation in ixgbe's ixgbe_io_error_detected.
>
> v3:
> - Replace igc_unset_queue_napi with igc_set_queue_napi(adapter, i,
> NULL), as suggested by Vinicius Costa Gomes
> - Simplify implementation of igc_set_queue_napi as suggested by Kurt
> Kanzenbach, with a tweak to use ring->queue_index
>
> v2:
> - Update commit message to include tests for IGC_FLAG_QUEUE_PAIRS
> disabled
> - Refactored code to move napi queue mapping and unmapping to helper
> functions igc_set_queue_napi and igc_unset_queue_napi
> - Adjust the code to handle IGC_FLAG_QUEUE_PAIRS disabled
> - Call helpers to map/unmap queues to NAPIs in igc_up, __igc_open,
> igc_xdp_enable_pool, and igc_xdp_disable_pool
>
> drivers/net/ethernet/intel/igc/igc.h | 2 +
> drivers/net/ethernet/intel/igc/igc_main.c | 52 ++++++++++++++++++++---
> drivers/net/ethernet/intel/igc/igc_xdp.c | 2 +
> 3 files changed, 49 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
> index eac0f966e0e4..b8111ad9a9a8 100644
> --- a/drivers/net/ethernet/intel/igc/igc.h
> +++ b/drivers/net/ethernet/intel/igc/igc.h
> @@ -337,6 +337,8 @@ struct igc_adapter {
> struct igc_led_classdev *leds;
> };
>
> +void igc_set_queue_napi(struct igc_adapter *adapter, int q_idx,
> + struct napi_struct *napi);
> void igc_up(struct igc_adapter *adapter);
> void igc_down(struct igc_adapter *adapter);
> int igc_open(struct net_device *netdev);
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 7964bbedb16c..051a0cdb1143 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -4948,6 +4948,22 @@ static int igc_sw_init(struct igc_adapter *adapter)
> return 0;
> }
>
> +void igc_set_queue_napi(struct igc_adapter *adapter, int vector,
> + struct napi_struct *napi)
> +{
> + struct igc_q_vector *q_vector = adapter->q_vector[vector];
> +
> + if (q_vector->rx.ring)
> + netif_queue_set_napi(adapter->netdev,
> + q_vector->rx.ring->queue_index,
> + NETDEV_QUEUE_TYPE_RX, napi);
> +
> + if (q_vector->tx.ring)
> + netif_queue_set_napi(adapter->netdev,
> + q_vector->tx.ring->queue_index,
> + NETDEV_QUEUE_TYPE_TX, napi);
> +}
> +
> /**
> * igc_up - Open the interface and prepare it to handle traffic
> * @adapter: board private structure
> @@ -4955,6 +4971,7 @@ static int igc_sw_init(struct igc_adapter *adapter)
> void igc_up(struct igc_adapter *adapter)
> {
> struct igc_hw *hw = &adapter->hw;
> + struct napi_struct *napi;
> int i = 0;
>
> /* hardware has been reset, we need to reload some things */
> @@ -4962,8 +4979,11 @@ void igc_up(struct igc_adapter *adapter)
>
> clear_bit(__IGC_DOWN, &adapter->state);
>
> - for (i = 0; i < adapter->num_q_vectors; i++)
> - napi_enable(&adapter->q_vector[i]->napi);
> + for (i = 0; i < adapter->num_q_vectors; i++) {
> + napi = &adapter->q_vector[i]->napi;
> + napi_enable(napi);
> + igc_set_queue_napi(adapter, i, napi);
> + }
>
> if (adapter->msix_entries)
> igc_configure_msix(adapter);
> @@ -5192,6 +5212,7 @@ void igc_down(struct igc_adapter *adapter)
> for (i = 0; i < adapter->num_q_vectors; i++) {
> if (adapter->q_vector[i]) {
> napi_synchronize(&adapter->q_vector[i]->napi);
> + igc_set_queue_napi(adapter, i, NULL);
> napi_disable(&adapter->q_vector[i]->napi);
> }
> }
> @@ -6021,6 +6042,7 @@ static int __igc_open(struct net_device *netdev, bool resuming)
> struct igc_adapter *adapter = netdev_priv(netdev);
> struct pci_dev *pdev = adapter->pdev;
> struct igc_hw *hw = &adapter->hw;
> + struct napi_struct *napi;
> int err = 0;
> int i = 0;
>
> @@ -6056,8 +6078,11 @@ static int __igc_open(struct net_device *netdev, bool resuming)
>
> clear_bit(__IGC_DOWN, &adapter->state);
>
> - for (i = 0; i < adapter->num_q_vectors; i++)
> - napi_enable(&adapter->q_vector[i]->napi);
> + for (i = 0; i < adapter->num_q_vectors; i++) {
> + napi = &adapter->q_vector[i]->napi;
> + napi_enable(napi);
> + igc_set_queue_napi(adapter, i, napi);
> + }
>
> /* Clear any pending interrupts. */
> rd32(IGC_ICR);
> @@ -7342,7 +7367,7 @@ static void igc_deliver_wake_packet(struct net_device *netdev)
> netif_rx(skb);
> }
>
> -static int igc_resume(struct device *dev)
> +static int __igc_do_resume(struct device *dev, bool need_rtnl)
> {
> struct pci_dev *pdev = to_pci_dev(dev);
> struct net_device *netdev = pci_get_drvdata(pdev);
> @@ -7385,7 +7410,11 @@ static int igc_resume(struct device *dev)
> wr32(IGC_WUS, ~0);
>
> if (netif_running(netdev)) {
> + if (need_rtnl)
> + rtnl_lock();
> err = __igc_open(netdev, true);
> + if (need_rtnl)
> + rtnl_unlock();
> if (!err)
> netif_device_attach(netdev);
> }
> @@ -7393,9 +7422,14 @@ static int igc_resume(struct device *dev)
> return err;
> }
>
> +static int igc_resume(struct device *dev)
> +{
> + return __igc_do_resume(dev, true);
> +}
> +
> static int igc_runtime_resume(struct device *dev)
> {
> - return igc_resume(dev);
> + return __igc_do_resume(dev, false);
> }
>
> static int igc_suspend(struct device *dev)
> @@ -7440,14 +7474,18 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
> struct net_device *netdev = pci_get_drvdata(pdev);
> struct igc_adapter *adapter = netdev_priv(netdev);
>
> + rtnl_lock();
> netif_device_detach(netdev);
>
> - if (state == pci_channel_io_perm_failure)
> + if (state == pci_channel_io_perm_failure) {
> + rtnl_unlock();
> return PCI_ERS_RESULT_DISCONNECT;
> + }
>
> if (netif_running(netdev))
> igc_down(adapter);
> pci_disable_device(pdev);
> + rtnl_unlock();
>
> /* Request a slot reset. */
> return PCI_ERS_RESULT_NEED_RESET;
> diff --git a/drivers/net/ethernet/intel/igc/igc_xdp.c b/drivers/net/ethernet/intel/igc/igc_xdp.c
> index e27af72aada8..4da633430b80 100644
> --- a/drivers/net/ethernet/intel/igc/igc_xdp.c
> +++ b/drivers/net/ethernet/intel/igc/igc_xdp.c
> @@ -84,6 +84,7 @@ static int igc_xdp_enable_pool(struct igc_adapter *adapter,
> napi_disable(napi);
> }
>
> + igc_set_queue_napi(adapter, queue_id, NULL);
> set_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
> set_bit(IGC_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
>
> @@ -133,6 +134,7 @@ static int igc_xdp_disable_pool(struct igc_adapter *adapter, u16 queue_id)
> xsk_pool_dma_unmap(pool, IGC_RX_DMA_ATTR);
> clear_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
> clear_bit(IGC_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
> + igc_set_queue_napi(adapter, queue_id, napi);
>
> if (needs_reset) {
> napi_enable(napi);

I believe that this fix should work in most cases. I have some concerns
that this solution might not be 100% robust, as runtime resume may
sometimes be triggered without rtnl being held, for example if it is
initiated by a network wake event. For the moment, though, I think this
approach is good enough.
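
To make the concern concrete, here is a minimal user-space sketch, not driver code. All toy_* names are hypothetical and model only whether the lock is held. It assumes the open path expects rtnl to be held on entry, which is what the locking added in v4/v5 suggests:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the rtnl mutex: only tracks whether it is held. */
static bool toy_rtnl_held;

static void toy_rtnl_lock(void)
{
	/* Taking the lock twice would deadlock in the kernel. */
	assert(!toy_rtnl_held);
	toy_rtnl_held = true;
}

static void toy_rtnl_unlock(void)
{
	assert(toy_rtnl_held);
	toy_rtnl_held = false;
}

/*
 * Hypothetical stand-in for __igc_open(): returns 0 when the lock is
 * held on entry and -1 otherwise (assumption: the real path expects
 * rtnl to be held when queues are linked to NAPIs).
 */
static int toy_open(void)
{
	return toy_rtnl_held ? 0 : -1;
}

/* Same shape as __igc_do_resume(dev, need_rtnl) in the patch. */
static int toy_do_resume(bool need_rtnl)
{
	int err;

	if (need_rtnl)
		toy_rtnl_lock();
	err = toy_open();
	if (need_rtnl)
		toy_rtnl_unlock();

	return err;
}
```

In this model, toy_do_resume(false) only succeeds when the caller already holds the lock, which is the case I am not sure is guaranteed for every runtime resume trigger.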

My main comment here is about the naming conventions: I prefer keeping the
original parameter/function names for consistency, similar to what
was done in the igb driver:
https://github.com/torvalds/linux/commit/ac8c58f5b535d6272324e2b8b4a0454781c9147e
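
For anyone following the commit message, the queue-to-NAPI mapping it describes can be modelled in user space roughly as below. This is only a sketch: the toy_* names are hypothetical, and the vector layout with IGC_FLAG_QUEUE_PAIRS disabled (RX vectors first, then TX) is an assumption inferred from the example output rather than taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_QUEUES 4
#define TOY_NO_RING    (-1)

/* Toy q_vector: the queue index of each ring it owns, or TOY_NO_RING. */
struct toy_q_vector {
	int rx_queue;
	int tx_queue;
};

/*
 * Fill vec[] (room for 2 * TOY_MAX_QUEUES entries) and return the
 * number of vectors. With queue pairs, vector i owns RX i and TX i;
 * without, each ring gets its own vector (assumed order: RX, then TX).
 */
static int toy_build_vectors(struct toy_q_vector *vec, int n_queues,
			     bool queue_pairs)
{
	int n = 0;
	int i;

	assert(n_queues <= TOY_MAX_QUEUES);

	if (queue_pairs) {
		for (i = 0; i < n_queues; i++) {
			vec[n].rx_queue = i;
			vec[n].tx_queue = i;
			n++;
		}
		return n;
	}

	for (i = 0; i < n_queues; i++) {
		vec[n].rx_queue = i;
		vec[n].tx_queue = TOY_NO_RING;
		n++;
	}
	for (i = 0; i < n_queues; i++) {
		vec[n].rx_queue = TOY_NO_RING;
		vec[n].tx_queue = i;
		n++;
	}
	return n;
}

/* Same shape as igc_set_queue_napi(): link whichever rings exist. */
static void toy_set_queue_napi(const struct toy_q_vector *v, int napi_id,
			       int *rx_napi, int *tx_napi)
{
	if (v->rx_queue != TOY_NO_RING)
		rx_napi[v->rx_queue] = napi_id;
	if (v->tx_queue != TOY_NO_RING)
		tx_napi[v->tx_queue] = napi_id;
}
```

Calling toy_set_queue_napi() for each vector with IDs starting at 8193 gives the same shape as the two queue-get dumps in the commit message: shared IDs per index with pairs enabled, and four distinct IDs for two RX plus two TX queues with pairs disabled.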