Message-Id: <20240908160702.56618-1-jdamato@fastly.com>
Date: Sun, 8 Sep 2024 16:06:34 +0000
From: Joe Damato <jdamato@...tly.com>
To: netdev@...r.kernel.org
Cc: mkarsten@...terloo.ca,
kuba@...nel.org,
skhawaja@...gle.com,
sdf@...ichev.me,
bjorn@...osinc.com,
amritha.nambiar@...el.com,
sridhar.samudrala@...el.com,
Joe Damato <jdamato@...tly.com>,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Breno Leitao <leitao@...ian.org>,
Daniel Jurgens <danielj@...dia.com>,
David Ahern <dsahern@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Donald Hunter <donald.hunter@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Jiri Pirko <jiri@...nulli.us>,
Johannes Berg <johannes.berg@...el.com>,
Jonathan Corbet <corbet@....net>,
Kory Maincent <kory.maincent@...tlin.com>,
Larysa Zaremba <larysa.zaremba@...el.com>,
Leon Romanovsky <leon@...nel.org>,
linux-doc@...r.kernel.org (open list:DOCUMENTATION),
linux-kernel@...r.kernel.org (open list),
linux-rdma@...r.kernel.org (open list:MELLANOX MLX4 core VPI driver),
Lorenzo Bianconi <lorenzo@...nel.org>,
Michael Chan <michael.chan@...adcom.com>,
Paolo Abeni <pabeni@...hat.com>,
Saeed Mahameed <saeedm@...dia.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Tariq Toukan <tariqt@...dia.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: [RFC net-next v2 0/9] Add support for per-NAPI config via netlink
Greetings:
Welcome to v2, converted to an RFC. The implementation is definitely
incorrect, but hopefully it can serve as a basis for discussion to get
to something better.
This implementation allocates "struct napi_storage" and each NAPI
instance is assigned an index into the storage array.
It seemed like this storage area should persist even if different HW
queues are created, but should be cleared when queue counts are resized
(ethtool -L).
What I did is flat out incorrect: I memset the struct to 0
on napi_enable.
I am not totally clear if I understand the part of the previous
conversation about mapping rings to NAPIs and so on, but I wanted to
make sure the rest of the implementation was starting to vaguely look
like what was discussed in the previous thread.
To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
- mlx5 which is the basis of the examples below
- mlx4 which has TX only NAPIs, just to highlight that case. I have
only compile tested this patch; I don't have this hardware.
- bnxt which I have only compile tested.
Zeroing on napi_enable is incorrect because, at the very least, it
breaks sysfs (the sysfs settings should be inherited). It's not clear
to me how we achieve persistence with the zero-ing unless we assume the
drivers are changed somehow? IIRC, there was a suggestion in the
previous thread to memset the napi_storage to 0 on queue resize, but
I've definitely gotten the nuance in what was desired wrong.
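To make the problem concrete, here is a toy Python model of the behavior
described above. It is not kernel code: NapiStorage, Netdev, and the
method names are made up for illustration. Only the two settings, the
index into a storage array, and the "zero on napi_enable" step are taken
from this series.

```python
from dataclasses import dataclass


@dataclass
class NapiStorage:
    # Hypothetical stand-in for one slot of "struct napi_storage".
    defer_hard_irqs: int = 0
    gro_flush_timeout: int = 0


class Netdev:
    # Hypothetical stand-in for a net device owning the storage array.
    def __init__(self, nqueues):
        self.storage = [NapiStorage() for _ in range(nqueues)]

    def sysfs_write(self, defer_hard_irqs, gro_flush_timeout):
        # A device-wide sysfs write is copied into every NAPI slot.
        for slot in self.storage:
            slot.defer_hard_irqs = defer_hard_irqs
            slot.gro_flush_timeout = gro_flush_timeout

    def napi_enable(self, index):
        # What this RFC does: clear the slot whenever the NAPI is enabled.
        self.storage[index] = NapiStorage()


dev = Netdev(nqueues=4)
dev.sysfs_write(defer_hard_irqs=100, gro_flush_timeout=20000)

# The NAPI at index 2 is disabled and enabled again for some reason.
dev.napi_enable(2)

print(dev.storage[2])  # sysfs-provided values are gone for this NAPI
print(dev.storage[3])  # a NAPI that was not re-enabled still has them
```

In this model, any path that calls napi_enable drops the values that
were inherited from sysfs, which is the breakage described above.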
Anyway, sending what I have before iterating further to see if this is
even remotely the right direction before going too deep down this path.
I hope that's OK.
Here's an example of how it works on my mlx5 as is:
# start with 4 queues
$ ethtool -l eth4 | grep Combined | tail -1
Combined: 4
First, output the current NAPI settings:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 928,
'ifindex': 7,
'index': 3,
'irq': 529},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 927,
'ifindex': 7,
'index': 2,
'irq': 528},
[...]
Now, set the global sysfs parameters:
$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'
Output current NAPI settings again:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 928,
'ifindex': 7,
'index': 3,
'irq': 529},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 927,
'ifindex': 7,
'index': 2,
'irq': 528},
[...]
Now set NAPI ID 927, via its ifindex and index, to specific values:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set \
--json='{"ifindex": 7, "index": 2,
"defer-hard-irqs": 111,
"gro-flush-timeout": 11111}'
None
Now output current NAPI settings again to ensure only 927 changed:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 928,
'ifindex': 7,
'index': 3,
'irq': 529},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 11111,
'id': 927,
'ifindex': 7,
'index': 2,
'irq': 528},
[...]
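Checking "only 927 changed" by eye gets tedious with more queues. As a
rough sketch, a small helper could compare two napi-get dumps. The
function name changed_napis is hypothetical, and the sample data holds
only the two entries shown above (the real dumps are longer).

```python
def changed_napis(before, after):
    """Return the sorted NAPI IDs whose settings differ between two dumps."""
    keys = ("defer-hard-irqs", "gro-flush-timeout")
    old = {napi["id"]: napi for napi in before}
    changed = []
    for napi in after:
        prev = old.get(napi["id"])
        if prev is not None and any(napi[k] != prev[k] for k in keys):
            changed.append(napi["id"])
    return sorted(changed)


# Dump taken after the sysfs writes, before napi-set.
before = [
    {"defer-hard-irqs": 100, "gro-flush-timeout": 20000,
     "id": 928, "ifindex": 7, "index": 3, "irq": 529},
    {"defer-hard-irqs": 100, "gro-flush-timeout": 20000,
     "id": 927, "ifindex": 7, "index": 2, "irq": 528},
]

# Dump taken after napi-set on ifindex 7, index 2.
after = [
    {"defer-hard-irqs": 100, "gro-flush-timeout": 20000,
     "id": 928, "ifindex": 7, "index": 3, "irq": 529},
    {"defer-hard-irqs": 111, "gro-flush-timeout": 11111,
     "id": 927, "ifindex": 7, "index": 2, "irq": 528},
]

print(changed_napis(before, after))
```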
Now, increase gro-flush-timeout only:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"ifindex": 7, "index": 2,
"gro-flush-timeout": 44444}'
None
Now output the current NAPI settings once more:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 928,
'ifindex': 7,
'index': 3,
'irq': 529},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 44444,
'id': 927,
'ifindex': 7,
'index': 2,
'irq': 528},
[...]
Now set NAPI ID 927, via its ifindex and index, to have
gro_flush_timeout of 0:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"ifindex": 7, "index": 2,
"gro-flush-timeout": 0}'
None
Check that NAPI ID 927 has a value of 0:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 928,
'ifindex': 7,
'index': 3,
'irq': 529},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 927,
'ifindex': 7,
'index': 2,
'irq': 528},
[...]
Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:
$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'
Check that worked:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 928,
'ifindex': 7,
'index': 3,
'irq': 529},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 927,
'ifindex': 7,
'index': 2,
'irq': 528},
[...]
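The whole sequence above can be replayed as plain data updates. This is
only a sketch of the semantics seen in the dumps, not of the kernel
implementation; napi_set and sysfs_write are hypothetical helper names,
and the dict is keyed by NAPI index.

```python
def napi_set(napis, index, **attrs):
    # Only the attributes supplied in the request are changed.
    napis[index].update(attrs)


def sysfs_write(napis, attr, value):
    # A device-wide sysfs write replaces the value in every NAPI.
    for cfg in napis.values():
        cfg[attr] = value


# Four queues, all values 0, as in the first dump.
napis = {i: {"defer_hard_irqs": 0, "gro_flush_timeout": 0} for i in range(4)}

sysfs_write(napis, "gro_flush_timeout", 20000)
sysfs_write(napis, "defer_hard_irqs", 100)

napi_set(napis, 2, defer_hard_irqs=111, gro_flush_timeout=11111)
napi_set(napis, 2, gro_flush_timeout=44444)
napi_set(napis, 2, gro_flush_timeout=0)

# Snapshot before the final sysfs writes.
after_netlink = {i: dict(cfg) for i, cfg in napis.items()}

sysfs_write(napis, "gro_flush_timeout", 33333)
sysfs_write(napis, "defer_hard_irqs", 222)

print(after_netlink[2])  # per-NAPI values set via netlink
print(napis[2])          # overwritten by the later sysfs writes
```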
Resizing the queues (ethtool -L) resets everything to 0, which is wrong
because it breaks sysfs and it breaks the persistence goal.
I hope though that this code can still be discussed to ensure that I am
moving in the correct direction toward solving these issues before going
too far down the rabbit hole :)
Thanks,
Joe
Joe Damato (9):
net: napi: Add napi_storage
netdev-genl: Export NAPI index
net: napi: Make napi_defer_hard_irqs per-NAPI
netdev-genl: Dump napi_defer_hard_irqs
net: napi: Make gro_flush_timeout per-NAPI
netdev-genl: Support setting per-NAPI config values
bnxt: Add support for napi storage
mlx5: Add support for napi storage
mlx4: Add support for napi storage to RX CQs
Documentation/netlink/specs/netdev.yaml | 35 ++++++++
.../networking/net_cachelines/net_device.rst | 3 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +-
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
include/linux/netdevice.h | 38 ++++++++-
include/uapi/linux/netdev.h | 4 +
net/core/dev.c | 36 +++++---
net/core/dev.h | 83 +++++++++++++++++++
net/core/net-sysfs.c | 4 +-
net/core/netdev-genl-gen.c | 15 ++++
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 65 +++++++++++++++
tools/include/uapi/linux/netdev.h | 3 +
14 files changed, 276 insertions(+), 19 deletions(-)
--
2.25.1