Message-ID: <e4deec35-d028-c185-bf39-6ba674f3a42e@intel.com>
Date: Mon, 6 Feb 2023 16:15:35 -0800
From: "Nambiar, Amritha" <amritha.nambiar@...el.com>
To: <netdev@...r.kernel.org>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <edumazet@...gle.com>,
<pabeni@...hat.com>, Saeed Mahameed <saeed@...nel.org>,
<alexander.duyck@...il.com>,
"Samudrala, Sridhar" <sridhar.samudrala@...el.com>
Subject: Kernel interface to configure queue-group parameters
Hello,
We are looking for feedback on the kernel interface to configure
queue-group level parameters.
Queues are first-class objects in the kernel, and there are multiple
interfaces to configure queue-level parameters. For example, tx_maxrate
for a transmit queue can be controlled via the sysfs interface. Ethtool
is another option for changing the RX/TX ring parameters of a given
network device (for example, rx-buf-len, tx-push, etc.).
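For instance (device name and values are only illustrative, and ring
parameter support depends on the driver):

# cap TX queue 0 at 1000 Mbps (tx_maxrate is in Mbps)
echo 1000 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
# adjust per-ring parameters where the driver supports them
ethtool -G eth0 rx-buf-len 4096
ethtool -G eth0 tx-push on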
A queue_group is a set of queues grouped together into a single object.
For example, tx_queue_group-0 is a transmit queue_group with index 0 and
can contain transmit queues 0-31; similarly, rx_queue_group-0 is a
receive queue_group with index 0 and can contain receive queues 0-31,
while tx/rx_queue_group-1 may consist of TX and RX queues 32-127
respectively. Currently, upstream drivers for both ice and mlx5 support
creating TX and RX queue_groups via the tc-mqprio and ethtool interfaces.
At this point, the kernel does not have an abstraction for a
queue_group. The closest equivalent in the kernel is a 'traffic class',
which consists of a set of transmit queues. Today, traffic classes are
created using TC's mqprio scheduler. Only a limited set of parameters
can be configured on each traffic class via mqprio, for example the
priority per traffic class, min and max bandwidth rates per traffic
class, etc. Mqprio also supports offloading these parameters to the
hardware. The parameters set for a traffic class (TX queue_group) apply
to all transmit queues belonging to that queue_group. However,
introducing additional parameters for queue_groups and configuring them
via mqprio makes the interface less user-friendly, as the command line
becomes cumbersome with the number of qdisc parameters. Although mqprio
is the interface to create transmit queue_groups, and is also the
interface to configure and offload certain transmit queue_group
parameters, these limitations make us wonder whether other interface
options for configuring queue_group parameters are worth considering.
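For reference, creating two traffic classes (TX queue_groups) with
offloaded rate limits looks roughly like this today with mqprio in
channel mode (interface name and values are only illustrative):

tc qdisc add dev eth0 root mqprio num_tc 2 \
    map 0 0 0 0 1 1 1 1 queues 4@0 4@4 \
    hw 1 mode channel shaper bw_rlimit \
    min_rate 1Gbit 2Gbit max_rate 3Gbit 5Gbit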
Likewise, receive queue_groups can be created using the ethtool
interface as RSS contexts. The next step would be to configure
per-rx_queue_group parameters. Based on the discussion in
https://lore.kernel.org/netdev/20221114091559.7e24c7de@kernel.org/,
it looks like ethtool may not be the right interface for configuring
rx_queue_group parameters that are unrelated to flow<->queue
assignment, for example NAPI configurations on the queue_group.
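For example, with a driver that supports additional RSS contexts, an
rx_queue_group can be created and traffic steered to it along these
lines (queue and context numbers are only illustrative):

# create a new RSS context spreading flows over RX queues 32-63
ethtool -X eth0 context new start 32 equal 32
# steer a flow to it, assuming it was created as context 1
ethtool -N eth0 flow-type tcp4 dst-port 80 context 1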
The key gaps in the kernel to support queue-group parameters are:
1. A 'queue_group' abstraction in the kernel, distinct for TX and RX
2. Offload hooks for TX/RX queue_group parameters, depending on the
chosen interface.
Following are the options we have investigated:
1. tc-mqprio:
Pros:
- Already supports creating queue_groups and offloading certain
parameters
Cons:
- Introducing new parameters makes the interface less user-friendly.
TC qdisc parameters are specified at qdisc creation; the larger the
number of traffic classes and their respective parameters, the lower
the usability.
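To illustrate, a hypothetical per-TC 'num_pollers' option (not an
existing mqprio parameter) would further inflate an already long
command line, since all qdisc parameters must be given at creation:

tc qdisc add dev eth0 root mqprio num_tc 2 \
    map 0 0 0 0 1 1 1 1 queues 4@0 4@4 \
    hw 1 mode channel shaper bw_rlimit \
    min_rate 1Gbit 2Gbit max_rate 3Gbit 5Gbit \
    num_pollers 2 4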
2. Ethtool:
Pros:
- Already creates RX queue_groups as RSS contexts
Cons:
- May not be the right interface for non-RSS-related parameters
Example of configuring the number of NAPI pollers for a queue_group:
ethtool -X <iface> context <context_num> num_pollers <n>
3. sysfs:
Pros:
- Ideal for configuring parameters such as NAPI/IRQ for an RX
queue_group.
- Makes it possible to turn some existing per-netdev NAPI parameters,
such as 'threaded' and 'napi_defer_hard_irqs', into per-queue-group
parameters.
Cons:
- Requires introducing new queue_group structures for TX and RX
queue_groups and references to them, plus kset references for the
queue_groups in struct net_device
- Requires additional ndo ops in net_device_ops for each parameter
for hardware offload.
Examples:
/sys/class/net/<iface>/queue_groups/rxqg-<0-n>/num_pollers
/sys/class/net/<iface>/queue_groups/txqg-<0-n>/min_rate
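Usage would then mirror the existing per-netdev sysfs knobs; the
queue_groups paths below are hypothetical:

# existing per-netdev NAPI parameters mentioned above
echo 1 > /sys/class/net/eth0/threaded
echo 100 > /sys/class/net/eth0/napi_defer_hard_irqs
# proposed per-queue-group equivalents (hypothetical)
echo 4 > /sys/class/net/eth0/queue_groups/rxqg-0/num_pollers
echo 1000 > /sys/class/net/eth0/queue_groups/txqg-0/min_rate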
4. Devlink:
Pros:
- New parameters can be added without any changes to the kernel or
userspace.
Cons:
- A queue/queue_group is a function-wide entity, while devlink is for
device-wide configuration. Being device-centric, devlink is not
suitable for queue parameters such as rates, NAPI, etc.
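For comparison, devlink runtime parameters are set against a device
handle rather than a netdev or its queues, e.g. (the parameter name
below is a placeholder; actual names are driver-defined):

devlink dev param set pci/0000:03:00.0 \
    name some_param value 4 cmode runtime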
Thanks,
Amritha