netdev - Re: [net-next-2.6 PATCH v6 1/2] net: implement mechanism for HW based QOS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110107214645.GB2050@del.dom.local>
Date:	Fri, 7 Jan 2011 22:46:45 +0100
From:	Jarek Poplawski <jarkao2@...il.com>
To:	John Fastabend <john.r.fastabend@...el.com>
Cc:	davem@...emloft.net, hadi@...erus.ca, eric.dumazet@...il.com,
	shemminger@...tta.com, tgraf@...radead.org,
	bhutchings@...arflare.com, nhorman@...driver.com,
	netdev@...r.kernel.org
Subject: Re: [net-next-2.6 PATCH v6 1/2] net: implement mechanism for HW
 based QOS

On Thu, Jan 06, 2011 at 07:12:11PM -0800, John Fastabend wrote:
> This patch provides a mechanism for lower layer devices to
> steer traffic using skb->priority to tx queues. This allows
> for hardware based QOS schemes to use the default qdisc without
> incurring the penalties related to global state and the qdisc
> lock. While reliably receiving skbs on the correct tx ring
> to avoid head of line blocking resulting from shuffling in
> the LLD. Finally, all the goodness from txq caching and xps/rps
> can still be leveraged.
> 
> Many drivers and hardware exist with the ability to implement
> QOS schemes in the hardware but currently these drivers tend
> to rely on firmware to reroute specific traffic, a driver
> specific select_queue or the queue_mapping action in the
> qdisc.
> 
> By using select_queue for this drivers need to be updated for
> each and every traffic type and we lose the goodness of much
> of the upstream work. Firmware solutions are inherently
> inflexible. And finally if admins are expected to build a
> qdisc and filter rules to steer traffic this requires knowledge
> of how the hardware is currently configured. The number of tx
> queues and the queue offsets may change depending on resources.
> Also this approach incurs all the overhead of a qdisc with filters.
> 
> With the mechanism in this patch users can set skb priority using
> expected methods ie setsockopt() or the stack can set the priority
> directly. Then the skb will be steered to the correct tx queues
> aligned with hardware QOS traffic classes. In the normal case with
> a single traffic class and all queues in this class everything
> works as is until the LLD enables multiple tcs.
> 
> To steer the skb we mask out the lower 4 bits of the priority
> and allow the hardware to configure upto 15 distinct classes
> of traffic. This is expected to be sufficient for most applications
> at any rate it is more then the 8021Q spec designates and is
> equal to the number of prio bands currently implemented in
> the default qdisc.
> 
> This in conjunction with a userspace application such as
> lldpad can be used to implement 8021Q transmission selection
> algorithms one of these algorithms being the extended transmission
> selection algorithm currently being used for DCB.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
> ---
> 
>  include/linux/netdevice.h |   65 +++++++++++++++++++++++++++++++++++++++++++++
>  net/core/dev.c            |   52 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 116 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0f6b1c9..12fff42 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -646,6 +646,14 @@ struct xps_dev_maps {
>      (nr_cpu_ids * sizeof(struct xps_map *)))
>  #endif /* CONFIG_XPS */
>  
> +#define TC_MAX_QUEUE	16
> +#define TC_BITMASK	15
> +/* HW offloaded queuing disciplines txq count and offset maps */
> +struct netdev_tc_txq {
> +	u16 count;
> +	u16 offset;
> +};
> +
>  /*
>   * This structure defines the management hooks for network devices.
>   * The following hooks can be defined; unless noted otherwise, they are
> @@ -756,6 +764,7 @@ struct xps_dev_maps {
>   * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
>   *			  struct nlattr *port[]);
>   * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
> + * void (*ndo_setup_tc)(struct net_device *dev, u8 tc)

..., unsigned int txq) ?

>   */
>  #define HAVE_NET_DEVICE_OPS
>  struct net_device_ops {
> @@ -814,6 +823,8 @@ struct net_device_ops {
>  						   struct nlattr *port[]);
>  	int			(*ndo_get_vf_port)(struct net_device *dev,
>  						   int vf, struct sk_buff *skb);
> +	int			(*ndo_setup_tc)(struct net_device *dev, u8 tc,
> +						unsigned int txq);

...
> +/* netif_setup_tc - Handle tc mappings on real_num_tx_queues change
> + * @dev: Network device
> + * @txq: number of queues available
> + *
> + * If real_num_tx_queues is changed the tc mappings may no longer be
> + * valid. To resolve this if the net_device supports ndo_setup_tc
> + * call the ops routine with the new queue number. If the ops is not
> + * available verify the tc mapping remains valid and if not NULL the
> + * mapping. With no priorities mapping to this offset/count pair it
> + * will no longer be used. In the worst case TC0 is invalid nothing
> + * can be done so disable priority mappings.
> + */
> +void netif_setup_tc(struct net_device *dev, unsigned int txq)
> +{
> +	const struct net_device_ops *ops = dev->netdev_ops;
> +
> +	if (ops->ndo_setup_tc) {
> +		ops->ndo_setup_tc(dev, dev->num_tc, txq);
> +	} else {
> +		int i;
> +		struct netdev_tc_txq *tc = &dev->tc_to_txq[0];
> +
> +		/* If TC0 is invalidated disable TC mapping */
> +		if (tc->offset + tc->count > txq) {
> +			dev->num_tc = 0;
> +			return;
> +		}
> +
> +		/* Invalidated prio to tc mappings set to TC0 */
> +		for (i = 1; i < TC_BITMASK + 1; i++) {
> +			int q = netdev_get_prio_tc_map(dev, i);

(empty line)
Btw, probably some warning should be logged on config change here.

> +			tc = &dev->tc_to_txq[q];
> +
> +			if (tc->offset + tc->count > txq)
> +				netdev_set_prio_tc_map(dev, i, 0);
> +		}
> +	}
> +}
> +
>  /*
>   * Routine to help set real_num_tx_queues. To avoid skbs mapped to queues
>   * greater then real_num_tx_queues stale skbs on the qdisc must be flushed.
> @@ -1614,6 +1653,9 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
>  
>  		if (txq < dev->real_num_tx_queues)
>  			qdisc_reset_all_tx_gt(dev, txq);
> +
> +		if (dev->num_tc)
> +			netif_setup_tc(dev, txq);

Should be before qdisc_reset_all_tx_gt (above).

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html