netdev - Re: [PATCH net-next] net: allocate skbs on local node

Date:	Mon, 11 Oct 2010 22:35:01 -0700
From:	Tom Herbert <therbert@...gle.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Michael Chan <mchan@...adcom.com>,
	Eilon Greenstein <eilong@...adcom.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@....de>,
	Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [PATCH net-next] net: allocate skbs on local node

Acked-by: Tom Herbert <therbert@...gle.com>

On Mon, Oct 11, 2010 at 10:05 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Tuesday, October 12, 2010 at 01:22 +0200, Eric Dumazet wrote:
>> On Tuesday, October 12, 2010 at 01:03 +0200, Eric Dumazet wrote:
>> >
>> > For multi-queue devices, it makes more sense to allocate skbs on the local
>> > node of the cpu handling RX interrupts. This allows each cpu to
>> > manipulate its own slub/slab queues/structures without doing expensive
>> > cross-node business.
>> >
>> > For non-multi-queue devices, IRQ affinity should be set so that a cpu
>> > close to the device services interrupts. Even if it is not set, using
>> > dev_alloc_skb() is faster.
>> >
>> > Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
>>
>> Or maybe revert :
>>
>> commit b30973f877fea1a3fb84e05599890fcc082a88e5
>> Author: Christoph Hellwig <hch@....de>
>> Date:   Wed Dec 6 20:32:36 2006 -0800
>>
>>     [PATCH] node-aware skb allocation
>>
>>     Node-aware allocation of skbs for the receive path.
>>
>>     Details:
>>
>>       - __alloc_skb gets a new node argument and calls the node-aware
>>         slab functions with it.
>>       - netdev_alloc_skb passes the node number it gets from dev_to_node
>>         to it; everyone else passes -1 (any node)
>>
>>     Signed-off-by: Christoph Hellwig <hch@....de>
>>     Cc: Christoph Lameter <clameter@...r.sgi.com>
>>     Cc: "David S. Miller" <davem@...emloft.net>
>>     Signed-off-by: Andrew Morton <akpm@...l.org>
>>
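The node choice described in the quoted commit can be modeled in userspace as a sketch. It assumes a hypothetical `struct sketch_netdev` standing in for `struct net_device` (only "has a parent device" and "the parent's node" are modeled) and a hypothetical `SKETCH_NUMA_NO_NODE` standing in for the kernel's -1 / `NUMA_NO_NODE`. It contrasts the 2006 behavior (NIC's node if known) with what reverting it amounts to (always "any node"); it is an illustration, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's NUMA_NO_NODE (-1, "any node"). */
#define SKETCH_NUMA_NO_NODE (-1)

/* Hypothetical userspace stand-in for struct net_device. */
struct sketch_netdev {
    bool has_parent;  /* models dev->dev.parent != NULL */
    int parent_node;  /* models dev_to_node(dev->dev.parent) */
};

/* Node netdev_alloc_skb() passed to __alloc_skb() after commit b30973f8:
 * the node of the NIC's parent device if there is one, else any node. */
int rx_node_2006(const struct sketch_netdev *dev)
{
    return dev->has_parent ? dev->parent_node : SKETCH_NUMA_NO_NODE;
}

/* Node passed once that choice is dropped: always "any node",
 * the device is no longer consulted. */
int rx_node_2010(const struct sketch_netdev *dev)
{
    (void)dev;
    return SKETCH_NUMA_NO_NODE;
}
```

With a NIC whose parent sits on node 1, `rx_node_2006()` returns 1 and `rx_node_2010()` returns -1, leaving the choice to the allocator.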
>>
>> Apparently, only Christoph and Andrew signed it.
>>
>>
>
> [PATCH net-next] net: allocate skbs on local node
>
> commit b30973f877 (node-aware skb allocation) spread the wrong habit of
> allocating network drivers' skbs on a given memory node: the one closest to
> the NIC hardware. This is wrong because as soon as we try to scale the
> network stack, we need many cpus to handle traffic, and we hit
> slub/slab management costs on cross-node allocations/frees when these cpus
> have to alloc/free skbs bound to a central node.
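A toy model makes the scaling argument concrete. It assumes a hypothetical 2-node, 8-cpu machine (cpus 0-3 on node 0, cpus 4-7 on node 1) and packets spread round-robin over all cpus, as multiqueue RX or RPS would do, with each skb allocated and freed on the cpu that handles it. It only counts frees that touch remote-node memory; it does not measure real slub/slab cost.

```c
#include <stdbool.h>

/* Hypothetical topology: 2 nodes, 4 cpus per node. */
#define SKETCH_NR_CPUS 8
#define SKETCH_CPUS_PER_NODE 4

static int sketch_cpu_to_node(int cpu)
{
    return cpu / SKETCH_CPUS_PER_NODE;
}

/* Count skb frees that hit memory from a remote node.
 * nic_node_policy == true: every skb is allocated on nic_node (the 2006 habit).
 * nic_node_policy == false: every skb is allocated on the handling cpu's node. */
long count_cross_node_frees(long npackets, int nic_node, bool nic_node_policy)
{
    long cross = 0;

    for (long i = 0; i < npackets; i++) {
        int cpu = (int)(i % SKETCH_NR_CPUS);   /* round-robin spreading */
        int cpu_node = sketch_cpu_to_node(cpu);
        int mem_node = nic_node_policy ? nic_node : cpu_node;

        if (mem_node != cpu_node)
            cross++;
    }
    return cross;
}
```

With the NIC on node 0 and 800 packets, half the frees (400) are cross-node under the NIC-node policy and none are under the local-node policy; the fraction grows with the number of nodes.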
>
> skbs allocated in the RX path are ephemeral; they have a very short
> lifetime: the extra cost of maintaining NUMA affinity is too high. What
> appeared to be a nice idea four years ago is in fact a bad one.
>
> In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
> and two 10Gb NICs might deliver more than 28 million packets per second,
> needing all the available cpus.
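The 28 million figure is consistent with minimum-size Ethernet frames at line rate, presumably what is meant here: a 64-byte frame plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap occupies 84 bytes (672 bits) on the wire. The sketch below does that arithmetic; the 64-byte assumption is ours, not stated in the mail.

```c
/* Minimum Ethernet frame plus per-frame wire overhead
 * (8-byte preamble/SFD + 12-byte inter-frame gap). */
#define ETH_MIN_FRAME_BYTES 64.0
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Packets per second one link carries with minimum-size frames. */
double line_rate_pps(double link_bits_per_sec)
{
    return link_bits_per_sec /
           ((ETH_MIN_FRAME_BYTES + ETH_WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Average time budget per packet, in nanoseconds, if a single cpu
 * had to handle the whole stream. */
double ns_per_packet(double pps)
{
    return 1e9 / pps;
}
```

One 10Gb link gives about 14.88 Mpps, so two give about 29.76 Mpps, leaving roughly 33.6 ns per packet for a single cpu, which is why all available cpus are needed.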
>
> The cost of cross-node handling in the network and vm stacks outweighs the
> small benefit the hardware had when doing its DMA transfer into its 'local'
> memory node at RX time. Even trying to differentiate the two allocations
> done for one skb (the sk_buff on the local node, the data part on the NIC
> hardware node) is not enough to bring good performance.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> ---
>  include/linux/skbuff.h |   20 ++++++++++++++++----
>  net/core/skbuff.c      |   13 +------------
>  2 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0b53c43..05a358f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -496,13 +496,13 @@ extern struct sk_buff *__alloc_skb(unsigned int size,
>  static inline struct sk_buff *alloc_skb(unsigned int size,
>                                        gfp_t priority)
>  {
> -       return __alloc_skb(size, priority, 0, -1);
> +       return __alloc_skb(size, priority, 0, NUMA_NO_NODE);
>  }
>
>  static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
>                                               gfp_t priority)
>  {
> -       return __alloc_skb(size, priority, 1, -1);
> +       return __alloc_skb(size, priority, 1, NUMA_NO_NODE);
>  }
>
>  extern bool skb_recycle_check(struct sk_buff *skb, int skb_size);
> @@ -1563,13 +1563,25 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
>        return skb;
>  }
>
> -extern struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask);
> +/**
> + *     __netdev_alloc_page - allocate a page for ps-rx on a specific device
> + *     @dev: network device to receive on
> + *     @gfp_mask: alloc_pages_node mask
> + *
> + *     Allocate a new page. dev currently unused.
> + *
> + *     %NULL is returned if there is no free memory.
> + */
> +static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> +{
> +       return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
> +}
>
>  /**
>  *     netdev_alloc_page - allocate a page for ps-rx on a specific device
>  *     @dev: network device to receive on
>  *
> - *     Allocate a new page node local to the specified device.
> + *     Allocate a new page. dev currently unused.
>  *
>  *     %NULL is returned if there is no free memory.
>  */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 752c197..4e8b82e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -247,10 +247,9 @@ EXPORT_SYMBOL(__alloc_skb);
>  struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
>                unsigned int length, gfp_t gfp_mask)
>  {
> -       int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
>        struct sk_buff *skb;
>
> -       skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
> +       skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, NUMA_NO_NODE);
>        if (likely(skb)) {
>                skb_reserve(skb, NET_SKB_PAD);
>                skb->dev = dev;
> @@ -259,16 +258,6 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
>  }
>  EXPORT_SYMBOL(__netdev_alloc_skb);
>
> -struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> -{
> -       int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> -       struct page *page;
> -
> -       page = alloc_pages_node(node, gfp_mask, 0);
> -       return page;
> -}
> -EXPORT_SYMBOL(__netdev_alloc_page);
> -
>  void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>                int size)
>  {
>
>
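Passing `NUMA_NO_NODE` does not mean "random node". In kernels of this era, `alloc_pages_node()` treats a negative node id as "unknown node is current node" and substitutes `numa_node_id()`, so the new `__netdev_alloc_page()` ends up allocating on the node of the cpu running the RX path. The sketch below models only that rule in userspace, with a hypothetical fixed cpu-to-node table; it is not the kernel implementation, and the slab path for the sk_buff itself is not modeled.

```c
/* Hypothetical stand-in for the kernel's NUMA_NO_NODE. */
#define SKETCH_NUMA_NO_NODE (-1)

/* Hypothetical cpu -> node map for a 2-node, 8-cpu machine. */
static const int sketch_cpu_node[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Models "unknown node is current node": a negative nid is replaced by the
 * node of the allocating cpu (current_cpu is assumed to be in 0..7). */
int resolve_alloc_node(int nid, int current_cpu)
{
    if (nid < 0)
        return sketch_cpu_node[current_cpu];
    return nid;
}
```

An RX handler running on cpu 5 therefore gets node 1 memory, while an explicit node id (such as the NIC's node under the old code) is honored as given.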
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>