Message-ID: <AANLkTinxDYEd=WkSyZpU7f5kye+tbC08XC2zngAjU_Pn@mail.gmail.com>
Date: Mon, 11 Oct 2010 22:35:01 -0700
From: Tom Herbert <therbert@...gle.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Michael Chan <mchan@...adcom.com>,
Eilon Greenstein <eilong@...adcom.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Hellwig <hch@....de>,
Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [PATCH net-next] net: allocate skbs on local node
Acked-by: Tom Herbert <therbert@...gle.com>
On Mon, Oct 11, 2010 at 10:05 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Tuesday, 12 October 2010 at 01:22 +0200, Eric Dumazet wrote:
>> On Tuesday, 12 October 2010 at 01:03 +0200, Eric Dumazet wrote:
>> >
>> > For multi queue devices, it makes more sense to allocate skb on local
>> > node of the cpu handling RX interrupts. This allows each cpu to
>> > manipulate its own slub/slab queues/structures without doing expensive
>> > cross-node business.
>> >
>> > For non multi queue devices, IRQ affinity should be set so that a cpu
>> > close to the device services interrupts. Even if not set, using
>> > dev_alloc_skb() is faster.
>> >
>> > Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
>>
>> Or maybe revert :
>>
>> commit b30973f877fea1a3fb84e05599890fcc082a88e5
>> Author: Christoph Hellwig <hch@....de>
>> Date: Wed Dec 6 20:32:36 2006 -0800
>>
>> [PATCH] node-aware skb allocation
>>
>> Node-aware allocation of skbs for the receive path.
>>
>> Details:
>>
>> - __alloc_skb gets a new node argument and calls the node-aware
>> slab functions with it.
>> - netdev_alloc_skb passes the node number it gets from dev_to_node
>> to it, everyone else passes -1 (any node)
>>
>> Signed-off-by: Christoph Hellwig <hch@....de>
>> Cc: Christoph Lameter <clameter@...r.sgi.com>
>> Cc: "David S. Miller" <davem@...emloft.net>
>> Signed-off-by: Andrew Morton <akpm@...l.org>
>>
>>
>> Apparently, only Christoph and Andrew signed it.
>>
>>
>
> [PATCH net-next] net: allocate skbs on local node
>
> commit b30973f877 (node-aware skb allocation) spread a wrong habit of
> allocating net drivers' skbs on a given memory node: the one closest to
> the NIC hardware. This is wrong because as soon as we try to scale the
> network stack, we need to use many cpus to handle traffic, and these cpus
> hit slub/slab management on cross-node allocations/frees when they
> have to alloc/free skbs bound to a central node.
>
> skbs allocated in the RX path are ephemeral; they have a very short
> lifetime, so the extra cost of maintaining NUMA affinity is too high. What
> appeared to be a nice idea four years ago is in fact a bad one.
>
> In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
> and two 10Gb NICs might deliver more than 28 million packets per second,
> needing all the available cpus.
>
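As a rough check on the 28 million figure above, here is a sketch that assumes minimum-size Ethernet frames: 64 bytes plus 8 bytes of preamble and a 12-byte inter-frame gap, so 84 bytes on the wire. max_pps() and MIN_FRAME_WIRE_BYTES are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-wire cost of one minimum-size Ethernet frame, in bytes:
 * 64-byte frame (FCS included) + 8-byte preamble/SFD + 12-byte inter-frame gap. */
#define MIN_FRAME_WIRE_BYTES (64u + 8u + 12u)

/* Upper bound on packets per second for `ports` links running at
 * `bits_per_sec` each, if every frame is minimum size. */
static uint64_t max_pps(uint64_t bits_per_sec, unsigned int ports)
{
	assert(ports > 0);
	return (bits_per_sec * ports) / (MIN_FRAME_WIRE_BYTES * 8u);
}
```

Under these assumptions, max_pps(10000000000ULL, 2) comes to roughly 29.7 million packets per second, which is consistent with "more than 28 million".
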
> The cost of cross-node handling in the network and vm stacks outweighs the
> small benefit the hardware had when doing its DMA transfer into its 'local'
> memory node at RX time. Even trying to differentiate the two allocations
> done for one skb (the sk_buff on local node, the data part on NIC
> hardware node) is not enough to bring good performance.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> ---
> include/linux/skbuff.h | 20 ++++++++++++++++----
> net/core/skbuff.c | 13 +------------
> 2 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0b53c43..05a358f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -496,13 +496,13 @@ extern struct sk_buff *__alloc_skb(unsigned int size,
> static inline struct sk_buff *alloc_skb(unsigned int size,
> gfp_t priority)
> {
> - return __alloc_skb(size, priority, 0, -1);
> + return __alloc_skb(size, priority, 0, NUMA_NO_NODE);
> }
>
> static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
> gfp_t priority)
> {
> - return __alloc_skb(size, priority, 1, -1);
> + return __alloc_skb(size, priority, 1, NUMA_NO_NODE);
> }
>
> extern bool skb_recycle_check(struct sk_buff *skb, int skb_size);
> @@ -1563,13 +1563,25 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
> return skb;
> }
>
> -extern struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask);
> +/**
> + * __netdev_alloc_page - allocate a page for ps-rx on a specific device
> + * @dev: network device to receive on
> + * @gfp_mask: alloc_pages_node mask
> + *
> + * Allocate a new page. dev currently unused.
> + *
> + * %NULL is returned if there is no free memory.
> + */
> +static inline struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> +{
> + return alloc_pages_node(NUMA_NO_NODE, gfp_mask, 0);
> +}
>
> /**
> * netdev_alloc_page - allocate a page for ps-rx on a specific device
> * @dev: network device to receive on
> *
> - * Allocate a new page node local to the specified device.
> + * Allocate a new page. dev currently unused.
> *
> * %NULL is returned if there is no free memory.
> */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 752c197..4e8b82e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -247,10 +247,9 @@ EXPORT_SYMBOL(__alloc_skb);
> struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
> unsigned int length, gfp_t gfp_mask)
> {
> - int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> struct sk_buff *skb;
>
> - skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, node);
> + skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, NUMA_NO_NODE);
> if (likely(skb)) {
> skb_reserve(skb, NET_SKB_PAD);
> skb->dev = dev;
> @@ -259,16 +258,6 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
> }
> EXPORT_SYMBOL(__netdev_alloc_skb);
>
> -struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
> -{
> - int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> - struct page *page;
> -
> - page = alloc_pages_node(node, gfp_mask, 0);
> - return page;
> -}
> -EXPORT_SYMBOL(__netdev_alloc_page);
> -
> void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
> int size)
> {
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
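To illustrate the argument in the changelog, here is a minimal userspace sketch, not kernel code. It assumes a toy topology in which cpu c sits on node c % nr_nodes, and it treats NUMA_NO_NODE as "allocate on the node of the calling cpu". cpu_to_node_model(), resolve_node() and count_remote_allocs() are hypothetical helpers made up for this example.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)	/* the patch replaces literal -1 with this constant */

/* Hypothetical toy topology: cpu c sits on node c % nr_nodes. */
static int cpu_to_node_model(int cpu, int nr_nodes)
{
	return cpu % nr_nodes;
}

/* Node an allocation lands on in this model: an explicit node is honored,
 * NUMA_NO_NODE is modeled as "the node of the allocating cpu". */
static int resolve_node(int requested, int cpu, int nr_nodes)
{
	if (requested == NUMA_NO_NODE)
		return cpu_to_node_model(cpu, nr_nodes);
	return requested;
}

/* Each of nr_cpus RX cpus allocates one buffer with the given policy;
 * return how many buffers end up on a node remote to their cpu. */
static int count_remote_allocs(int requested, int nr_cpus, int nr_nodes)
{
	int cpu, remote = 0;

	assert(nr_nodes > 0);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		int mem_node = resolve_node(requested, cpu, nr_nodes);

		if (mem_node != cpu_to_node_model(cpu, nr_nodes))
			remote++;
	}
	return remote;
}
```

With 8 RX cpus spread over 2 nodes, pinning every allocation to node 0 (as the old dev_to_node() lookup would for a NIC attached to node 0) makes 4 of the 8 allocations remote in this model, while passing NUMA_NO_NODE makes none remote.
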