netdev - Re: [PATCH net-next-2.6] sched: use xps information for qdisc NUMA affinity

Date:	Tue, 30 Nov 2010 19:52:07 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Ben Hutchings <bhutchings@...arflare.com>
Cc:	Tom Herbert <therbert@...gle.com>,
	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next-2.6] sched: use xps information for qdisc NUMA
 affinity

On Tuesday 30 November 2010 at 18:46 +0000, Ben Hutchings wrote:

> Yes, that's why I proposed an ethtool interface for reconfiguring this.
> Although to be honest I haven't yet constructed a case where it made a
> difference.  I think the most important objects to be allocated on the
> right node are RX buffers, and as long as refill is scheduled on the
> same CPU as the IRQ this already happens.
> 

Hmm, right now RX skbs are allocated on the right node, since they are
allocated on the node of the cpu handling the {soft}irq.
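
A minimal userspace sketch of that policy, as a toy model rather than kernel
code. All names here (NO_NODE, rx_alloc_node, count_remote_allocs) are
hypothetical; NO_NODE only stands in for the kernel's "no node preference"
value. It shows that when no node is requested, every buffer lands on the
node of the cpu running the softirq, while pinning to one node (for example
the NIC's) makes every cpu on another node allocate remotely.

```c
#include <assert.h>

#define NO_NODE (-1)	/* hypothetical stand-in for "no node preference" */

/* Hypothetical model: which memory node backs one RX buffer allocation. */
static int rx_alloc_node(int requested_node, int softirq_cpu_node)
{
	assert(softirq_cpu_node >= 0);
	/* No preference resolves to the node of the cpu running the softirq. */
	return requested_node == NO_NODE ? softirq_cpu_node : requested_node;
}

/* Count allocations that land on a node other than the allocating cpu's. */
static int count_remote_allocs(const int *cpu_nodes, int n, int requested_node)
{
	int remote = 0;

	for (int i = 0; i < n; i++)
		if (rx_alloc_node(requested_node, cpu_nodes[i]) != cpu_nodes[i])
			remote++;
	return remote;
}
```

With four cpus on nodes {0, 0, 1, 1}, count_remote_allocs() returns 0 for
NO_NODE and 2 when every allocation is pinned to node 0.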

commit 564824b0c52c346

net: allocate skbs on local node

    commit b30973f877 (node-aware skb allocation) spread a wrong habit of
    allocating net drivers' skbs on a given memory node : The one closest to
    the NIC hardware. This is wrong because as soon as we try to scale
    network stack, we need to use many cpus to handle traffic and hit
    slub/slab management on cross-node allocations/frees when these cpus
    have to alloc/free skbs bound to a central node.
    
    skbs allocated in the RX path are ephemeral, they have a very short
    lifetime : The extra cost of maintaining NUMA affinity is too high. What
    appeared as a nice idea four years ago is in fact a bad one.
    
    In 2010, NIC hardware is multiqueue, or we use RPS to spread the load,
    and two 10Gb NICs might deliver more than 28 million packets per second,
    needing all the available cpus.
    
    The cost of cross-node handling in the network and vm stacks outweighs
    the small benefit the hardware had when doing its DMA transfer in its
    'local' memory node at RX time. Even trying to differentiate the two
    allocations done for one skb (the sk_buff on local node, the data part on
    NIC hardware node) is not enough to bring good performance.
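
For reference, the 28 million figure in the commit message is consistent with
minimum-size Ethernet frames: a 64-byte frame occupies 84 bytes on the wire
once the preamble, start-of-frame delimiter and inter-frame gap are counted.
A small userspace sketch of that arithmetic follows; the macro and function
names are illustrative and do not come from the kernel or from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame wire overhead: 7-byte preamble + 1-byte SFD + 12-byte gap. */
#define ETH_WIRE_OVERHEAD_BYTES	20u
/* Minimum Ethernet frame size, FCS included. */
#define ETH_MIN_FRAME_BYTES	64u

/* Whole packets per second one link can carry at a given frame size. */
static uint64_t line_rate_pps(uint64_t link_bits_per_sec,
			      unsigned int frame_bytes)
{
	uint64_t bits_per_frame;

	assert(frame_bytes >= ETH_MIN_FRAME_BYTES);
	bits_per_frame = (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
	return link_bits_per_sec / bits_per_frame;
}
```

line_rate_pps(10000000000ULL, 64) gives 14880952, so two such ports come to
roughly 29.7 million packets per second in one direction.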

