Message-ID: <873abli9is.fsf@basil.nowhere.org>
Date: Mon, 04 May 2009 09:59:07 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Tom Herbert <therbert@...gle.com>
Cc: netdev@...r.kernel.org, David Miller <davem@...emloft.net>
Subject: Re: [PATCH v2] Receive Packet Steering
Tom Herbert <therbert@...gle.com> writes:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 052dd47..3107544 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1906,6 +1906,142 @@ int weight_p __read_mostly = 64; /* old backlog weight */
>
> DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
>
> +static u32 simple_hashrnd;
This should be __read_mostly
> +static int simple_hashrnd_initialized;
Also, I suspect you can just use 0 as "uninitialized"
and avoid one variable. If the RNG returns 0, the worst
case is just another initialization.
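A minimal user-space sketch of the single-variable scheme, assuming a hypothetical `fake_random_u32()` that stands in for get_random_bytes() and returns a scripted sequence. The seed itself doubles as the "initialized" flag, so an unlucky draw of 0 only costs another draw on the next call.

```c
#include <stdint.h>

/* In the kernel this would be annotated __read_mostly; plain static here. */
static uint32_t simple_hashrnd;   /* 0 means "not yet seeded" */

/* Hypothetical stand-in for get_random_bytes(): returns a scripted
 * sequence (first an unlucky 0, then a real value) for reproducibility. */
static const uint32_t fake_draws[] = { 0u, 0x9e3779b9u };
static unsigned int fake_idx;

static uint32_t fake_random_u32(void)
{
	uint32_t v = fake_draws[fake_idx];

	if (fake_idx + 1 < sizeof(fake_draws) / sizeof(fake_draws[0]))
		fake_idx++;
	return v;
}

/* Returns the seed, drawing one if none is set yet. A draw of 0 leaves
 * the seed "uninitialized", so the next call simply draws again. */
static uint32_t get_hashrnd(void)
{
	if (simple_hashrnd == 0)
		simple_hashrnd = fake_random_u32();
	return simple_hashrnd;
}
```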
> +static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
> +{
> +	u32 addr1, addr2, ports;
> +	struct ipv6hdr *ip6;
> +	struct iphdr *ip;
> +	u32 hash, ihl;
> +	u8 ip_proto;
> +	int cpu;
> +
> +	if (!dev->rps_map_len)
> +		return -1;
> +
> +	if (unlikely(!simple_hashrnd_initialized)) {
> +		get_random_bytes(&simple_hashrnd, 4);
> +		simple_hashrnd_initialized = 1;
> +	}
The usual problem with this is that if the kernel gets a packet sufficiently
early, the random state will always be the default state, which is not
very random.
So either use a timeout to reinit regularly (like other subsystems
do) or just use a fixed value for reproducibility. I suspect the
latter would be nicer for benchmarkers.
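A sketch of the periodic-reseed option, assuming a hypothetical `struct hashrnd_state` and a made-up 600-second interval (neither comes from the patch). The caller passes in the current time and a fresh random value so the logic stays deterministic; setting `fixed` gives the reproducible behaviour suggested for benchmarking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reseed interval in seconds; not from the patch. */
#define HASHRND_RESEED_INTERVAL 600u

struct hashrnd_state {
	uint32_t seed;
	uint64_t last_reseed;   /* time of the last reseed, in seconds */
	bool fixed;             /* true: keep a constant seed (benchmarks) */
};

/* Returns the current seed, replacing it with fresh_seed once the
 * interval has elapsed, unless the seed is fixed. In practice
 * fresh_seed would come from the kernel RNG. */
static uint32_t hashrnd_get(struct hashrnd_state *s, uint64_t now,
			    uint32_t fresh_seed)
{
	if (!s->fixed && now - s->last_reseed >= HASHRND_RESEED_INTERVAL) {
		s->seed = fresh_seed;
		s->last_reseed = now;
	}
	return s->seed;
}
```

One trade-off: every reseed changes the flow-to-CPU mapping, so packets of a flow still queued on the old CPU can be reordered against packets steered to the new one. A fixed seed avoids that.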
> +	case __constant_htons(ETH_P_IPV6):
> +		if (!pskb_may_pull(skb, sizeof(*ip6)))
> +			return -1;
> +
> +		ip6 = (struct ipv6hdr *) skb->data;
> +		ip_proto = ip6->nexthdr;
> +		addr1 = ip6->saddr.s6_addr32[3];
> +		addr2 = ip6->daddr.s6_addr32[3];
Wouldn't it be better to hash the entire IPv6 address in this case?
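A sketch of folding the whole 128-bit address into the 32-bit value passed to jhash_3words(), instead of using only `s6_addr32[3]`. XOR folding is an assumption here, chosen as the simplest option; in-kernel, passing all the words through jhash2() would mix them better. With only the last word, two hosts in different prefixes that share the low 32 bits of their interface IDs always hash the same.

```c
#include <stdint.h>

/* Fold all four 32-bit words of an IPv6 address into one word, rather
 * than keeping only the last word as the patch does. XOR folding is weak
 * on its own; here it only feeds a keyed hash such as jhash_3words(). */
static uint32_t ipv6_fold(const uint32_t addr[4])
{
	return addr[0] ^ addr[1] ^ addr[2] ^ addr[3];
}
```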
> +
> +	hash = jhash_3words(addr1, addr2, ports, simple_hashrnd);
> +
> +	cpu = skb->dev->rps_map[((u64) hash * dev->rps_map_len) >> 32];
On 32-bit systems it would be nice to avoid the u64 cast; gcc doesn't
generate very good code for it.
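One way to keep the multiply-shift mapping within 32 bits, assuming `rps_map_len` never exceeds 65536: scale only the top 16 bits of the hash. `scale_u64()` mirrors the patch's expression; `scale_u32()` is a hypothetical alternative, not something proposed in the thread. `hash % len` would also avoid the cast, but costs a division.

```c
#include <stdint.h>

/* The patch's mapping: multiply the 32-bit hash by the table length and
 * keep the high 32 bits of the 64-bit product. */
static uint32_t scale_u64(uint32_t hash, uint32_t len)
{
	return (uint32_t)(((uint64_t)hash * len) >> 32);
}

/* 32-bit-only variant: (hash >> 16) <= 65535, so for len <= 65536 the
 * product fits in 32 bits, and the result is still in [0, len). */
static uint32_t scale_u32(uint32_t hash, uint32_t len)
{
	return ((hash >> 16) * len) >> 16;
}
```

The cost is resolution: only 16 hash bits select the CPU, which should be harmless for a well-mixed hash such as jhash.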
> +	return cpu_online(cpu) ? cpu : -1;
I suspect this is still racy with CPU hot-unplug: the CPU can go offline
between the cpu_online() check and the enqueue.
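One common way to close this kind of check-then-act window, sketched as a user-space analogy with hypothetical names (a pthread mutex in place of the backlog lock, a boolean in place of cpu_online()), and not necessarily how the kernel should do it: re-check the online state under the same lock that the offline path takes to drain the queue. A failed enqueue would then fall back to the local CPU, mirroring the `-1` return in get_rps_cpu().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-"CPU" backlog with an online flag. */
struct backlog {
	pthread_mutex_t lock;
	bool online;
	int qlen;
};

/* Enqueue only if the target is still online; the decision and the
 * enqueue happen under one lock, so there is no window in between. */
static bool backlog_enqueue(struct backlog *b)
{
	bool ok;

	pthread_mutex_lock(&b->lock);
	ok = b->online;
	if (ok)
		b->qlen++;
	pthread_mutex_unlock(&b->lock);
	return ok;
}

/* Offline path: mark offline and drain under the same lock, so no
 * enqueue can slip in after the drain. Returns the number drained. */
static int backlog_offline(struct backlog *b)
{
	int drained;

	pthread_mutex_lock(&b->lock);
	b->online = false;
	drained = b->qlen;
	b->qlen = 0;
	pthread_mutex_unlock(&b->lock);
	return drained;
}
```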
> +	cpus_clear(__get_cpu_var(rps_remote_softirq_cpus));
> +
> +	local_irq_enable();
> +}
> +
> +/**
> + * enqueue_to_backlog is called to queue an skb to a per CPU backlog
> + * queue (may be a remote CPU queue).
> + */
> +static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
> +{
> +	struct softnet_data *queue;
> +	unsigned long flags;
> +
> +	queue = &per_cpu(softnet_data, cpu);
Are you sure preemption is disabled here? Otherwise this must move
one line below (this can be tested by enabling CONFIG_PREEMPT and
CONFIG_DEBUG_PREEMPT).
> +
> +	if (!capable(CAP_NET_ADMIN))
> +		return -EPERM;
> +
> +	err = bitmap_parse(buf, len, cpumask_bits(&mask), nr_cpumask_bits);
> +	if (err)
> +		return err;
> +
> +	rtnl_lock();
> +	if (dev_isalive(net)) {
> +		if (!net->rps_map) {
> +			net->rps_map = kzalloc(sizeof(u16) *
> +				num_possible_cpus(), GFP_KERNEL);
> +			if (!net->rps_map)
> +				return -ENOMEM;
You don't drop the RTNL lock (rtnl_unlock()) on this error path.
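The usual kernel idiom for this is a single unlock label that every error path jumps to. A user-space sketch, with a pthread mutex standing in for rtnl_lock()/rtnl_unlock(), a hypothetical `struct fake_dev`, and a hypothetical `simulate_oom` flag to force the allocation failure:

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RTNL lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct fake_dev {
	unsigned short *map;   /* plays the role of net->rps_map */
};

/* Every return after taking the lock goes through one unlock site. */
static int store_map(struct fake_dev *dev, size_t ncpus, int simulate_oom)
{
	int err = 0;

	pthread_mutex_lock(&big_lock);
	if (!dev->map) {
		dev->map = simulate_oom ? NULL : calloc(ncpus, sizeof(*dev->map));
		if (!dev->map) {
			err = -ENOMEM;
			goto out_unlock;   /* instead of returning with the lock held */
		}
	}
	/* ... fill dev->map from the parsed cpumask ... */
out_unlock:
	pthread_mutex_unlock(&big_lock);
	return err;
}
```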
-Andi
--
ak@...ux.intel.com -- Speaking for myself only.