Message-ID: <1291394055.2897.483.camel@edumazet-laptop>
Date: Fri, 03 Dec 2010 17:34:15 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Shawn Bohrer <sbohrer@...advisors.com>
Cc: netdev@...r.kernel.org, therbert@...gle.com
Subject: Re: RFS configuration questions
On Friday, 03 December 2010 at 10:00 -0600, Shawn Bohrer wrote:
> On Thu, Dec 02, 2010 at 10:40:41PM +0100, Eric Dumazet wrote:
> > On Thursday, 02 December 2010 at 15:16 -0600, Shawn Bohrer wrote:
> > > I've been playing around with RPS/RFS on my multiqueue 10g Chelsio NIC
> > > and I've got some questions about configuring RFS.
> > >
> > > I've enabled RPS with:
> > >
> > > for x in $(seq 0 7); do
> > > echo FFFFFFFF,FFFFFFFF > /sys/class/net/vlan816/queues/rx-${x}/rps_cpus
> > > done
> > >
> > > This appears to work: when I watch 'mpstat -P ALL 1' I can see the
> > > softirq load is now getting distributed across all of the CPUs instead
> > > of just CPUs 0-3, which the original hw receive queues are bound to
> > > (the card is a two-port card and assigns four queues per port).
> > >
> > > To enable RFS I've run:
> > >
> > > echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
> > >
> > > Is there any explanation of what this sysctl actually does? Is this
> > > the max number of sockets/flows that the kernel can steer? Is this a
> > > system wide max, a per interface max, or a per receive queue max?
> > >
> >
> > Yes, some doc is missing...
> >
> > It's a system-wide, shared limit.
>
> So the sum of /sys/class/net/*/queues/rx-*/rps_flow_cnt should be less
> than or equal to rps_sock_flow_entries?
>
I have always used the same count, but you can probably use lower values
for /sys/class/net/*/queues/rx-*/rps_flow_cnt if you don't have much
memory.
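For example, something like this (untested sketch, adjust the interface
name and queue count to your setup) keeps the global table at 16384
entries and gives each of the 8 queues a smaller per-queue flow table:

echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
for x in $(seq 0 7); do
echo 2048 > /sys/class/net/vlan816/queues/rx-${x}/rps_flow_cnt
done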
> > > Next I ran:
> > >
> > > for x in $(seq 0 7); do
> > > echo 16384 > /sys/class/net/vlan816/queues/rx-${x}/rps_flow_cnt
> > > done
> > >
> > > Is this correct? Are these the max number of sockets/flows that can be
> > > steered per receive queue? Does the sum of these values need to add
> > > up to rps_sock_flow_entries (I also tried 2048)? Is this all that is
> > > needed to enable RFS?
> > >
> >
> > Yes, that's all.
>
> Same as above... I should be using 2048 if I have 8 queues and have
> set rps_sock_flow_entries to 16384? Out of curiosity what happens
> when you open more sockets than you have rps_flow_cnt?
It's a hash table, so collisions can happen; nothing bad happens, a
colliding flow may simply be steered to a less optimal CPU.
>
> > > With these settings I can watch 'mpstat -P ALL 1' and it doesn't
> > > appear RFS has changed the softirq load. To get a better idea if it
> > > was working I used taskset to bind my receiving processes to a set of
> > > cores, yet mpstat still shows the softirq load getting distributed
> > > across all cores, not just the ones where my receiving processes are
> > > bound. Is there a better way to determine if RFS is actually working?
> > > Have I configured RFS incorrectly?
> >
> > It seems fine to me, but what kind of workload do you have, and what
> > kernel version do you run?
>
> I just did some more testing on 2.6.36.1. Using netperf UDP_STREAM
> and TCP_STREAM I was able to see that the softirq load would run on
> the CPU where netperf was bound so it appears that RFS is working.
Be careful about softirq times; they are wrong most of the time.
You can do "cat /proc/net/softnet_stat" and check the last column to see
if packets are really being distributed to other CPUs.
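For example, something like this (rough sketch; GNU awk assumed, since
the counters are printed in hex) shows the per-CPU increase of that last
column over 10 seconds:

awk '{ print strtonum("0x" $NF) }' /proc/net/softnet_stat > /tmp/before
sleep 10
awk '{ print strtonum("0x" $NF) }' /proc/net/softnet_stat > /tmp/after
# one line per CPU, same order as in softnet_stat
paste /tmp/before /tmp/after | awk '{ printf "cpu%d +%d\n", NR-1, $2 - $1 }'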
>
> However if I run one of my applications which is a single process
> listening to ~30 multicast addresses the softirq load does not run on
> the CPU where the application is bound. Does RFS not support
> receiving multicast?
No, because of this test in net/ipv4/udp.c:

static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
	int rc;

	if (inet_sk(sk)->inet_daddr)
		sock_rps_save_rxhash(sk, skb->rxhash);
	...
The rationale is: RFS is implemented for _connected_ sockets only (TCP or
connected UDP). You can check the commit changelog:
http://git2.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commit;h=fec5e652e58fa6017b2c9e06466cb2a6538de5b4
This is because the rxhash value is different for each source address.
If we allowed your process to call sock_rps_save_rxhash(sk, skb->rxhash)
with many different rxhash values, it would blow up the hash table.
You might try removing the "if (inet_sk(sk)->inet_daddr)" test for your
benchmarks, or add new logic (a socket flag, maybe?) to really trigger
RFS on your UDP sockets.
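As a quick sanity check from userspace, unconnected UDP sockets (like
your multicast listeners) show a wildcard peer address:

netstat -anu   # unconnected sockets have "0.0.0.0:*" in the Foreign Address column

so you can at least confirm which of your sockets RFS would currently
consider.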