Message-Id: <20110304.144448.115945732.davem@davemloft.net>
Date: Fri, 04 Mar 2011 14:44:48 -0800 (PST)
From: David Miller <davem@...emloft.net>
To: eric.dumazet@...il.com
Cc: xiaosuo@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next-2.6] inetpeer: seqlock optimization
From: Eric Dumazet <eric.dumazet@...il.com>
Date: Fri, 04 Mar 2011 23:13:59 +0100
> On Friday, 04 March 2011 at 12:45 -0800, David Miller wrote:
>> From: David Miller <davem@...emloft.net>
>> Date: Fri, 04 Mar 2011 11:17:05 -0800 (PST)
>>
>> > From: Eric Dumazet <eric.dumazet@...il.com>
>> > Date: Fri, 04 Mar 2011 16:09:08 +0100
>> >
>> >> Here is a patch to implement this idea.
>> >
>> > Applied, thanks Eric!
>>
>> Unfortunately, I have to revert; the lockdep annotations need to
>> be updated:
>>
>> net/ipv4/inetpeer.c: In function ‘peer_avl_rebalance’:
>> net/ipv4/inetpeer.c:274:10: error: ‘seqlock_t’ has no member named ‘dep_map’
>
> Oops, that's right; here is an updated version.
Applied, thanks Eric!
With this and the following patch applied to my no-routing-cache tree,
output route lookup on my Niagara2 is down to 2966 cycles! For
reference, with just the plain routing cache removal it was as much as
3832 cycles.
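That's 866 cycles, or roughly 22.6%, shaved off the 3832-cycle
baseline by the two patches combined.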
udpflood is a lot faster too. With plain routing cache removal it ran as:
bash$ time ./bin/udpflood -l 10000000 10.2.2.11
real 3m9.921s
user 0m9.520s
sys 3m0.440s
But now it's:
bash$ time ./bin/udpflood -l 10000000 10.2.2.11
real 2m45.903s
user 0m8.640s
sys 2m37.280s
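That's about 24 seconds, or roughly 12.6%, off the wall time for the
same 10,000,000-packet run.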
:-)
--------------------
ipv4: Optimize flow initialization in output route lookup.
We burn a lot of useless cycles, CPU store-buffer traffic, and memory
operations memset()'ing the on-stack flow used to perform output route
lookups in __ip_route_output_key().
Only the first half of the flow object members even matter for
output route lookups in this context, specifically:
FIB rules matching cares about:
dst, src, tos, iif, oif, mark
FIB trie lookup cares about:
dst
FIB semantic match cares about:
tos, scope, oif
Therefore we only initialize these specific members and elide the
memset entirely.
On Niagara2 this kills about 300 cycles from the output route
lookup path.
Likely, we can take things further, since all callers of output
route lookups essentially throw away the on-stack flow they use.
So they don't care if we use it as a scratch-pad to compute the
final flow key.
Signed-off-by: David S. Miller <davem@...emloft.net>
---
net/ipv4/route.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 04b8954..e3a5a89 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1670,14 +1670,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
struct rtable *__ip_route_output_key(struct net *net, const struct flowi *oldflp)
{
u32 tos = RT_FL_TOS(oldflp);
- struct flowi fl = { .fl4_dst = oldflp->fl4_dst,
- .fl4_src = oldflp->fl4_src,
- .fl4_tos = tos & IPTOS_RT_MASK,
- .fl4_scope = ((tos & RTO_ONLINK) ?
- RT_SCOPE_LINK : RT_SCOPE_UNIVERSE),
- .mark = oldflp->mark,
- .iif = net->loopback_dev->ifindex,
- .oif = oldflp->oif };
+ struct flowi fl;
struct fib_result res;
unsigned int flags = 0;
struct net_device *dev_out = NULL;
@@ -1688,6 +1681,15 @@ struct rtable *__ip_route_output_key(struct net *net, const struct flowi *oldflp
res.r = NULL;
#endif
+ fl.oif = oldflp->oif;
+ fl.iif = net->loopback_dev->ifindex;
+ fl.mark = oldflp->mark;
+ fl.fl4_dst = oldflp->fl4_dst;
+ fl.fl4_src = oldflp->fl4_src;
+ fl.fl4_tos = tos & IPTOS_RT_MASK;
+ fl.fl4_scope = ((tos & RTO_ONLINK) ?
+ RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
+
rcu_read_lock();
if (oldflp->fl4_src) {
rth = ERR_PTR(-EINVAL);
--
1.7.4.1
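
A quick illustration of why the old initializer was the expensive
part: in C, a designated initializer implicitly zero-fills every
member of the struct that is not named, so the compiler ends up
writing the whole (fairly large) flowi, typically as a memset-style
run of stores.  Assigning only the needed members, as the patch does,
touches nothing else.  Here is a minimal stand-alone user-space sketch
of that difference; struct fake_flow below is a made-up stand-in, not
the real struct flowi layout.

#include <stdio.h>
#include <string.h>

struct fake_flow {			/* made-up stand-in for struct flowi */
	int oif, iif;
	unsigned int mark;
	unsigned int dst, src;
	unsigned char tos, scope;
	unsigned char pad[64];		/* members the output lookup never reads */
};

int main(void)
{
	struct fake_flow a, b;

	/* Pre-fill both with 0xff so we can see which bytes get written. */
	memset(&a, 0xff, sizeof(a));
	memset(&b, 0xff, sizeof(b));

	/* Old style: every member not named in the initializer is
	 * implicitly zeroed, so the whole struct ends up written. */
	{
		struct fake_flow fl = { .dst = 1, .src = 2, .tos = 0x10 };
		a = fl;
	}

	/* New style (what the patch does): write only the members the
	 * lookup actually reads; everything else is left alone. */
	b.oif = 0;
	b.iif = 1;
	b.mark = 0;
	b.dst = 1;
	b.src = 2;
	b.tos = 0x10;
	b.scope = 0;

	printf("pad[0]: initializer=%d, member assignments=%d\n",
	       a.pad[0], b.pad[0]);	/* prints 0 and 255 */
	return 0;
}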