Message-Id: <20110524.160123.2051949867829317339.davem@davemloft.net>
Date: Tue, 24 May 2011 16:01:23 -0400 (EDT)
From: David Miller <davem@...emloft.net>
To: netdev@...r.kernel.org
Subject: Re: small RPS cache for fragments?
From: David Miller <davem@...emloft.net>
Date: Tue, 17 May 2011 14:33:42 -0400 (EDT)
>
> It seems to me that we can solve the UDP fragmentation problem for
> flow steering very simply by creating a (saddr/daddr/IPID) entry in a
> table that maps to the corresponding RPS flow entry.
>
> When we see the initial frag with the UDP header, we create the
> saddr/daddr/IPID mapping, and we tear it down when we hit the
> saddr/daddr/IPID mapping and the packet has the IP_MF bit clear.
>
> We only inspect the saddr/daddr/IPID cache when iph->frag_off is
> non-zero.

So I looked into implementing this, now that it has been established
that even Linux has been changed to emit fragments in order.

The first problem we run into is that there is no "context" we can
use in all the places where skb_get_rxhash() gets called.

Part of the problem is that we call it from strange places, such as
egress packet schedulers. That's completely bogus.

Examples: the FLOW classifier, META e-match, CHOKE, and SFB.

In fact, for the classifiers this means they aren't making use of the
precomputed TX hash values in the sockets the way __skb_tx_hash()
does. So these packet schedulers potentially operate more expensively
than they need to.

If we could get rid of those silly cases, the stuff that remains
(macvtap and net/core/dev.c) could work with a NAPI context during
rxhash computation and use that to store the one-behind cached
information for IP fragmentation.
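
To make the idea concrete, here is a rough userspace sketch of a
single-entry (one-behind) fragment cache of the kind that could hang
off a per-NAPI context. It follows the quoted scheme: create the
saddr/daddr/IPID mapping on the initial fragment, reuse it for later
fragments, and tear it down when a matching fragment has IP_MF clear.
The names frag_cache, ip_key and frag_cache_rxhash() are hypothetical
and are not existing kernel interfaces. The sketch assumes header
fields already converted to host byte order, and it masks out the DF
bit when testing frag_off.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF     0x2000  /* "more fragments" flag in frag_off */
#define IP_OFFSET 0x1FFF  /* fragment offset mask in frag_off */

/* Hypothetical one-behind cache slot, e.g. one per NAPI context. */
struct frag_cache {
	bool     valid;
	uint32_t saddr;
	uint32_t daddr;
	uint16_t id;
	uint32_t rxhash;
};

/* Stand-in for the IPv4 header fields used here (host byte order). */
struct ip_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t id;
	uint16_t frag_off;
};

/*
 * Returns true and stores a flow hash in *hash when one is available.
 * l4_hash is the hash the caller computed from the UDP header; it is
 * only used for unfragmented packets and for initial fragments.
 * Returns false on a cache miss, leaving the fallback to the caller.
 */
static bool frag_cache_rxhash(struct frag_cache *fc,
			      const struct ip_key *ip,
			      uint32_t l4_hash, uint32_t *hash)
{
	bool more  = (ip->frag_off & IP_MF) != 0;
	bool first = (ip->frag_off & IP_OFFSET) == 0;

	/* Not a fragment at all: the cache is never consulted. */
	if (!more && first) {
		*hash = l4_hash;
		return true;
	}

	/* Initial fragment carries the UDP header: create the mapping. */
	if (first) {
		fc->valid  = true;
		fc->saddr  = ip->saddr;
		fc->daddr  = ip->daddr;
		fc->id     = ip->id;
		fc->rxhash = l4_hash;
		*hash = l4_hash;
		return true;
	}

	/* Later fragment: hit only if it matches the cached datagram. */
	if (fc->valid && fc->saddr == ip->saddr &&
	    fc->daddr == ip->daddr && fc->id == ip->id) {
		*hash = fc->rxhash;
		if (!more)
			fc->valid = false;  /* last fragment: tear down */
		return true;
	}

	return false;
}
```

With a single slot, interleaved fragments from two datagrams will
evict each other, so this only pays off when fragments arrive in
order and back to back.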