Message-Id: <1267531905.21749.21.camel@bigi>
Date: Tue, 02 Mar 2010 07:11:45 -0500
From: jamal <hadi@...erus.ca>
To: Herbert Xu <herbert@...dor.apana.org.au>
Cc: davem@...emloft.net, kaber@...sh.net, yoshfuji@...ux-ipv6.org,
nakam@...ux-ipv6.org, eric.dumazet@...il.com,
netdev@...r.kernel.org,
Steffen Klassert <steffen.klassert@...unet.com>
Subject: Re: [RFC PATCH]xfrm: fix perpetual bundles
On Tue, 2010-03-02 at 19:27 +0800, Herbert Xu wrote:
> On Wed, Feb 24, 2010 at 08:19:24AM -0500, jamal wrote:
> > 1) In the connect() stage, in the slow path, a route cache entry is
> > created with rth->fl.fl4_src of 0.0.0.0...
> > ==> policy->bundles is empty, so we do a lookup, fail, and create
> > one. (Remember that rth->fl.fl4_src is 0.0.0.0 at this stage, and
> > that's what we end up storing in the bundle/xdst for later comparison
> > instead of the skb's fl.)
>
> So this is root cause number 1. When this stuff was first written, this
> case simply wasn't possible. So I think the question we need to
> ask here is can we get a valid address there at the connect stage?
fl->fl4_src is valid (non-zero). But in xfrm4_fill_dst() we do a
wholesale copy (xdst->u.rt.fl = rt->fl;), and rt->fl.fl4_src is
0.0.0.0, so the bundle ends up storing a zero source address.
> After all, for non-IPsec connect(2)s, you do get a valid IP address.
> So I don't see why the IPsec case should be different.
>
> Creating a bundle with a zero source address is just a hack to
> make connect(2) succeed immediately. AFAICS getting a valid IP
> address can also be done without waiting for the whole IPsec state
> to be created.
>
I did try to "fix it" above via:
+	if (!xdst->u.rt.fl.fl4_src) {
+		xdst->u.rt.fl.fl4_src = fl->fl4_src;
+	}
But this breaks again later, in the sendmsg bundle lookup, because of
XFRM_SUB_POLICY. If I turn off CONFIG_XFRM_SUB_POLICY, everything
works. I didn't look closely, but the SUB_POLICY code does a memcpy()
or two from the dst passed in at connect(), which has the wrong src.
So I would have to "fix" a few more spots for it to work. This is
where I gave up, concluding that I was just applying band-aids.
> Of course if anybody is still interested we could also revisit
> the neighbouresque queueing idea.
I'm not plugged into that discussion.
> > 2) ping sends a packet (does a sendmsg)
> > ==> xfrm_find_bundle() ends up comparing the skb's fl (non-zero
> > fl->fl4_src) with what we stored in #1b. Fails.
> > ==> we create a new bundle and attach the old one at the end of it.
> > ...and now policy->bundles has two xdst entries
>
> This is the way it's supposed to work.
> > 3) Repeat #2, and now we have 3 xdsts in policy->bundles
>
> This is what I don't understand. The code is supposed to look
> at every bundle attached to the policy. So why doesn't it find
> the one we created at step #2?
The issue is that the comparison is between xdst->u.rt.fl.fl4_src and
fl->fl4_src. fl->fl4_src is always non-zero, while the stored
xdst->u.rt.fl.fl4_src is always zero, so the lookup never matches an
existing bundle and every sendmsg adds a new one.
> In conclusion, I think we have two problems, with the second
> one being the most immediate cause of your symptoms.
Remember that the route cache entry (see the dst copying above) is
created at connect time ;->
So Steffen (on CC) tried to fix it at route cache creation time, by
redoing the route lookup once the source and destination addresses
have been filled in. His approach:
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2778,15 +2778,26 @@ int ip_route_output_flow(struct net *net, struct rtable **rp, struct flowi *flp,
 			 struct sock *sk, int flags)
 {
 	int err;
+	int update_route = 0;
 
 	if ((err = __ip_route_output_key(net, rp, flp)) != 0)
 		return err;
 
 	if (flp->proto) {
-		if (!flp->fl4_src)
+		if (!flp->fl4_src) {
 			flp->fl4_src = (*rp)->rt_src;
-		if (!flp->fl4_dst)
+			update_route = 1;
+		}
+		if (!flp->fl4_dst) {
 			flp->fl4_dst = (*rp)->rt_dst;
+			update_route = 1;
+		}
+		if (update_route) {
+			dst_release(&(*rp)->u.dst);
+			if ((err = __ip_route_output_key(net, rp, flp)) != 0)
+				return err;
+		}
+
 		err = __xfrm_lookup(net, (struct dst_entry **)rp, flp, sk,
 				    flags ? XFRM_LOOKUP_WAIT : 0);
 		if (err == -EREMOTE)
--
I was worried that this might break something else that depends on
the current behavior.
cheers,
jamal