Message-Id: <1381881751-6719-1-git-send-email-horms@verge.net.au>
Date: Wed, 16 Oct 2013 09:02:31 +0900
From: Simon Horman <horms@...ge.net.au>
To: YOSHIFUJI Hideaki / 吉藤英明
<yoshfuji@...ux-ipv6.org>
Cc: lvs-devel@...r.kernel.org, netdev@...r.kernel.org,
Julian Anastasov <ja@....bg>,
Mark Brooks <mark@...dbalancer.org>,
Simon Horman <horms@...ge.net.au>
Subject: [RFC net-next] ipv6: Use destination address determined by IPVS

In v3.9, commit 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
such that it creates and uses a neigh entry if none is found.
Subsequently, the 'n' field was removed from struct rt6_info.

Unfortunately, my analysis is that in the case of IPVS direct routing this
change leads to incorrect behaviour, because in this case packets may be
output to a destination other than the one the route table would select
for them. In particular, the destination address may actually be a local
address, and empirically a neighbour lookup for it seems to result in it
becoming unreachable.

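To make the mismatch concrete, here is a minimal userspace sketch (not
kernel code). It assumes that the next hop is the route's gateway when the
route has one and is otherwise the destination address handed to the
lookup, which is my reading of rt6_nexthop() at the time of this patch.
All model_* names are hypothetical, and the addresses used with it should
be taken from the 2001:db8::/32 documentation prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical, simplified stand-in for the parts of struct rt6_info used here. */
struct model_route {
	bool has_gateway;        /* models the RTF_GATEWAY flag */
	struct in6_addr gateway; /* only meaningful if has_gateway is true */
};

/*
 * Assumed next-hop rule: use the route's gateway if it has one,
 * otherwise use the destination address supplied by the caller.
 */
static const struct in6_addr *
model_nexthop(const struct model_route *rt, const struct in6_addr *daddr)
{
	return rt->has_gateway ? &rt->gateway : daddr;
}

/* Parse a textual IPv6 address; returns false on malformed input. */
static bool model_parse(const char *text, struct in6_addr *out)
{
	return inet_pton(AF_INET6, text, out) == 1;
}

/* Compare two IPv6 addresses byte for byte. */
static bool model_same(const struct in6_addr *a, const struct in6_addr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

For an on-link route (no gateway), calling model_nexthop() with the address
from the IPv6 header yields a neighbour lookup for the virtual address,
whereas calling it with the address chosen by IPVS yields a lookup for the
real server. The two results differ, which is the situation described
above.
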
This patch resolves the problem by passing the destination address
determined by IPVS to ip6_finish_output2() in the skb control buffer
(skb->cb). Although this seems to work, I can see several problems with
this approach:

* It is rather ugly, stuffing an IPVS exception right in the middle of
  IPv6 code. The overhead could be eliminated for many users by using a
  static key, but it is nonetheless unattractive.

* The use of the skb control buffer may not be valid, as it crosses from
  IPVS to IPv6 code. A possible, though unpleasant, alternative is to add
  a new field to struct sk_buff.

* This covers all IPv6 packets output by IPVS, but only those output
  using IPVS Direct-Routing actually need it. One way to resolve this
  would be to add a more fine-grained ipvs_property to struct sk_buff.

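The concern about skb->cb in the list above can be sketched in userspace
as follows. This is an illustration under stated assumptions, not kernel
code: the 48-byte size mirrors the cb[] array in struct sk_buff, while the
model_* structures and their fields are hypothetical stand-ins for the
per-layer views (IPv6 keeps its own private data in the same bytes through
IP6CB()).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CB_SIZE 48 /* mirrors the size of sk_buff's cb[] array */

/* Hypothetical stand-in for struct sk_buff: only the scratch area. */
struct model_skb {
	char cb[MODEL_CB_SIZE];
};

/* Hypothetical IPVS view of the scratch area: one pointer at offset 0. */
struct model_ipvs_cb {
	const void *daddr;
};

/* Hypothetical IPv6 view of the same bytes, also starting at offset 0. */
struct model_ipv6_cb {
	int32_t iif;
	uint16_t flags;
	uint16_t nhoff;
	uint32_t frag_max_size;
};

_Static_assert(sizeof(struct model_ipvs_cb) <= MODEL_CB_SIZE, "IPVS view fits");
_Static_assert(sizeof(struct model_ipv6_cb) <= MODEL_CB_SIZE, "IPv6 view fits");

/* IPVS side: stash the chosen destination in the scratch area. */
static void model_ipvs_set_daddr(struct model_skb *skb, const void *daddr)
{
	struct model_ipvs_cb view = { .daddr = daddr };

	memcpy(skb->cb, &view, sizeof(view));
}

/* Consumer side: read back whatever pointer the scratch area now holds. */
static const void *model_ipvs_get_daddr(const struct model_skb *skb)
{
	struct model_ipvs_cb view;

	memcpy(&view, skb->cb, sizeof(view));
	return view.daddr;
}

/* IPv6 side: a layer that owns the scratch area may clear its view at any time. */
static void model_ipv6_init_cb(struct model_skb *skb)
{
	memset(skb->cb, 0, sizeof(struct model_ipv6_cb));
}
```

If model_ipv6_init_cb() runs between model_ipvs_set_daddr() and
model_ipvs_get_daddr(), the stored pointer is lost. Whether anything on the
real path between IPVS and ip6_finish_output2() touches the scratch area in
this way is exactly the open question raised above.
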
Reported-by: Mark Brooks <mark@...dbalancer.org>
Signed-off-by: Simon Horman <horms@...ge.net.au>
---
 include/net/ip_vs.h             | 6 ++++++
 net/ipv6/ip6_output.c           | 9 +++++++--
 net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..11d90a6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 		atomic_read(&dest->inactconns);
 }
 
+struct ipvs_skb_cb {
+	struct in6_addr *daddr;
+};
+
+#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
+
 #endif /* _NET_IP_VS_H */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..a340180 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -52,6 +52,7 @@
 #include <net/addrconf.h>
 #include <net/rawv6.h>
 #include <net/icmp.h>
+#include <net/ip_vs.h>
 #include <net/xfrm.h>
 #include <net/checksum.h>
 #include <linux/mroute6.h>
@@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
-	struct in6_addr *nexthop;
+	struct in6_addr *nexthop, *daddr;
 	int ret;
 
 	skb->protocol = htons(ETH_P_IPV6);
@@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
+		daddr = IP_VS_SKB_CB(skb)->daddr;
+	else
+		daddr = &ipv6_hdr(skb)->daddr;
+	nexthop = rt6_nexthop((struct rt6_info *)dst, daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c47444e..054b679 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 		rt = (struct rt6_info *) dst;
 	}
 
+	IP_VS_SKB_CB(skb)->daddr = daddr;
+
 	local = __ip_vs_is_local_route6(rt);
 	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
 	      rt_mode)) {
--
1.8.4