Message-ID: <557634F3.4070700@gmail.com>
Date: Mon, 08 Jun 2015 18:36:03 -0600
From: David Ahern <dsahern@...il.com>
To: Hannes Frederic Sowa <hannes@...essinduktion.org>,
Shrijeet Mukherjee <shm@...ulusnetworks.com>
CC: nicolas.dichtel@...nd.com, ebiederm@...ssion.com,
hadi@...atatu.com, davem@...emloft.net, stephen@...workplumber.org,
netdev@...r.kernel.org, roopa@...ulusnetworks.com,
gospo@...ulusnetworks.com, jtoppins@...ulusnetworks.com,
nikolay@...ulusnetworks.com
Subject: Re: [RFC net-next 3/3] rcv path changes for vrf traffic
On 6/8/15 1:58 PM, Hannes Frederic Sowa wrote:
> Hi Shrijeet,
>
> On Mo, 2015-06-08 at 11:35 -0700, Shrijeet Mukherjee wrote:
>> From: Shrijeet Mukherjee <shm@...ulusnetworks.com>
>>
>> Incoming frames for IP protocol stacks need the IIF to be changed
>> from the actual interface to the VRF device. This allows the IIF
>> rule to be used to select tables (or to do regular PBR).
>>
>> This change selects the iif to be the VRF device if it exists and
>> the incoming iif is enslaved to the VRF device.
>>
>> Since VRF aware sockets are always bound to the VRF device this
>> system allows return traffic to find the socket of origin.
>>
>> Changes are in the arp_rcv, icmp_rcv and ip_rcv paths.
>>
>> Question: I did not wrap the rcv modifications in CONFIG_NET_VRF,
>> as it would create code variations and the vrf_ptr check is already
>> there. I can make that whole thing modular.
>
> From an architectural level I think the output path looks good. For the
> input path I would also like to propose my (I think) more flexible solution:
>
Something is still not right on the output path. For example, I see the
wrong source address showing up with ping -I vrf0:
# ping -I vrf0 1.1.1.254
ping: Warning: source address might be selected on device other than vrf0.
PING 1.1.1.254 (1.1.1.254) from 172.16.1.52 vrf0: 56(84) bytes of data.
64 bytes from 1.1.1.254: icmp_seq=1 ttl=64 time=0.215 ms
...
The reason is that the datagram connect function fails to look up the
outbound route in the VRF table and falls back to the main table. (As an
aside, falling back to other tables should not happen for VRFs; you want
to use only the table specific to the VRF.)
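The route lookup fails because it passes in oif = VRF device (this VRF
design relies on SO_BINDTODEVICE, which sets oif in the flow). That is
good for selecting the table to use for the lookup, but not for selecting
the route within the table: the nexthops in the VRF table use the enslaved
devices, not the VRF device, so the oif comparison in fib_table_lookup
rejects every route.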
This is one way to fix the connect problem:
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03afb6a..a18798caec25 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -245,11 +245,18 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32
 					 __be16 sport, __be16 dport,
 					 struct sock *sk)
 {
+	struct net_device *dev = dev_get_by_index(sock_net(sk), oif);
 	__u8 flow_flags = 0;
 
 	if (inet_sk(sk)->transparent)
 		flow_flags |= FLOWI_FLAG_ANYSRC;
 
+	if (dev) {
+		if (netif_is_vrf(dev))
+			flow_flags |= FLOWI_FLAG_VRFSRC;
+		dev_put(dev);
+	}
+
 	flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
 			   protocol, flow_flags, dst, src, dport, sport);
 }
This essentially tells fib_table_lookup to skip the OIF comparison once
the table has been selected, per this change in the patch Shrijeet posted:
	if (!(flp->flowi4_flags & FLOWI_FLAG_VRFSRC)) {
		if (flp->flowi4_oif &&
		    flp->flowi4_oif != nh->nh_oif)
			continue;
	}