[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <56393E5A.6080108@cumulusnetworks.com>
Date: Tue, 03 Nov 2015 15:08:10 -0800
From: roopa <roopa@...ulusnetworks.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Robert Shearman <rshearma@...cade.com>, davem@...emloft.net,
netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2] mpls: support for dead routes
On 11/3/15, 10:54 AM, Eric W. Biederman wrote:
> roopa <roopa@...ulusnetworks.com> writes:
>
>> On 11/3/15, 7:08 AM, Robert Shearman wrote:
>>> On 03/11/15 05:13, Roopa Prabhu wrote:
>>>> From: Roopa Prabhu <roopa@...ulusnetworks.com>
>>>>
>>>> Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
>>>> routes due to link events. Also adds code to ignore dead
>>>> routes during route selection
>>>>
>>>> Signed-off-by: Roopa Prabhu <roopa@...ulusnetworks.com>
>>>> ---
>>>> RFC to v1:
>>>> Addressed a few comments from Eric and Robert:
>>>> - remove support for weighted nexthops
>>>> - Use rt_nhn_alive in the rt structure to keep count of alive
>>>> routes.
>>>> What i have not done is: sort nexthops on link events.
>>>> I am not comfortable recreating or sorting nexthops on
>>>> every carrier change. This leaves scope for optimizing in the future
>>>>
>>>> v1 to v2:
>>>> Fix dead nexthop checks as suggested by dave
>>>>
>>>> net/mpls/af_mpls.c | 191 ++++++++++++++++++++++++++++++++++++++++++++--------
>>>> net/mpls/internal.h | 3 +
>>>> 2 files changed, 166 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>>> index c70d750..5e88118 100644
>>>> --- a/net/mpls/af_mpls.c
>>>> +++ b/net/mpls/af_mpls.c
>>>> @@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>>>> }
>>>> EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
>>>>
>>>> -static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>> - struct sk_buff *skb, bool bos)
>>>> +static u32 mpls_multipath_hash(struct mpls_route *rt,
>>>> + struct sk_buff *skb, bool bos)
>>>> {
>>>> struct mpls_entry_decoded dec;
>>>> struct mpls_shim_hdr *hdr;
>>>> bool eli_seen = false;
>>>> int label_index;
>>>> - int nh_index = 0;
>>>> u32 hash = 0;
>>>>
>>>> - /* No need to look further into packet if there's only
>>>> - * one path
>>>> - */
>>>> - if (rt->rt_nhn == 1)
>>>> - goto out;
>>>> -
>>>> for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
>>>> label_index++) {
>>>> if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
>>>> @@ -165,9 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>> }
>>>> }
>>>>
>>>> - nh_index = hash % rt->rt_nhn;
>>>> + return hash;
>>>> +}
>>>> +
>>>> +static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>> + struct sk_buff *skb, bool bos)
>>>> +{
>>>> + u32 hash = 0;
>>>> + int nh_index;
>>>> + int n = 0;
>>>> +
>>>> + /* No need to look further into packet if there's only
>>>> + * one path
>>>> + */
>>>> + if (rt->rt_nhn == 1)
>>>> + goto out;
>>>> +
>>>> + if (rt->rt_nhn_alive <= 0)
>>>> + return NULL;
>>>> +
>>>> + hash = mpls_multipath_hash(rt, skb, bos);
>>>> + nh_index = hash % rt->rt_nhn_alive;
>>>> + for_nexthops(rt) {
>>> This should be optimised in the common rt->rt_nhn_alive == rt->rt_nhn case by doing the direct array index and avoid the nh walk.
>> sure, that is an optimization. i will add that.
>>>> + if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
>>>> + continue;
>>>> + if (n == nh_index)
>>>> + return nh;
>>>> + n++;
>>>> + } endfor_nexthops(rt);
>>>> +
>>>> out:
>>>> - return &rt->rt_nh[nh_index];
>>>> + return &rt->rt_nh[0];
>>>> }
>>>>
>>>> static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>>>> @@ -365,6 +386,7 @@ static struct mpls_route *mpls_rt_alloc(int num_nh, u8 max_alen)
>>>> GFP_KERNEL);
>>>> if (rt) {
>>>> rt->rt_nhn = num_nh;
>>>> + rt->rt_nhn_alive = num_nh;
>>>> rt->rt_max_alen = max_alen_aligned;
>>>> }
>>>>
>>>> @@ -536,6 +558,15 @@ static int mpls_nh_assign_dev(struct net *net, struct mpls_route *rt,
>>>>
>>>> RCU_INIT_POINTER(nh->nh_dev, dev);
>>>>
>>>> + if (!netif_carrier_ok(dev))
>>>> + nh->nh_flags |= RTNH_F_LINKDOWN;
>>>> +
>>>> + if (!(dev->flags & IFF_UP))
>>>> + nh->nh_flags |= RTNH_F_DEAD;
>>> Below you're testing IFF_RUNNING | IFF_LOWER_UP. Is the difference intended here?
>> Not really. I can change it. This is during adding the route. I tried to keep the checks consistent with ipv4.
>>
>>>> +
>>>> + if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
>>>> + rt->rt_nhn_alive--;
>>> You don't update your new rt->rt_flags field here. This highlights the issue with duplicating data, which you're doing with the rt_flags field.
>> I am not sure I understand completely. Would you prefer i loop and derive the rt_flags from nh_flags during dumps than storing them in rt_flags ?. Sure I can do that. I did not think it was such a big deal because, all routes (ipv4 and others) have rt_flags and all routes today carry RTNH_ flags and you have to send them to userspace in rtm_flags anyways.
>> I am just trying to keep the use and semantics of RTNH_F flags for mpls routes consistent with the other family routes.
>>
>>>> +
>>>> return 0;
>>>>
>>>> errout:
>>>> @@ -577,7 +608,7 @@ errout:
>>>> }
>>>>
>>>> static int mpls_nh_build(struct net *net, struct mpls_route *rt,
>>>> - struct mpls_nh *nh, int oif,
>>>> + struct mpls_nh *nh, int oif, int hops,
>>> This change isn't required.
>> ack, this is a leftover from my attempt to add weights. will remove it.
>>>> struct nlattr *via, struct nlattr *newdst)
>>>> {
>>>> int err = -ENOMEM;
>>>> @@ -666,7 +697,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
>>>> /* neither weighted multipath nor any flags
>>>> * are supported
>>>> */
>>>> - if (rtnh->rtnh_hops || rtnh->rtnh_flags)
>>>> + if (rtnh->rtnh_flags || rtnh->rtnh_flags)
>>> As the build bot has pointed out, this change is not entirely correct.
>> yes, this was fixed later.
>>>> goto errout;
>>>>
>>>> attrlen = rtnh_attrlen(rtnh);
>>>> @@ -681,8 +712,8 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
>>>> goto errout;
>>>>
>>>> err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
>>>> - rtnh->rtnh_ifindex, nla_via,
>>>> - nla_newdst);
>>>> + rtnh->rtnh_ifindex, rtnh->rtnh_hops,
>>>> + nla_via, nla_newdst);
>>> This change isn't required.
>> yep, ack. same here left over from weights. will remove it.
>>
>>>> if (err)
>>>> goto errout;
>>>>
>>>> @@ -875,34 +906,100 @@ free:
>>>> return ERR_PTR(err);
>>>> }
>>>>
>>>> -static void mpls_ifdown(struct net_device *dev)
>>>> +static void mpls_ifdown(struct net_device *dev, int event)
>>>> {
>>>> struct mpls_route __rcu **platform_label;
>>>> struct net *net = dev_net(dev);
>>>> - struct mpls_dev *mdev;
>>>> unsigned index;
>>>> + int dead;
>>>>
>>>> platform_label = rtnl_dereference(net->mpls.platform_label);
>>>> for (index = 0; index < net->mpls.platform_labels; index++) {
>>>> struct mpls_route *rt = rtnl_dereference(platform_label[index]);
>>>> +
>>>> if (!rt)
>>>> continue;
>>>> + dead = 0;
>>>> for_nexthops(rt) {
>>>> + if ((event == NETDEV_DOWN &&
>>>> + (nh->nh_flags & RTNH_F_DEAD)) ||
>>>> + (event == NETDEV_CHANGE &&
>>>> + (nh->nh_flags & RTNH_F_LINKDOWN))) {
>>>> + dead++;
>>>> + continue;
>>>> + }
>>>> +
>>>> if (rtnl_dereference(nh->nh_dev) != dev)
>>>> continue;
>>>> - nh->nh_dev = NULL;
>>>> + switch (event) {
>>>> + case NETDEV_DOWN:
>>>> + case NETDEV_UNREGISTER:
>>>> + nh->nh_flags |= RTNH_F_DEAD;
>>>> + /* fall through */
>>>> + case NETDEV_CHANGE:
>>>> + nh->nh_flags |= RTNH_F_LINKDOWN;
>>>> + rt->rt_nhn_alive--;
>>> Are these operations atomic on all architectures?
>> I think so.
> Ugh. I don't see how.
> A) You are doing read-modify-write.
> B) You are modifying multiple fields.
>
> So while the individual field writes may be atomic the changes certainly
> are not atomic.
>
>>> Even if they are, I'm a bit worried about the RCU-correctness of this. I think the fallthrough case in mpls_select_multipath saves you, causing traffic to get via nh0 instead of the one selected by the hash (which is fine transiently).
>>>
> I share your concern. In-place modification sounds good in principle
> for handling RCU, but there is a reason why the original version of
> RCU was called Read-Copy-Update. Given that there are multiple fields
> that seem to depend upon each other I am not certain we can perform rcu
> safe modifications without creating a fresh copy of the route.
>
I misread the initial comment. For some reason i thought the concern pointed out was between multiple updaters. And currently those are all under rtnl. I now realize it was the reader in the packet path you were talking about which may not see the update atomically. I see how this will need to be a fresh copy and mpls_route_update on every link state change.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists