netdev - Re: [PATCH net-next v2] mpls: support for dead routes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878u6erk01.fsf@x220.int.ebiederm.org>
Date:	Tue, 03 Nov 2015 12:54:54 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	roopa <roopa@...ulusnetworks.com>
Cc:	Robert Shearman <rshearma@...cade.com>, davem@...emloft.net,
	netdev@...r.kernel.org
Subject: Re: [PATCH net-next v2] mpls: support for dead routes

roopa <roopa@...ulusnetworks.com> writes:

> On 11/3/15, 7:08 AM, Robert Shearman wrote:
>> On 03/11/15 05:13, Roopa Prabhu wrote:
>>> From: Roopa Prabhu <roopa@...ulusnetworks.com>
>>>
>>> Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
>>> routes due to link events. Also adds code to ignore dead
>>> routes during route selection
>>>
>>> Signed-off-by: Roopa Prabhu <roopa@...ulusnetworks.com>
>>> ---
>>> RFC to v1:
>>>          Addressed a few comments from Eric and Robert:
>>>          - remove support for weighted nexthops
>>>          - Use rt_nhn_alive in the rt structure to keep count of alive
>>>          routes.
>>>          What i have not done is: sort nexthops on link events.
>>>          I am not comfortable recreating or sorting nexthops on
>>>          every carrier change. This leaves scope for optimizing in the future
>>>
>>> v1 to v2:
>>>     Fix dead nexthop checks as suggested by dave
>>>
>>>   net/mpls/af_mpls.c  | 191 ++++++++++++++++++++++++++++++++++++++++++++--------
>>>   net/mpls/internal.h |   3 +
>>>   2 files changed, 166 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index c70d750..5e88118 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>>>   }
>>>   EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
>>>
>>> -static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>> -                         struct sk_buff *skb, bool bos)
>>> +static u32 mpls_multipath_hash(struct mpls_route *rt,
>>> +                   struct sk_buff *skb, bool bos)
>>>   {
>>>       struct mpls_entry_decoded dec;
>>>       struct mpls_shim_hdr *hdr;
>>>       bool eli_seen = false;
>>>       int label_index;
>>> -    int nh_index = 0;
>>>       u32 hash = 0;
>>>
>>> -    /* No need to look further into packet if there's only
>>> -     * one path
>>> -     */
>>> -    if (rt->rt_nhn == 1)
>>> -        goto out;
>>> -
>>>       for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
>>>            label_index++) {
>>>           if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
>>> @@ -165,9 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>           }
>>>       }
>>>
>>> -    nh_index = hash % rt->rt_nhn;
>>> +    return hash;
>>> +}
>>> +
>>> +static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>> +                         struct sk_buff *skb, bool bos)
>>> +{
>>> +    u32 hash = 0;
>>> +    int nh_index;
>>> +    int n = 0;
>>> +
>>> +    /* No need to look further into packet if there's only
>>> +     * one path
>>> +     */
>>> +    if (rt->rt_nhn == 1)
>>> +        goto out;
>>> +
>>> +    if (rt->rt_nhn_alive <= 0)
>>> +        return NULL;
>>> +
>>> +    hash = mpls_multipath_hash(rt, skb, bos);
>>> +    nh_index = hash % rt->rt_nhn_alive;
>>> +    for_nexthops(rt) {
>>
>> This should be optimised in the common rt->rt_nhn_alive == rt->rt_nhn case by doing the direct array index and avoid the nh walk.
>
> sure, that is an optimization.  i will add that.
>>
>>> +        if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
>>> +            continue;
>>> +        if (n == nh_index)
>>> +            return nh;
>>> +        n++;
>>> +    } endfor_nexthops(rt);
>>> +
>>>   out:
>>> -    return &rt->rt_nh[nh_index];
>>> +    return &rt->rt_nh[0];
>>>   }
>>>
>>>   static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>>> @@ -365,6 +386,7 @@ static struct mpls_route *mpls_rt_alloc(int num_nh, u8 max_alen)
>>>                GFP_KERNEL);
>>>       if (rt) {
>>>           rt->rt_nhn = num_nh;
>>> +        rt->rt_nhn_alive = num_nh;
>>>           rt->rt_max_alen = max_alen_aligned;
>>>       }
>>>
>>> @@ -536,6 +558,15 @@ static int mpls_nh_assign_dev(struct net *net, struct mpls_route *rt,
>>>
>>>       RCU_INIT_POINTER(nh->nh_dev, dev);
>>>
>>> +    if (!netif_carrier_ok(dev))
>>> +        nh->nh_flags |= RTNH_F_LINKDOWN;
>>> +
>>> +    if (!(dev->flags & IFF_UP))
>>> +        nh->nh_flags |= RTNH_F_DEAD;
>>
>> Below you're testing IFF_RUNNING | IFF_LOWER_UP. Is the difference intended here?
> Not really. I can change it. This is during adding the route. I tried to keep the checks consistent with ipv4.
>
>>> +
>>> +    if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
>>> +        rt->rt_nhn_alive--;
>>
>> You don't update your new rt->rt_flags field here. This highlights the issue with duplicating data, which you're doing with the rt_flags field.
> I am not sure I understand completely. Would you prefer i loop and derive the rt_flags from nh_flags during dumps than storing them in rt_flags ?. Sure I can do that. I did not think it was such a big deal because, all routes (ipv4 and others) have rt_flags and all routes today carry RTNH_ flags and you have to send them to userspace in rtm_flags anyways.
> I am just trying to keep the use and semantics of RTNH_F flags for mpls routes consistent with the other family routes.
>
>>
>>> +
>>>       return 0;
>>>
>>>   errout:
>>> @@ -577,7 +608,7 @@ errout:
>>>   }
>>>
>>>   static int mpls_nh_build(struct net *net, struct mpls_route *rt,
>>> -             struct mpls_nh *nh, int oif,
>>> +             struct mpls_nh *nh, int oif, int hops,
>>
>> This change isn't required.
> ack, this is a leftover from my attempt to add weights. will remove it.
>>
>>>                struct nlattr *via, struct nlattr *newdst)
>>>   {
>>>       int err = -ENOMEM;
>>> @@ -666,7 +697,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
>>>           /* neither weighted multipath nor any flags
>>>            * are supported
>>>            */
>>> -        if (rtnh->rtnh_hops || rtnh->rtnh_flags)
>>> +        if (rtnh->rtnh_flags || rtnh->rtnh_flags)
>>
>> As the build bot has pointed out, this change is not entirely correct.
> yes, this was fixed later.
>>
>>>               goto errout;
>>>
>>>           attrlen = rtnh_attrlen(rtnh);
>>> @@ -681,8 +712,8 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
>>>               goto errout;
>>>
>>>           err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
>>> -                    rtnh->rtnh_ifindex, nla_via,
>>> -                    nla_newdst);
>>> +                    rtnh->rtnh_ifindex, rtnh->rtnh_hops,
>>> +                    nla_via, nla_newdst);
>>
>> This change isn't required.
> yep, ack. same here left over from weights. will remove it.
>
>>
>>>           if (err)
>>>               goto errout;
>>>
>>> @@ -875,34 +906,100 @@ free:
>>>       return ERR_PTR(err);
>>>   }
>>>
>>> -static void mpls_ifdown(struct net_device *dev)
>>> +static void mpls_ifdown(struct net_device *dev, int event)
>>>   {
>>>       struct mpls_route __rcu **platform_label;
>>>       struct net *net = dev_net(dev);
>>> -    struct mpls_dev *mdev;
>>>       unsigned index;
>>> +    int dead;
>>>
>>>       platform_label = rtnl_dereference(net->mpls.platform_label);
>>>       for (index = 0; index < net->mpls.platform_labels; index++) {
>>>           struct mpls_route *rt = rtnl_dereference(platform_label[index]);
>>> +
>>>           if (!rt)
>>>               continue;
>>> +        dead = 0;
>>>           for_nexthops(rt) {
>>> +            if ((event == NETDEV_DOWN &&
>>> +                 (nh->nh_flags & RTNH_F_DEAD)) ||
>>> +                 (event == NETDEV_CHANGE &&
>>> +                 (nh->nh_flags & RTNH_F_LINKDOWN))) {
>>> +                dead++;
>>> +                continue;
>>> +            }
>>> +
>>>               if (rtnl_dereference(nh->nh_dev) != dev)
>>>                   continue;
>>> -            nh->nh_dev = NULL;
>>> +            switch (event) {
>>> +            case NETDEV_DOWN:
>>> +            case NETDEV_UNREGISTER:
>>> +                nh->nh_flags |= RTNH_F_DEAD;
>>> +                /* fall through */
>>> +            case NETDEV_CHANGE:
>>> +                nh->nh_flags |= RTNH_F_LINKDOWN;
>>> +                rt->rt_nhn_alive--;
>>
>> Are these operations atomic on all architectures?
>
> I think so.

Ugh.  I don't see how.
A) You are doing read-modify-write. 
B) You are modifying multiple fields.

So while the individual field writes may be atomic the changes certainly
are not atomic.

>> Even if they are, I'm a bit worried about the RCU-correctness of this. I think the fallthrough case in mpls_select_multipath saves you, causing traffic to get via nh0 instead of the one selected by the hash (which is fine transiently).
>>

I share your concern.  In-place modification sounds good in principle
for handling RCU, but there is a reason why the original version of
RCU was called Read-Copy-Update.   Given that there are multiple fields
that seem to depend upon each other I am not certain we can perform rcu
safe modifications without creating a fresh copy of the route.

Eric

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html