netdev - Re: [PATCH net] ipv4: fix route lookups when handling ICMP redirects and PMTU updates

Message-ID: <20220301114107.GB24680@debian.home>
Date:   Tue, 1 Mar 2022 12:41:07 +0100
From:   Guillaume Nault <gnault@...hat.com>
To:     David Ahern <dsahern@...nel.org>
Cc:     David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>
Subject: Re: [PATCH net] ipv4: fix route lookups when handling ICMP redirects
 and PMTU updates

On Mon, Feb 28, 2022 at 09:31:09PM -0700, David Ahern wrote:
> On 2/28/22 1:54 PM, Guillaume Nault wrote:
> > On Mon, Feb 28, 2022 at 10:31:58AM -0700, David Ahern wrote:
> >> On 2/28/22 10:16 AM, Guillaume Nault wrote:
> >>> Fixes: d3a25c980fc2 ("ipv4: Fix nexthop exception hash computation.")
> >>
> >> That does not seem related to tos in the flow struct at all.
> > 
> > Ouch, copy/paste mistake.
> > I meant 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions."), which is
> > the next commit with 'git log -- net/ipv4/route.c'.
> > Really sorry :/, and thanks a lot for catching that!
> > 
> >>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> >>> index f33ad1f383b6..d5d058de3664 100644
> >>> --- a/net/ipv4/route.c
> >>> +++ b/net/ipv4/route.c
> >>> @@ -499,6 +499,15 @@ void __ip_select_ident(struct net *net, struct iphdr *iph, int segs)
> >>>  }
> >>>  EXPORT_SYMBOL(__ip_select_ident);
> >>>  
> >>> +static void ip_rt_fix_tos(struct flowi4 *fl4)
> >>
> >> make this a static inline in include/net/flow.h and update
> >> flowi4_init_output and flowi4_update_output to use it. That should cover
> >> a few of the cases below leaving just  ...
> > 
> > Hum, I didn't think about this option, but it looks risky to me. As I
> > put it in note 1, ip_route_output_key_hash() unconditionally sets
> > ->flowi4_scope, assuming it can infer the scope from the RTO_ONLINK bit
> > of ->flowi4_tos. If we sanitise these fields in flowi4_init_output()
> > (and flowi4_update_output()), then ip_route_output_key_hash() would
> > sometimes work on already sanitised values and sometimes not. So it
> > wouldn't know if it should initialise ->flowi4_scope.
> > 
> > We could decide to let ip_route_output_key_hash() initialise
> > ->flowi4_scope only when the RTO_ONLINK bit is set, which
> > guarantees that we don't have sanitised values. But before that, we'd
> > need to audit all other callers, to verify that they correctly
> > initialise the ->flowi4_scope with RT_SCOPE_UNIVERSE, since
> > ip_route_output_key_hash() isn't going to do it for them anymore.
> > I'll audit all these callers, but that should be something for
> > net-next.
> 
> I'm not following the response. You are moving the tos logic from
> ip_route_output_key_hash to a helper and calling the new helper for
> other fib lookups. My suggestion was to correctly set / fixup the tos
> and scope when flowi4 is initialized (reducing the number of places the
> fixup is needed) and recognizing below that ip_route_output_key_hash
> still needs the call to the new ip_rt_fix_tos.

The problem is that we can't sanitise fl4 twice:

    fl4->flowi4_tos = 0x04 | RTO_ONLINK;
    fl4->flowi4_scope = whatever;

    ip_rt_fix_tos(fl4);
    /* Now ->flowi4_tos == 0x04 and ->flowi4_scope == RT_SCOPE_LINK */

    ip_rt_fix_tos(fl4);
    /* Now ->flowi4_scope is wrongly changed to RT_SCOPE_UNIVERSE */

Therefore we can't call the helper in ip_route_output_key_hash() "just
in case", because that has to be done exactly once and we can't know
whether fl4 has already been sanitised or not.
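
To make the double-call problem concrete, here is a minimal userspace
sketch (not kernel code). The struct mock_flowi4, mock_ip_rt_fix_tos()
and scope_after_n_fixups() names are hypothetical stand-ins; the mock
helper only mirrors the tos/scope logic removed from
ip_route_output_key_hash() in the diff, and the constants are copied
from the kernel headers as an assumption for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the kernel headers for this userspace mock. */
#define RTO_ONLINK        0x01
#define IPTOS_RT_MASK     0x1C
#define RT_SCOPE_UNIVERSE 0
#define RT_SCOPE_LINK     253

/* Hypothetical stand-in for the two struct flowi4 fields of interest. */
struct mock_flowi4 {
	uint8_t flowi4_tos;
	uint8_t flowi4_scope;
};

/* Mirrors the logic moved into ip_rt_fix_tos(): strip RTO_ONLINK from
 * the tos and derive the scope from it, unconditionally. */
static void mock_ip_rt_fix_tos(struct mock_flowi4 *fl4)
{
	uint8_t tos = fl4->flowi4_tos;

	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
	fl4->flowi4_scope = (tos & RTO_ONLINK) ?
			    RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
}

/* Apply the fixup n times and report the resulting scope. */
static uint8_t scope_after_n_fixups(uint8_t tos, int n)
{
	struct mock_flowi4 fl4 = { .flowi4_tos = tos, .flowi4_scope = 0 };

	while (n-- > 0)
		mock_ip_rt_fix_tos(&fl4);

	return fl4.flowi4_scope;
}
```

With tos = 0x04 | RTO_ONLINK, one call yields RT_SCOPE_LINK, while a
second call sees a tos without RTO_ONLINK and resets the scope to
RT_SCOPE_UNIVERSE.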

The second part of my reply was about trying to allow double calls to
ip_rt_fix_tos() (as it's required for the solution you proposed). It
looks like all call paths initialise ->flowi4_scope to zero (that is,
RT_SCOPE_UNIVERSE). If that's really the case, then ip_rt_fix_tos()
could reset ->flowi4_scope only when RTO_ONLINK is on. Then we wouldn't
have to worry about the problem described above. But that requires
auditing all code paths to ensure that they all properly initialise
the scope to RT_SCOPE_UNIVERSE, otherwise we risk introducing
regressions because of uninitialised ->flowi4_scope. So this kind of
work seems better suited for the net-next tree.
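
As a rough illustration of that alternative, here is a userspace sketch
of a fixup that only touches the scope when RTO_ONLINK is set. This is
an assumption about what such a variant could look like, not the
submitted patch; struct mock_fl4, mock_fix_tos_idempotent() and
fixup_n_times() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the kernel headers for this userspace mock. */
#define RTO_ONLINK        0x01
#define IPTOS_RT_MASK     0x1C
#define RT_SCOPE_UNIVERSE 0
#define RT_SCOPE_LINK     253

/* Hypothetical stand-in for the two struct flowi4 fields of interest. */
struct mock_fl4 {
	uint8_t flowi4_tos;
	uint8_t flowi4_scope;
};

/* Variant that is safe to call more than once: the scope is only
 * written when RTO_ONLINK is present, so a second call (which no longer
 * sees the bit) leaves the scope alone. The caller must have set the
 * scope to RT_SCOPE_UNIVERSE beforehand. */
static void mock_fix_tos_idempotent(struct mock_fl4 *fl4)
{
	uint8_t tos = fl4->flowi4_tos;

	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
	if (tos & RTO_ONLINK)
		fl4->flowi4_scope = RT_SCOPE_LINK;
}

/* Apply the fixup n times to a flow with the given initial fields. */
static struct mock_fl4 fixup_n_times(uint8_t tos, uint8_t scope, int n)
{
	struct mock_fl4 fl4 = { .flowi4_tos = tos, .flowi4_scope = scope };

	while (n-- > 0)
		mock_fix_tos_idempotent(&fl4);

	return fl4;
}
```

Calling it twice on tos = 0x04 | RTO_ONLINK keeps RT_SCOPE_LINK. But if
a caller leaves a stale value in the scope and doesn't set RTO_ONLINK,
that stale value now survives the fixup, which is why every call path
has to be audited first.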

And my final point was that the need for ip_rt_fix_tos() is temporary:
I plan to do the call paths review anyway, to make them initialise tos
and scope properly, thus removing the need for RTO_ONLINK. I already
have a draft patch series, but as I said that's work for net-next.

> > 
> >>> @@ -2613,9 +2625,7 @@ struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
> >>>  	struct rtable *rth;
> >>>  
> >>>  	fl4->flowi4_iif = LOOPBACK_IFINDEX;
> >>> -	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
> >>> -	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
> >>> -			 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
> >>> +	ip_rt_fix_tos(fl4);
> >>
> >> ... this one to call the new helper.
> > 
> > BTW, here's a bit more about the context around this patch.
> > I found the problem while working on removing the use of RTO_ONLINK, so
> > that ->flowi4_tos could be converted to dscp_t.
> > 
> > The objective is to modify callers so that they'd set ->flowi4_scope
> > directly, instead of using RTO_ONLINK to mark their intention (and that's
> > why I said I'd have to audit them anyway).
> > 
> > Once that is done, ip_rt_fix_tos() won't have to touch the scope
> > anymore. And once ->flowi4_tos is converted to dscp_t, we'll be able to
> > remove that function entirely since dscp_t ensures ECN bits are cleared
> > (IPTOS_RT_MASK also ensures that high order bits are cleared too, but
> > that's redundant with the RT_TOS() calls already done by callers, and
> > which somewhat aren't really desirable anyway).
> > 