netdev - Re: BUG: potential net namespace bug in IPv6 flow label management

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+FuTSc77mc6kwRpA4pvbyK-y5MdaJLkvWMqgXSohGp9XJFibw@mail.gmail.com>
Date:   Sun, 13 Feb 2022 19:48:21 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     "Liu, Congyu" <liu3101@...due.edu>,
        "security@...nel.org" <security@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: BUG: potential net namespace bug in IPv6 flow label management

On Sun, Feb 13, 2022 at 6:47 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Sun, Feb 13, 2022 at 11:10 AM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > On Sun, Feb 13, 2022 at 5:31 AM Liu, Congyu <liu3101@...due.edu> wrote:
> > >
> > >
> > > Hi,
> > >
> > > In the test conducted on namespace, I found that one unsuccessful IPv6 flow label
> > > management from one net ns could stop other net ns's data transmission that requests
> > > flow label for a short time. Specifically, in our test case, one unsuccessful
> > > `setsockopt` to get flow label will affect other net ns's `sendmsg` with flow label
> > > set in cmsg. Simple PoC is included for verification. The behavior descirbed above
> > > can be reproduced in latest kernel.
> > >
> > > I managed to figure out the data flow behind this: when asking to get a flow label,
> > > some `setsockopt` parameters can trigger function `ipv6_flowlabel_get` to call `fl_create`
> > > to allocate an exclusive flow label, then call `fl_release` to release it before returning
> > > -ENOENT. Global variable `ipv6_flowlabel_exclusive`, a rate limit jump label that keeps
> > > track of number of alive exclusive flow labels, will get increased instantly after calling
> > > `fl_create`. Due to its rate limit design, `ipv6_flowlabel_exclusive` can only decrease
> > > sometime later after calling `fl_decrease`. During this period, if data transmission function
> > > in other net ns (e.g. `udpv6_sendmsg`) calls `fl_lookup`, the false `ipv6_flowlabel_exclusive`
> > > will invoke the `__fl_lookup`. In the test case observed, this function returns error and
> > > eventually stops the data transmission.
> > >
> > > I further noticed that this bug could somehow be vulnerable: if `setsockopt` is called
> > > continuously, then `sendmmsg` call from other net ns will be blocked forever. Using the PoC
> > > provided, if attack and victim programs are running simutaneously, victim program cannot transmit
> > > data; when running without attack program, the victim program can transmit data normally.
> >
> > Thanks for the clear explanation.
> >
> > Being able to use flowlabels without explicitly registering them
> > through a setsockopt is a fast path optimization introduced in commit
> > 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases
> > exist").
> >
> > Before this, any use of flowlabels required registering them, whether
> > the use was exclusive or not. As autoflowlabels already skipped this
> > stateful action, the commit extended this fast path to all non-exclusive
> > use. But if any exclusive flowlabel is active, to protect it, all
> > other flowlabel use has to be registered too.
> >
> > The commit message does state
> >
> >     This is an optimization. Robust applications still have to revert to
> >     requesting leases if the fast path fails due to an exclusive lease.
> >
> > Though I can see how the changed behavior has changed the perception of the API.
> >
> > That this extends up to a second after release of the last exclusive
> > flowlabel due to deferred release is only tangential to the issue?
> >
> > Flowlabels are stored globally, but associated with a netns
> > (fl->fl_net). Perhaps we can add a per-netns check to the
> > static_branch and maintain stateless behavior in other netns, even if
> > some netns maintain exclusive leases.
>
> The specific issue could be avoided by moving
>
>        if (fl_shared_exclusive(fl) || fl->opt)
>                static_branch_deferred_inc(&ipv6_flowlabel_exclusive);
>
> until later in ipv6_flowlabel_get, after the ENOENT response.
>
> But reserving a flowlabel is not a privileged operation, including for
> exclusive use. So the attack program can just be revised to pass
> IPV6_FL_F_CREATE and hold a real reservation. Then it also does
> not have to retry in a loop.
>
> The drop behavior is fully under control of the victim. If it reserves
> the flowlabel it intends to use, then the issue does not occur. For
> this reason I don't see this as a vulnerability.
>
> But the behavior is non-obvious and it is preferable to isolate netns
> from each other. I'm looking into whether we can add a per-netns
> "has exclusive leases" check.

Easiest is just to mark the netns as requiring the check only once it
starts having exclusive labels:

+++ b/include/net/ipv6.h
@@ -399,7 +399,8 @@ extern struct static_key_false_deferred
ipv6_flowlabel_exclusive;
 static inline struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk,
                                                    __be32 label)
 {
-       if (static_branch_unlikely(&ipv6_flowlabel_exclusive.key))
+       if (static_branch_unlikely(&ipv6_flowlabel_exclusive.key) &&
+           READ_ONCE(sock_net(sk)->ipv6.flowlabel_has_excl))
                return __fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);

@@ -77,9 +77,10 @@ struct netns_ipv6 {
        spinlock_t              fib6_gc_lock;
        unsigned int             ip6_rt_gc_expire;
        unsigned long            ip6_rt_last_gc;
+       unsigned char           flowlabel_has_excl;
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
-       unsigned int            fib6_rules_require_fldissect;
        bool                    fib6_has_custom_rules;
+       unsigned int            fib6_rules_require_fldissect;

+++ b/net/ipv6/ip6_flowlabel.c
@@ -450,8 +450,10 @@ fl_create(struct net *net, struct sock *sk,
struct in6_flowlabel_req *freq,
                err = -EINVAL;
                goto done;
        }
-       if (fl_shared_exclusive(fl) || fl->opt)
+       if (fl_shared_exclusive(fl) || fl->opt) {
+               WRITE_ONCE(sock_net(sk)->ipv6.flowlabel_has_excl, 1);
                static_branch_deferred_inc(&ipv6_flowlabel_exclusive);
+       }
        return fl;

Clearing flowlabel_has_excl when it stops using labels is more complex,
requiring either an atomic_t or walking the entire flowlabel hashtable on
each flowlabel free in the namespace. It can be skipped.