netdev - Re: [PATCH bpf-next 1/2] bpf/flow_dissector: add mode to enforce global BPF flow dissector

Message-ID: <CAEf4BzYbJZz7AwW_N=Q2b-V8ZQCJVTHeUaGo6Ji848aB_z8nXA@mail.gmail.com>
Date:   Thu, 3 Oct 2019 09:26:01 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Stanislav Fomichev <sdf@...ichev.me>
Cc:     Stanislav Fomichev <sdf@...gle.com>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Petar Penkov <ppenkov@...gle.com>
Subject: Re: [PATCH bpf-next 1/2] bpf/flow_dissector: add mode to enforce
 global BPF flow dissector

On Thu, Oct 3, 2019 at 9:01 AM Stanislav Fomichev <sdf@...ichev.me> wrote:
>
> On 10/02, Andrii Nakryiko wrote:
> > On Wed, Oct 2, 2019 at 6:43 PM Stanislav Fomichev <sdf@...ichev.me> wrote:
> > >
> > > On 10/02, Andrii Nakryiko wrote:
> > > > On Wed, Oct 2, 2019 at 10:35 AM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > >
> > > > > Always use init_net flow dissector BPF program if it's attached and fall
> > > > > back to the per-net namespace one. Also, deny installing new programs if
> > > > > there is already one attached to the root namespace.
> > > > > Users can still detach their BPF programs, but can't attach any
> > > > > new ones (-EPERM).
> >
> > I find this quite confusing for users, honestly. If there is no root
> > namespace dissector, we'll successfully attach per-net ones and they
> > will work fine. Then some process attaches a root one and all the
> > previously working ones suddenly "break", with users potentially not
> > realizing why. I bet this will be a hair-pulling investigation for
> > someone. Furthermore, if a root net dissector is already attached,
> > all subsequent attachments will now start failing.
> The idea is that if sysadmin decides to use system-wide dissector it would
> be attached from the init scripts/systemd early in the boot process.
> So the users in your example would always get EPERM/EBUSY/EEXIST.
> I don't really see a realistic use-case where root and non-root
> namespaces attach/detach flow dissector programs at non-boot
> time (or why non-root containers could have BPF dissector and root
> could have C dissector; multi-nic machine?).
>
> But I totally see your point about confusion. See below.
>
> > I'm not sure what the better behavior here is, but maybe at least
> > forcibly detach the already attached ones, so when someone goes and
> > tries to investigate, they will see that their BPF program is not
> > attached anymore. Printing a dmesg warning would be hugely useful here
> > as well.
> We can do for_each_net and detach non-root ones; that sounds
> feasible and may avoid the confusion (at least when you query
> non-root ns to see if the prog is still there, you get a valid
> indication that it's not).
>
> > Alternatively, if there is any per-net dissector attached, we might
> > disallow root net dissector to be installed. Sort of "too late to the
> > party" way, but at least not surprising to successfully installed
> > dissectors.
> We can do this as well.
>
> > Thoughts?
> Let me try to implement both of your suggestions and see which one makes
> more sense. I'm leaning towards the latter (a simple check to see if
> any non-root ns has the prog attached).
>
> I'll follow up with a v2 if all goes well.

Thanks! I don't have a strong opinion on either; see what makes the most
sense from an actual user perspective.
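
To make the two attach policies discussed above concrete, here is a
minimal userspace sketch, not kernel code. Everything in it is
hypothetical and exists only for illustration: the fixed-size table
(slot 0 stands in for init_net), the names model_prog, reset_model,
attach_root_wins and attach_first_wins, and the error codes chosen for
the second policy. The first function models the posted patch (any new
attach fails with -EPERM once the root namespace has a program); the
second models the "too late to the party" alternative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_NETNS 4	/* hypothetical table size; slot 0 models init_net */

struct model_prog { int id; };

/* One attachment slot per modelled namespace. */
static struct model_prog *attached[MAX_NETNS];

/* Clear all slots so each scenario starts from a clean state. */
static void reset_model(void)
{
	int i;

	for (i = 0; i < MAX_NETNS; i++)
		attached[i] = NULL;
}

/* Posted patch: once root has a program, every new attach gets -EPERM. */
static int attach_root_wins(int ns, struct model_prog *prog)
{
	assert(ns >= 0 && ns < MAX_NETNS);

	if (attached[0])
		return -EPERM;
	if (attached[ns])
		return -EEXIST;
	attached[ns] = prog;
	return 0;
}

/*
 * Alternative: refuse a root attach while any child namespace has a
 * program. The -EEXIST return for that case is an assumption of this
 * sketch, not something the thread specifies.
 */
static int attach_first_wins(int ns, struct model_prog *prog)
{
	int i;

	assert(ns >= 0 && ns < MAX_NETNS);

	if (ns == 0) {
		for (i = 1; i < MAX_NETNS; i++)
			if (attached[i])
				return -EEXIST;
	} else if (attached[0]) {
		return -EPERM;
	}
	if (attached[ns])
		return -EEXIST;
	attached[ns] = prog;
	return 0;
}
```

With the first policy a child attach followed by a root attach both
succeed, and the child program is then silently shadowed; with the
second, the root attach is the one that fails, so nothing that was
already working changes behavior.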

>
> > > > > Cc: Petar Penkov <ppenkov@...gle.com>
> > > > > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > > > > ---
> > > > >  Documentation/bpf/prog_flow_dissector.rst |  3 +++
> > > > >  net/core/flow_dissector.c                 | 11 ++++++++++-
> > > > >  2 files changed, 13 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/Documentation/bpf/prog_flow_dissector.rst b/Documentation/bpf/prog_flow_dissector.rst
> > > > > index a78bf036cadd..4d86780ab0f1 100644
> > > > > --- a/Documentation/bpf/prog_flow_dissector.rst
> > > > > +++ b/Documentation/bpf/prog_flow_dissector.rst
> > > > > @@ -142,3 +142,6 @@ BPF flow dissector doesn't support exporting all the metadata that in-kernel
> > > > >  C-based implementation can export. Notable example is single VLAN (802.1Q)
> > > > >  and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
> > > > >  for a set of information that's currently can be exported from the BPF context.
> > > > > +
> > > > > +When BPF flow dissector is attached to the root network namespace (machine-wide
> > > > > +policy), users can't override it in their child network namespaces.
> > > > > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > > > > index 7c09d87d3269..494e2016fe84 100644
> > > > > --- a/net/core/flow_dissector.c
> > > > > +++ b/net/core/flow_dissector.c
> > > > > @@ -115,6 +115,11 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
> > > > >         struct bpf_prog *attached;
> > > > >         struct net *net;
> > > > >
> > > > > +       if (rcu_access_pointer(init_net.flow_dissector_prog)) {
> > > > > +               /* Can't override root flow dissector program */
> > > > > +               return -EPERM;
> > > > > +       }
> > > >
> > > > This is racy, shouldn't this be checked after grabbing a lock below?
> > > What kind of race do you have in mind?
> >
> > I was thinking about the case of two competing attaches for root
> > init_net, but it seems like we will double-check again under lock, so
> > this is fine as is.
> >
> > >
> > > Even if I put this check under the mutex, it's still possible that if
> > > two CPUs concurrently start attaching flow dissector programs (i.e. call
> > > sys_bpf(BPF_PROG_ATTACH)) at the same time (one to root ns, the other
> > > to non-root ns), the CPU that is attaching to non-root can grab the mutex
> > > first, pass all the checks and attach the prog (higher frequency, turbo
> > > boost, etc.).
> > >
> > > The mutex is there to protect only against concurrent attaches to the
> > > _same_ netns. For the sake of simplicity we have a global one instead
> > > of a mutex per net-ns.
> > >
> > > So I'd rather not grab the mutex and keep it simple. Even if there is a
> > > race, in __skb_flow_dissect we always check init_net first.
> > >
> > > > > +
> > > > >         net = current->nsproxy->net_ns;
> > > > >         mutex_lock(&flow_dissector_mutex);
> > > > >         attached = rcu_dereference_protected(net->flow_dissector_prog,
> > > > > @@ -910,7 +915,11 @@ bool __skb_flow_dissect(const struct net *net,
> > > > >         WARN_ON_ONCE(!net);
> > > > >         if (net) {
> > > > >                 rcu_read_lock();
> > > > > -               attached = rcu_dereference(net->flow_dissector_prog);
> > > > > +               attached =
> > > > > +                       rcu_dereference(init_net.flow_dissector_prog);
> > > > > +
> > > > > +               if (!attached)
> > > > > +                       attached = rcu_dereference(net->flow_dissector_prog);
> > > > >
> > > > >                 if (attached) {
> > > > >                         struct bpf_flow_keys flow_keys;
> > > > > --
> > > > > 2.23.0.444.g18eeb5a265-goog
> > > > >