linux-kernel - Re: [PATCH v2 bpf-next 5/5] bpf: trampoline: support FTRACE_OPS_FL_SHARE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220706211858.67f9254d@rorschach.local.home>
Date:   Wed, 6 Jul 2022 21:18:58 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Song Liu <songliubraving@...com>
Cc:     Song Liu <song@...nel.org>, Networking <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, lkml <linux-kernel@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Kernel Team <Kernel-team@...com>,
        "jolsa@...nel.org" <jolsa@...nel.org>,
        "mhiramat@...nel.org" <mhiramat@...nel.org>
Subject: Re: [PATCH v2 bpf-next 5/5] bpf: trampoline: support
 FTRACE_OPS_FL_SHARE_IPMODIFY

On Thu, 7 Jul 2022 00:19:07 +0000
Song Liu <songliubraving@...com> wrote:

> >> In this specific race condition, register_bpf() will succeed, as it already
> >> got tr->mutex. But the IPMODIFY (livepatch) side will fail and retry.   
> > 
> > What else takes the tr->mutex ?  
> 
> tr->mutex is the local mutex for a single BPF trampoline, we only need to take
> it when we make changes to the trampoline (add/remove fentry/fexit programs). 
> 
> > 
> > If it preempts anything else taking that mutex, when this runs, then it
> > needs to be careful.
> > 
> > You said this can happen when the live patch came first. This isn't racing
> > against live patch, it's racing against anything that takes the tr->mutex
> > and then adds a bpf trampoline to a location that has a live patch.  
> 
> There are a few scenarios here:
> 1. Live patch is already applied, then a BPF trampoline is being registered 
> to the same function. In bpf_trampoline_update(), register_fentry returns
> -EAGAIN, and this will be resolved. 

Where will it be resolved?

> 
> 2. BPF trampoline is already registered, then a live patch is being applied 
> to the same function. The live patch code need to ask the bpf trampoline to
> prepare the trampoline before live patch. This is done by calling 
> bpf_tramp_ftrace_ops_func. 
> 
> 2.1 If nothing else is modifying the trampoline at the same time, 
> bpf_tramp_ftrace_ops_func will succeed. 
> 
> 2.2 In rare cases, if something else is holding tr->mutex to make changes to 
> the trampoline (add/remove fentry functions, etc.), mutex_trylock in 
> bpf_tramp_ftrace_ops_func will fail, and live patch will fail. However, the 
> change to BPF trampoline will still succeed. It is common for live patch to
> retry, so we just need to try live patch again when no one is making changes 
> to the BPF trampoline in parallel. 

If the live patch is going to try again, and the task doing the live
patch is SCHED_FIFO, and the task holding the tr->mutex is SCHED_OTHER
(or just a lower priority), then there is a chance that the live patch
task preempted the tr->mutex owner, and let's say the tr->mutex owner
is pinned to the CPU (by the user or whatever), then because the live
patch task is in a loop trying to take that mutex, it will never let
the owner continue.

Yes, this is a real scenario with trylock on mutexes. We hit it all the
time in RT.

> 
> >   
> >> 
> >> Since both livepatch and bpf trampoline changes are rare operations, I think 
> >> the chance of the race condition is low enough. 


A low race condition in a world that does this a billion times a day,
ends up being not so rare.

I like to say, "I live in a world where the unlikely is very much likely!"


> >> 
> >> Does this make sense?
> >>   
> > 
> > It's low, and if it is also a privileged operation then there's less to be
> > concern about.  
> 
> Both live patch and BPF trampoline are privileged operations. 

This makes the issue less of an issue, but if there's an application
that does this with setuid or something, there's a chance that it can
be used by an attacker as well.

> 
> > As if it is not, then we could have a way to deadlock the
> > system. I'm more concerned that this will lead to a CVE than it just
> > happening randomly. In other words, it only takes something that can run at
> > a real-time priority to connect to a live patch location, and something
> > that runs at a low priority to take a tr->mutex. If an attacker has both,
> > then it can pin both to a CPU and then cause the deadlock to the system.
> > 
> > One hack to fix this is to add a msleep(1) in the failed case of the
> > trylock. This will at least give the owner of the lock a millisecond to
> > release it. This was what the RT patch use to do with spin_trylock() that
> > was converted to a mutex (and we worked hard to remove all of them).  
> 
> The fix is really simple. But I still think we don't need it. We only hit
> the trylock case for something with IPMODIFY. The non-privileged user 
> should not be able to do that, right?

For now, perhaps. But what useful applications are there going to be in
the future that performs these privileged operations on behalf of a
non-privileged user?

In other words, if we can fix it now, we should, and avoid debugging
this issue 5 years from now where it may take months to figure out.

-- Steve