Message-ID: <e41fc912-0a4f-70c3-b924-50126f0f185a@linux.com>
Date: Thu, 9 Apr 2020 22:41:01 +0300
From: Alexander Popov <alex.popov@...ux.com>
To: Jann Horn <jannh@...gle.com>
Cc: Julia Lawall <Julia.Lawall@...6.fr>,
Gilles Muller <Gilles.Muller@...6.fr>,
Nicolas Palix <nicolas.palix@...g.fr>,
Michal Marek <michal.lkml@...kovi.net>, cocci@...teme.lip6.fr,
"kernel-hardening@...ts.openwall.com"
<kernel-hardening@...ts.openwall.com>,
Kees Cook <keescook@...omium.org>,
Hans Verkuil <hverkuil@...all.nl>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Linux Media Mailing List <linux-media@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Markus Elfring <Markus.Elfring@....de>
Subject: Re: Coccinelle rule for CVE-2019-18683
Jann, thanks for your reply!
On 09.04.2020 01:26, Jann Horn wrote:
> On Thu, Apr 9, 2020 at 12:01 AM Alexander Popov <alex.popov@...ux.com> wrote:
>> CVE-2019-18683 refers to three similar vulnerabilities caused by the same
>> incorrect approach to locking that is used in vivid_stop_generating_vid_cap(),
>> vivid_stop_generating_vid_out(), and sdr_cap_stop_streaming().
>>
>> For fixes please see the commit 6dcd5d7a7a29c1e4 (media: vivid: Fix wrong
>> locking that causes race conditions on streaming stop).
>>
>> These three functions are called during streaming stopping with vivid_dev.mutex
>> locked. And they all do the same mistake while stopping their kthreads, which
>> need to lock this mutex as well. See the example from
>> vivid_stop_generating_vid_cap():
>> /* shutdown control thread */
>> vivid_grab_controls(dev, false);
>> mutex_unlock(&dev->mutex);
>> kthread_stop(dev->kthread_vid_cap);
>> dev->kthread_vid_cap = NULL;
>> mutex_lock(&dev->mutex);
>>
>> But when this mutex is unlocked, another vb2_fop_read() can lock it instead of
>> the kthread and manipulate the buffer queue. That causes use-after-free.
>>
>> I created a Coccinelle rule that detects mutex_unlock+kthread_stop+mutex_lock
>> within one function.
> [...]
>> mutex_unlock@...ock_p(E)
>> ...
>> kthread_stop@...p_p(...)
>> ...
>> mutex_lock@...k_p(E)
>
> Is the kthread_stop() really special here? It seems to me like it's
> pretty much just a normal instance of the "temporarily dropping a
> lock" pattern - which does tend to go wrong quite often, but can also
> be correct.
Right, searching without kthread_stop() gives more cases.
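To make the general "temporarily dropping a lock" hazard concrete, here is a minimal user-space sketch in Python. It is not kernel code: the queue, the "intruder" thread and the two events are hypothetical stand-ins, and the events force an interleaving that in the kernel would only happen occasionally.

```python
import threading

def drop_and_reacquire_demo():
    """Show that lock-protected state can change while the lock is dropped."""
    lock = threading.Lock()
    queue = ["buf0", "buf1"]            # state the lock is supposed to protect
    dropped = threading.Event()         # set once the owner has released the lock
    intruder_done = threading.Event()   # set once the other thread has run

    def intruder():
        dropped.wait()                  # wait for the window where the lock is free
        with lock:
            queue.clear()               # stands in for another reader touching the queue
        intruder_done.set()

    t = threading.Thread(target=intruder)
    t.start()

    lock.acquire()
    snapshot = list(queue)              # what the owner believes the queue holds
    lock.release()                      # temporary drop, like mutex_unlock()
    dropped.set()
    intruder_done.wait()                # stands in for a blocking call made unlocked
    lock.acquire()                      # like mutex_lock()
    after = list(queue)                 # the owner's earlier view is now stale
    lock.release()

    t.join()
    return snapshot, after
```

Whether such a window is harmful depends on whether the code revalidates its state after taking the lock again, which is exactly what needs manual review.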
> I think it would be interesting though to have a list of places that
> drop and then re-acquire a mutex/spinlock/... that was not originally
> acquired in the same block of code (but was instead originally
> acquired in an outer block, or by a parent function, or something like
> that). So things like this:
It's a very good idea. I tried it and got the first results (described below).
> void X(...) {
> mutex_lock(A);
> for (...) {
> ...
> mutex_unlock(A);
> ...
> mutex_lock(A);
> ...
> }
> mutex_unlock(A);
> }
I'm not an expert in SmPL yet, so I don't know how to describe this case.
> or like this:
>
> void X(...) {
> ... [no mutex operations on A]
> mutex_unlock(A);
> ...
> mutex_lock(A);
> ...
> }
Yes, I adapted the rule for that easier case:
```
virtual report
virtual context

@race exists@
expression E;
position unlock_p;
position lock_p;
@@

... when != mutex_lock(E)
* mutex_unlock@unlock_p(E)
...
* mutex_lock@lock_p(E)

@script:python@
unlock_p << race.unlock_p;
lock_p << race.lock_p;
E << race.E;
@@

coccilib.report.print_report(unlock_p[0], 'see mutex_unlock(' + E + ') here')
coccilib.report.print_report(lock_p[0], 'see mutex_lock(' + E + ') here\n')
```
The command to run it:
COCCI=./scripts/coccinelle/kthread_race.cocci make coccicheck MODE=context
It shows the surrounding code context in the form of a diff.
This rule found 195 matches. Not that many!
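For going through that many matches, a small helper that pairs the unlock/lock report lines can be handy. The following is only a sketch: it assumes the line format of the examples quoted in this message, and the function names are hypothetical. Report lines that were wrapped by a mail client would have to be joined first.

```python
import re
from collections import defaultdict

# Format assumed from the report lines quoted in this message:
#   ./fs/io_uring.c:7908:2-14: see mutex_unlock(& ctx -> uring_lock) here
REPORT_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):\d+-\d+: "
    r"see (?P<op>mutex_unlock|mutex_lock)\((?P<expr>.*)\) here$"
)

def pair_reports(text):
    """Return a list of (path, unlock_line, lock_line, expr) tuples."""
    pairs = []
    pending = None  # last unlock report not yet matched with a lock report
    for raw in text.splitlines():
        m = REPORT_RE.match(raw.strip())
        if not m:
            continue  # diff context and blank lines are skipped
        path, line, op, expr = m["path"], int(m["line"]), m["op"], m["expr"]
        if op == "mutex_unlock":
            pending = (path, line, expr)
        elif pending and pending[0] == path and pending[2] == expr:
            pairs.append((path, pending[1], line, expr))
            pending = None
    return pairs

def count_by_file(pairs):
    """Count matched unlock/lock pairs per source file."""
    counts = defaultdict(int)
    for path, *_ in pairs:
        counts[path] += 1
    return dict(counts)
```

Feeding the saved coccicheck output to `pair_reports()` and then `count_by_file()` gives a per-file overview that can be used to split the review work.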
> But of course, there are places where this kind of behavior is
> correct; so such a script wouldn't just return report code, just code
> that could use a bit more scrutiny than normal.
I've spent some time looking through the results.
Currently I see 3 types of cases.
1. Cases that look legit: a mutex is unlocked for some waiting or sleeping.
Example:
./fs/io_uring.c:7908:2-14: see mutex_unlock(& ctx -> uring_lock) here
./fs/io_uring.c:7910:2-12: see mutex_lock(& ctx -> uring_lock) here
diff -u -p ./fs/io_uring.c /tmp/nothing/fs/io_uring.c
--- ./fs/io_uring.c
+++ /tmp/nothing/fs/io_uring.c
@@ -7905,9 +7905,7 @@ static int __io_uring_register(struct io
* to drop the mutex here, since no new references will come in
* after we've killed the percpu ref.
*/
- mutex_unlock(&ctx->uring_lock);
ret = wait_for_completion_interruptible(&ctx->completions[0]);
- mutex_lock(&ctx->uring_lock);
if (ret) {
percpu_ref_resurrect(&ctx->refs);
ret = -EINTR;
Another example that looks legit:
./mm/ksm.c:2709:2-14: see mutex_unlock(& ksm_thread_mutex) here
./mm/ksm.c:2712:2-12: see mutex_lock(& ksm_thread_mutex) here
diff -u -p ./mm/ksm.c /tmp/nothing/mm/ksm.c
--- ./mm/ksm.c
+++ /tmp/nothing/mm/ksm.c
@@ -2706,10 +2706,8 @@ void ksm_migrate_page(struct page *newpa
static void wait_while_offlining(void)
{
while (ksm_run & KSM_RUN_OFFLINE) {
- mutex_unlock(&ksm_thread_mutex);
wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
TASK_UNINTERRUPTIBLE);
- mutex_lock(&ksm_thread_mutex);
}
}
2. Weird cases that look like they just avoid a deadlock.
Example. This mutex is unlocked for a while by an interrupt handler:
./sound/pci/pcxhr/pcxhr_core.c:1210:3-15: see mutex_unlock(& mgr -> lock) here
./sound/pci/pcxhr/pcxhr_core.c:1212:3-13: see mutex_lock(& mgr -> lock) here
diff -u -p ./sound/pci/pcxhr/pcxhr_core.c /tmp/nothing/sound/pci/pcxhr/pcxhr_core.c
--- ./sound/pci/pcxhr/pcxhr_core.c
+++ /tmp/nothing/sound/pci/pcxhr/pcxhr_core.c
@@ -1207,9 +1207,7 @@ static void pcxhr_update_timer_pos(struc
}
if (elapsed) {
- mutex_unlock(&mgr->lock);
snd_pcm_period_elapsed(stream->substream);
- mutex_lock(&mgr->lock);
}
}
}
Another weird example. It looks a bit similar to the V4L2 bugs.
./drivers/net/wireless/broadcom/b43/main.c:4334:1-13: see mutex_unlock(& wl -> mutex) here
./drivers/net/wireless/broadcom/b43/main.c:4338:1-11: see mutex_lock(& wl -> mutex) here
diff -u -p ./drivers/net/wireless/broadcom/b43/main.c /tmp/nothing/drivers/net/wireless/broadcom/b43/main.c
--- ./drivers/net/wireless/broadcom/b43/main.c
+++ /tmp/nothing/drivers/net/wireless/broadcom/b43/main.c
@@ -4331,11 +4331,9 @@ redo:
return dev;
/* Cancel work. Unlock to avoid deadlocks. */
- mutex_unlock(&wl->mutex);
cancel_delayed_work_sync(&dev->periodic_work);
cancel_work_sync(&wl->tx_work);
b43_leds_stop(dev);
- mutex_lock(&wl->mutex);
dev = wl->current_dev;
if (!dev || b43_status(dev) < B43_STAT_STARTED) {
/* Whoops, aliens ate up the device while we were unlocked. */
3. False positive cases.
The pointer to mutex changes between unlocking and locking.
Example:
./fs/ceph/caps.c:2103:4-16: see mutex_unlock(& session -> s_mutex) here
./fs/ceph/caps.c:2105:3-13: see mutex_lock(& session -> s_mutex) here
@@ -2100,9 +2094,7 @@ retry_locked:
if (session != cap->session) {
spin_unlock(&ci->i_ceph_lock);
if (session)
- mutex_unlock(&session->s_mutex);
session = cap->session;
- mutex_lock(&session->s_mutex);
goto retry;
}
if (cap->session->s_state < CEPH_MDS_SESSION_OPEN) {
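The third category could probably be filtered out before manual review. Below is a rough textual heuristic, not something Coccinelle provides: it checks whether the first identifier of the lock expression is assigned on any source line between the unlock and the lock. The function names are hypothetical, and the regular expression only approximates C assignment syntax.

```python
import re

def base_identifier(lock_expr):
    """Return the first identifier of a lock expression,
    e.g. 'session' for '& session -> s_mutex'."""
    m = re.search(r"[A-Za-z_]\w*", lock_expr)
    return m.group(0) if m else None

def base_reassigned(lock_expr, lines_between):
    """True if the base identifier is plainly assigned between unlock and lock."""
    base = base_identifier(lock_expr)
    if base is None:
        return False
    # Match "base = ..." but not "==", and not a member such as "cap->base".
    assign = re.compile(r"(?<![\w.>])" + re.escape(base) + r"\s*=(?!=)")
    return any(assign.search(line) for line in lines_between)
```

For the ceph match above, `base_reassigned("& session -> s_mutex", ["session = cap->session;"])` is true, so the pair would be marked as a likely false positive. For the b43 match it is false, so that pair stays on the review list.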
I would be grateful for your ideas and feedback.
Alexander