Open Source and information security mailing list archives
 
Date:   Thu, 9 Apr 2020 22:41:01 +0300
From:   Alexander Popov <alex.popov@...ux.com>
To:     Jann Horn <jannh@...gle.com>
Cc:     Julia Lawall <Julia.Lawall@...6.fr>,
        Gilles Muller <Gilles.Muller@...6.fr>,
        Nicolas Palix <nicolas.palix@...g.fr>,
        Michal Marek <michal.lkml@...kovi.net>, cocci@...teme.lip6.fr,
        "kernel-hardening@...ts.openwall.com" 
        <kernel-hardening@...ts.openwall.com>,
        Kees Cook <keescook@...omium.org>,
        Hans Verkuil <hverkuil@...all.nl>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Linux Media Mailing List <linux-media@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Markus Elfring <Markus.Elfring@....de>
Subject: Re: Coccinelle rule for CVE-2019-18683

Jann, thanks for your reply!

On 09.04.2020 01:26, Jann Horn wrote:
> On Thu, Apr 9, 2020 at 12:01 AM Alexander Popov <alex.popov@...ux.com> wrote:
>> CVE-2019-18683 refers to three similar vulnerabilities caused by the same
>> incorrect approach to locking that is used in vivid_stop_generating_vid_cap(),
>> vivid_stop_generating_vid_out(), and sdr_cap_stop_streaming().
>>
>> For fixes please see the commit 6dcd5d7a7a29c1e4 (media: vivid: Fix wrong
>> locking that causes race conditions on streaming stop).
>>
>> These three functions are called during streaming stop with vivid_dev.mutex
>> held. And they all make the same mistake while stopping their kthreads, which
>> need to lock this mutex as well. See the example from
>> vivid_stop_generating_vid_cap():
>>     /* shutdown control thread */
>>     vivid_grab_controls(dev, false);
>>     mutex_unlock(&dev->mutex);
>>     kthread_stop(dev->kthread_vid_cap);
>>     dev->kthread_vid_cap = NULL;
>>     mutex_lock(&dev->mutex);
>>
>> But when this mutex is unlocked, another vb2_fop_read() can lock it instead of
>> the kthread and manipulate the buffer queue. That causes a use-after-free.
>>
>> I created a Coccinelle rule that detects mutex_unlock+kthread_stop+mutex_lock
>> within one function.
> [...]
>> mutex_unlock@unlock_p(E)
>> ...
>> kthread_stop@stop_p(...)
>> ...
>> mutex_lock@lock_p(E)
> 
> Is the kthread_stop() really special here? It seems to me like it's
> pretty much just a normal instance of the "temporarily dropping a
> lock" pattern - which does tend to go wrong quite often, but can also
> be correct.

Right, searching without kthread_stop() gives more cases.

> I think it would be interesting though to have a list of places that
> drop and then re-acquire a mutex/spinlock/... that was not originally
> acquired in the same block of code (but was instead originally
> acquired in an outer block, or by a parent function, or something like
> that). So things like this:

It's a very good idea. I tried it and got first results (described below).

> void X(...) {
>   mutex_lock(A);
>   for (...) {
>     ...
>     mutex_unlock(A);
>     ...
>     mutex_lock(A);
>     ...
>   }
>   mutex_unlock(A);
> }

I'm not an expert in SmPL yet, so I don't know how to describe this case.

> or like this:
> 
> void X(...) {
>   ... [no mutex operations on A]
>   mutex_unlock(A);
>   ...
>   mutex_lock(A);
>   ...
> }

Yes, I adapted the rule for that easier case:

```
virtual report
virtual context

@race exists@
expression E;
position unlock_p;
position lock_p;
@@

... when != mutex_lock(E)
* mutex_unlock@unlock_p(E)
...
* mutex_lock@lock_p(E)

@script:python@
unlock_p << race.unlock_p;
lock_p << race.lock_p;
E << race.E;
@@

coccilib.report.print_report(unlock_p[0], 'see mutex_unlock(' + E + ') here')
coccilib.report.print_report(lock_p[0], 'see mutex_lock(' + E + ') here\n')
```

The command to run it:
  COCCI=./scripts/coccinelle/kthread_race.cocci make coccicheck MODE=context
It shows the surrounding code context in the form of a diff.

This rule found 195 matches. Not that many!

> But of course, there are places where this kind of behavior is
> correct; so such a script wouldn't just return report code, just code
> that could use a bit more scrutiny than normal. 

I've spent some time looking through the results.
Currently I see 3 types of cases.


1. Cases that look legit: a mutex is unlocked for some waiting or sleeping.

Example:
./fs/io_uring.c:7908:2-14: see mutex_unlock(& ctx -> uring_lock) here
./fs/io_uring.c:7910:2-12: see mutex_lock(& ctx -> uring_lock) here

diff -u -p ./fs/io_uring.c /tmp/nothing/fs/io_uring.c
--- ./fs/io_uring.c
+++ /tmp/nothing/fs/io_uring.c
@@ -7905,9 +7905,7 @@ static int __io_uring_register(struct io
 		 * to drop the mutex here, since no new references will come in
 		 * after we've killed the percpu ref.
 		 */
-		mutex_unlock(&ctx->uring_lock);
 		ret = wait_for_completion_interruptible(&ctx->completions[0]);
-		mutex_lock(&ctx->uring_lock);
 		if (ret) {
 			percpu_ref_resurrect(&ctx->refs);
 			ret = -EINTR;


Another example that looks legit:
./mm/ksm.c:2709:2-14: see mutex_unlock(& ksm_thread_mutex) here
./mm/ksm.c:2712:2-12: see mutex_lock(& ksm_thread_mutex) here

diff -u -p ./mm/ksm.c /tmp/nothing/mm/ksm.c
--- ./mm/ksm.c
+++ /tmp/nothing/mm/ksm.c
@@ -2706,10 +2706,8 @@ void ksm_migrate_page(struct page *newpa
 static void wait_while_offlining(void)
 {
 	while (ksm_run & KSM_RUN_OFFLINE) {
-		mutex_unlock(&ksm_thread_mutex);
 		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
 			    TASK_UNINTERRUPTIBLE);
-		mutex_lock(&ksm_thread_mutex);
 	}
 }


2. Weird cases that seem to unlock just to avoid a deadlock.

Example. This mutex is unlocked for a while by an interrupt handler:
./sound/pci/pcxhr/pcxhr_core.c:1210:3-15: see mutex_unlock(& mgr -> lock) here
./sound/pci/pcxhr/pcxhr_core.c:1212:3-13: see mutex_lock(& mgr -> lock) here

diff -u -p ./sound/pci/pcxhr/pcxhr_core.c /tmp/nothing/sound/pci/pcxhr/pcxhr_core.c
--- ./sound/pci/pcxhr/pcxhr_core.c
+++ /tmp/nothing/sound/pci/pcxhr/pcxhr_core.c
@@ -1207,9 +1207,7 @@ static void pcxhr_update_timer_pos(struc
 		}

 		if (elapsed) {
-			mutex_unlock(&mgr->lock);
 			snd_pcm_period_elapsed(stream->substream);
-			mutex_lock(&mgr->lock);
 		}
 	}
 }

Another weird example, which looks a bit similar to the V4L2 bugs.

./drivers/net/wireless/broadcom/b43/main.c:4334:1-13: see mutex_unlock(& wl -> mutex) here
./drivers/net/wireless/broadcom/b43/main.c:4338:1-11: see mutex_lock(& wl -> mutex) here

diff -u -p ./drivers/net/wireless/broadcom/b43/main.c /tmp/nothing/drivers/net/wireless/broadcom/b43/main.c
--- ./drivers/net/wireless/broadcom/b43/main.c
+++ /tmp/nothing/drivers/net/wireless/broadcom/b43/main.c
@@ -4331,11 +4331,9 @@ redo:
 		return dev;

 	/* Cancel work. Unlock to avoid deadlocks. */
-	mutex_unlock(&wl->mutex);
 	cancel_delayed_work_sync(&dev->periodic_work);
 	cancel_work_sync(&wl->tx_work);
 	b43_leds_stop(dev);
-	mutex_lock(&wl->mutex);
 	dev = wl->current_dev;
 	if (!dev || b43_status(dev) < B43_STAT_STARTED) {
 		/* Whoops, aliens ate up the device while we were unlocked. */


3. False positive cases:
the pointer to the mutex changes between the unlock and the lock.

Example:
./fs/ceph/caps.c:2103:4-16: see mutex_unlock(& session -> s_mutex) here
./fs/ceph/caps.c:2105:3-13: see mutex_lock(& session -> s_mutex) here

@@ -2100,9 +2094,7 @@ retry_locked:
 		if (session != cap->session) {
 			spin_unlock(&ci->i_ceph_lock);
 			if (session)
-				mutex_unlock(&session->s_mutex);
 			session = cap->session;
-			mutex_lock(&session->s_mutex);
 			goto retry;
 		}
 		if (cap->session->s_state < CEPH_MDS_SESSION_OPEN) {


I would be grateful for your ideas and feedback.
Alexander
