lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG-UpRR74X8dnog1HcWpmkE4ctZcuaR2FvGZoqpuXMaGzrKNDg@mail.gmail.com>
Date:   Sat, 15 Jul 2023 01:12:53 +0800
From:   Henry Wu <triangletrap12@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org,
        Boqun Feng <boqun.feng@...il.com>,
        Mike Galbraith <efault@....de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH] locking/rtmutex: Fix task->pi_waiters integrity

Hi, all.

Peter Zijlstra <peterz@...radead.org> 于2023年7月8日周六 00:10写道:
>
> Henry reported that rt_mutex_adjust_prio_check() has an ordering
> problem and puts the lie to the comment in [7]. Sharing the sort key
> between lock->waiters and owner->pi_waiters *does* create problems,
> since unlike what the comment claims, holding [L] is insufficient.
>
> Notably, consider:
>
>         A
>       /   \
>      M1   M2
>      |     |
>      B     C
>
> That is, task A owns both M1 and M2, B and C block on them. In this
> case a concurrent chain walk (B & C) will modify their resp. sort keys
> in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L]
> is meaningless, they're different Ls.
>
> This then gives rise to a race condition between [7] and [11], where
> the requeue of pi_waiters will observe an inconsistent tree order.
>
>         B                               C
>
>   (holds M1->wait_lock,         (holds M2->wait_lock,
>    holds B->pi_lock)             holds A->pi_lock)
>
>   [7]
>   waiter_update_prio();
>   ...
>   [8]
>   raw_spin_unlock(B->pi_lock);
>   ...
>   [10]
>   raw_spin_lock(A->pi_lock);
>
>                                 [11]
>                                 rt_mutex_enqueue_pi();
>                                 // observes inconsistent A->pi_waiters
>                                 // tree order
>
> Fixing this means either extending the range of the owner lock from
> [10-13] to [6-13], with the immediate problem that this means [6-8]
> hold both blocked and owner locks, or duplicating the sort key.
>
> Since the locking in chain walk is horrible enough without having to
> consider pi_lock nesting rules, duplicate the sort key instead.
>
> By giving each tree their own sort key, the above race becomes
> harmless, if C sees B at the old location, then B will correct things
> (if they need correcting) when it walks up the chain and reaches A.
>
> Fixes: fb00aca47440 ("rtmutex: Turn the plist into an rb-tree")
> Reported-by: Henry Wu <triangletrap12@...il.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
>  kernel/locking/rtmutex.c        |  170 ++++++++++++++++++++++++++--------------
>  kernel/locking/rtmutex_api.c    |    2
>  kernel/locking/rtmutex_common.h |   47 ++++++++---
>  kernel/locking/ww_mutex.h       |   12 +-
>  4 files changed, 155 insertions(+), 76 deletions(-)
>

Sorry for late reply due to overlooked your patch.

I tested your patch on a 6 core machine with kernel 6.5.0-rc1. It
works well and my test program can't find any race on patched kernel.

I tested it with non-debug kernel for almost 3 hours and tested with
debug kernel for 6 hours. I checked my dmesg and there was no
irregular log. I checked my test program on kernel 6.3.12 that it can
reproduce the race we discussed previously.

You can find my test program in attachment and below is test output.
You can add "Tested-by" in commit msg.

Thanks

Henry

For non-debug kernel:

$ sudo cat /sys/kernel/debug/sched/preempt
none voluntary (full)
$ sudo ./a.out
pid: 23845
Before: prio: 20
    PID     LWP TTY          TIME CMD
  23845   23845 pts/0    00:00:00 a.out
  23845   23846 pts/0    00:00:00 waiter-0
  23845   23847 pts/0    00:00:00 waiter-1
  23845   23848 pts/0    00:00:00 waiter-2
  23845   23849 pts/0    00:00:00 waiter-3
  23845   23850 pts/0    00:00:00 waiter-4
  23845   23851 pts/0    00:00:00 waiter-5
  23845   23852 pts/0    00:00:00 waiter-6
  23845   23853 pts/0    00:00:00 waiter-7
  23845   23854 pts/0    00:00:00 waiter-8
  23845   23855 pts/0    00:00:00 waiter-9
  23845   23856 pts/0    00:00:00 waiter-10
  23845   23857 pts/0    00:00:00 waiter-11
  23845   23858 pts/0    00:00:00 waiter-12
  23845   23859 pts/0    00:00:00 waiter-13
  23845   23860 pts/0    00:00:00 waiter-14
  23845   23861 pts/0    00:00:00 waiter-15
  23845   23862 pts/0    00:00:00 waiter-16
  23845   23863 pts/0    00:00:00 waiter-17
  23845   23864 pts/0    00:00:00 waiter-18
  23845   23865 pts/0    00:00:00 waiter-19
  23845   23866 pts/0    00:00:00 changer-0
  23845   23867 pts/0    00:00:00 changer-1
  23845   23868 pts/0    00:00:00 changer-2
  23845   23869 pts/0    00:00:00 changer-3
  23845   23870 pts/0    00:00:00 changer-4
  23845   23871 pts/0    00:00:00 changer-5
  23845   23872 pts/0    00:00:00 changer-6
  23845   23873 pts/0    00:00:00 changer-7
  23845   23874 pts/0    00:00:00 changer-8
  23845   23875 pts/0    00:00:00 changer-9
  23845   23876 pts/0    00:00:00 changer-10
  23845   23877 pts/0    00:00:00 changer-11
  23845   23878 pts/0    00:00:00 changer-12
  23845   23879 pts/0    00:00:00 changer-13
  23845   23880 pts/0    00:00:00 changer-14
  23845   23881 pts/0    00:00:00 changer-15
  23845   23882 pts/0    00:00:00 changer-16
  23845   23883 pts/0    00:00:00 changer-17
  23845   23884 pts/0    00:00:00 changer-18
  23845   23885 pts/0    00:00:00 changer-19
Prio: -21, elapsed: 2.0 minutes.
Prio: -21, elapsed: 4.0 minutes.
Prio: -21, elapsed: 6.0 minutes.
.............
Prio: -21, elapsed: 166.0 minutes.
Prio: -21, elapsed: 168.0 minutes.
Prio: -21, elapsed: 170.0 minutes.
Prio: -21, elapsed: 172.0 minutes.
Prio: -21, elapsed: 174.0 minutes.

For debug kernel:

$ uname -a
Linux fedora 6.5.0-0.rc1.20230713gteb26cbb1.213.rtmutex.fc38.x86_64+debug
#1 SMP PREEMPT_DYNAMIC Fri Jul 14 14:49:04 CST 2023 x86_64 GNU/Linux
$ sudo cat /sys/kernel/debug/sched/preempt
none voluntary (full)
$ gcc -Wall -g ./pi_650_rc1.c
$ sudo ./a.out
pid: 3141
Before: prio: 20
    PID     LWP TTY          TIME CMD
   3141    3141 pts/0    00:00:00 a.out
   3141    3142 pts/0    00:00:00 waiter-0
   3141    3143 pts/0    00:00:00 waiter-1
   3141    3144 pts/0    00:00:00 waiter-2
   3141    3145 pts/0    00:00:00 waiter-3
   3141    3146 pts/0    00:00:00 waiter-4
   3141    3147 pts/0    00:00:00 waiter-5
   3141    3148 pts/0    00:00:00 waiter-6
   3141    3149 pts/0    00:00:00 waiter-7
   3141    3150 pts/0    00:00:00 waiter-8
   3141    3151 pts/0    00:00:00 waiter-9
   3141    3152 pts/0    00:00:00 waiter-10
   3141    3153 pts/0    00:00:00 waiter-11
   3141    3154 pts/0    00:00:00 waiter-12
   3141    3155 pts/0    00:00:00 waiter-13
   3141    3156 pts/0    00:00:00 waiter-14
   3141    3157 pts/0    00:00:00 waiter-15
   3141    3158 pts/0    00:00:00 waiter-16
   3141    3159 pts/0    00:00:00 waiter-17
   3141    3160 pts/0    00:00:00 waiter-18
   3141    3161 pts/0    00:00:00 waiter-19
   3141    3162 pts/0    00:00:00 changer-0
   3141    3163 pts/0    00:00:00 changer-1
   3141    3164 pts/0    00:00:00 changer-2
   3141    3165 pts/0    00:00:00 changer-3
   3141    3166 pts/0    00:00:00 changer-4
   3141    3167 pts/0    00:00:00 changer-5
   3141    3168 pts/0    00:00:00 changer-6
   3141    3169 pts/0    00:00:00 changer-7
   3141    3170 pts/0    00:00:00 changer-8
   3141    3171 pts/0    00:00:00 changer-9
   3141    3172 pts/0    00:00:00 changer-10
   3141    3173 pts/0    00:00:00 changer-11
   3141    3174 pts/0    00:00:00 changer-12
   3141    3175 pts/0    00:00:00 changer-13
   3141    3176 pts/0    00:00:00 changer-14
   3141    3177 pts/0    00:00:00 changer-15
   3141    3178 pts/0    00:00:00 changer-16
   3141    3179 pts/0    00:00:00 changer-17
   3141    3180 pts/0    00:00:00 changer-18
   3141    3181 pts/0    00:00:00 changer-19
Prio: -21, elapsed: 2.0 minutes.
Prio: -21, elapsed: 4.0 minutes.
Prio: -21, elapsed: 6.0 minutes.
Prio: -21, elapsed: 8.0 minutes.
Prio: -21, elapsed: 10.0 minutes.
Prio: -21, elapsed: 12.0 minutes.
Prio: -21, elapsed: 14.0 minutes.
....................
Prio: -21, elapsed: 388.0 minutes.
Prio: -21, elapsed: 390.0 minutes.
Prio: -21, elapsed: 392.0 minutes.
Prio: -21, elapsed: 394.0 minutes.
Prio: -21, elapsed: 396.0 minutes.
^C
$

View attachment "pi_650_rc1.c" of type "text/x-csrc" (5641 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ