Message-ID: <570C6694.50803@redhat.com>
Date:	Tue, 12 Apr 2016 11:08:04 +0800
From:	Xunlei Pang <xpang@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	Juri Lelli <juri.lelli@....com>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks

On 2016/04/10 at 16:22, Xunlei Pang wrote:
> On 2016/04/09 at 21:29, Peter Zijlstra wrote:
>> On Sat, Apr 09, 2016 at 11:25:39AM +0800, Xunlei Pang wrote:
>>
>>>> In any case, I just realized we do not in fact provide this guarantee
>>>> (of pointing to a blocked task) that needs a bit more work.
>>> The current patch calls rt_mutex_adjust_prio() before wake_up_q() wakes the
>>> wakee; at that moment the wakee has already been removed from current's
>>> pi_waiters by rt_mutex_slowunlock()->mark_wakeup_next_waiter(), so
>>> rt_mutex_adjust_prio() won't see this wakee, and I think this should not
>>> be a problem.
>> No, any wakeup after mark_wakeup_next_waiter() will make the task run.
>> And since we must always consider spurious wakeups, we cannot rely on us
>> (eg. our wake_up_q call) being the one and only.
>>
>> Therefore it is possible and the only thing that stands between us and
>> doom is the fact that the wake_q stuff holds a task reference.
>>
>> But we cannot guarantee that the task we have a pointer to is in fact
>> blocked.
> Does that really matter? The pointer is accessed on behalf of current, and current
> will call rt_mutex_adjust_prio() very soon to update it to the right pointer.
>
> Also, the pointer is used to update current's deadline/runtime; we could restore these
> params in rt_mutex_setprio() for the deboost case. I just checked the current code: it does
> nothing to restore current's deadline/runtime when deboosting, so maybe we can leave
> this job to future deadline bandwidth inheritance?
>
> Regards,
> Xunlei
>

I spotted another issue: we access pi_task without holding any lock in enqueue_task_dl().
Although we check the "dl_prio(pi_task->normal_prio)" condition, that is not enough, because
"dl_period" and "dl_runtime" of pi_task can still change. If pi_task switches to a !deadline
class, its dl_runtime is cleared to 0, and we will hit a forever loop in replenish_dl_entity()
below:
    while (dl_se->runtime <= 0) {
        /* with pi_se->dl_runtime == 0, runtime can never become positive */
        dl_se->deadline += pi_se->dl_period;
        dl_se->runtime += pi_se->dl_runtime;
    }

or hit "BUG_ON(pi_se->dl_runtime <= 0);".

That all happens because, without holding any lock of that task, there is no such guarantee.
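
For reference, the unlocked access in question looks roughly like this (a much simplified
sketch of the enqueue_task_dl() path, most of the enqueue logic omitted; the point is that
nothing here holds any lock of pi_task, so its class and parameters can change under us):

    static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
    {
        /* top PI waiter's task; none of pi_task's locks are held here */
        struct task_struct *pi_task = rt_mutex_get_top_task(p);
        struct sched_dl_entity *pi_se = &p->dl;

        if (pi_task && p->dl.dl_boosted && dl_prio(pi_task->normal_prio)) {
            /*
             * pi_task can leave the deadline class right here, zeroing
             * its dl_runtime/dl_period, which later trips the replenish
             * loop or the BUG_ON() quoted above.
             */
            pi_se = &pi_task->dl;
        }

        /* (throttling and the rest of the enqueue path omitted) */
        enqueue_dl_entity(&p->dl, pi_se, flags);
    }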

So I'm thinking of adding more members to rt_mutex_waiter (we don't waste memory, as it is
defined on the stack) and using this structure instead of task_struct as the top waiter
(i.e. using get_pi_top_waiter() instead of get_pi_top_task()), like:

Step1:
struct rt_mutex_waiter {
         int prio;
+       /* updated under waiter's pi_lock and rt_mutex lock */
+       u64 dl_runtime, dl_period;
+       /*
+        * under owner's pi_lock, rq lock, and rt_mutex lock, copied
+        * directly from dl_runtime, dl_period(under same rt_mutex lock).
+        */
+       u64 dl_runtime_copy, dl_period_copy;

Similarly, add the new member in task_struct:
#ifdef CONFIG_RT_MUTEXES
    /* PI waiters blocked on a rt_mutex held by this task */
    struct rb_root pi_waiters;
    struct rb_node *pi_waiters_leftmost;
+  struct rb_node *pi_waiters_leftmost_copy; /* updated under pi_lock and rq lock */
    /* Deadlock detection and priority inheritance handling */
    struct rt_mutex_waiter *pi_blocked_on;
#endif

Then we can update "pi_waiters_leftmost_copy" together with "dl_runtime_copy" and
"dl_period_copy" under the rq lock, so that enqueue_task_dl() can access them without
any problem.

Step2:
We must update "dl_runtime_copy" and "dl_period_copy" under rt_mutex lock,
because it is copied from "dl_runtime" and "dl_period" of rt_mutex_waiter, so we
add a new update function as long as we held rq lock and rt_mutex lock, mainly
can be implemented in rt_mutex_setprio.
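
Something like the following (the function name is just a placeholder; the caller must hold
the owner's rq lock, the owner's pi_lock and the rt_mutex lock):

    static void rt_mutex_update_copy(struct task_struct *owner)
    {
        struct rb_node *leftmost = owner->pi_waiters_leftmost;
        struct rt_mutex_waiter *top;

        /* publish the snapshot that enqueue_task_dl() reads under the rq lock */
        owner->pi_waiters_leftmost_copy = leftmost;
        if (leftmost) {
            top = rb_entry(leftmost, struct rt_mutex_waiter, pi_tree_entry);
            top->dl_runtime_copy = top->dl_runtime;
            top->dl_period_copy  = top->dl_period;
        }
    }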

Step3:
Note that rt_mutex_setprio() can be called by rt_mutex_adjust_prio() without the rtmutex
lock held, so we can add a parameter to indicate not doing the copy-updating work at that
call site; the same applies to rt_mutex_setprio() itself (add a new waiter parameter and
keep the original "prio" parameter). Then we can do the copy-updating work in
mark_wakeup_next_waiter() before unlocking current's pi_lock, as long as we hold the rq
lock there, because the rtmutex lock and the owner's pi_lock are already held.
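
Interface-wise it could look roughly like this (parameter names are only placeholders to show
the shape, not a final proposal):

    /*
     * rt_mutex_setprio(): also take the top waiter, plus a flag so callers
     * that do not hold the rtmutex lock (e.g. rt_mutex_adjust_prio()) can
     * skip the copy-updating work.
     */
    void rt_mutex_setprio(struct task_struct *p, int prio,
                          struct rt_mutex_waiter *waiter, bool update_copy);

mark_wakeup_next_waiter() would then take the owner's rq lock and call the update helper right
before dropping current's pi_lock, since the rtmutex lock and current's pi_lock are already
held at that point.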

This can also solve the issue you mentioned with only a little overhead involved.

Regards,
Xunlei
