Date:	Mon, 5 May 2014 15:32:20 +0900
From:	Dongsheng Yang <yangds.fnst@...fujitsu.com>
To:	Peter Zijlstra <peterz@...radead.org>, <bsegall@...gle.com>,
	<rostedt@...dmis.org>
CC:	<fweisbec@...il.com>, <mingo@...hat.com>,
	<acme@...stprotocols.net>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 4/8] sched/core: Skip wakeup when task is already running.

Hi all,
Thanks for your reply, and sorry for the late response.

On 04/23/2014 03:18 AM, Peter Zijlstra wrote:
> On Tue, Apr 22, 2014 at 10:10:52AM -0700, bsegall@...gle.com wrote:
>> This is all expected behavior, and the somewhat less than useful trace
>> events are expected. A task setting p->state to TASK_RUNNING without
>> locks is fine if and only p == current. The standard deschedule loop is
>> basically:
>>
>> while (1) {
>>    set_current_state(TASK_(UN)INTERRUPTIBLE);
>>    if (!should_still_sleep)
>>      break;
>>    schedule();
>> }
>> set_current_state(TASK_RUNNING);
>>
>> Which can produce this in a race.
>>
>> The only problem this causes is a wasted check_preempt_curr call in the
>> racing case, and a somewhat inaccurate sched:sched_wakeup trace event.
>> Note that even if you did recheck in ttwu_do_wakeup you could still race
>> and get an "inaccurate" trace event.

Yes, even if I recheck it in ttwu_do_wakeup(), it could still race.

For now I cannot come up with a good way to avoid this race,
but I think the race itself is not too expensive.
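To make the race window concrete, here is a minimal userspace sketch (plain
C with pthreads, not kernel code; the state values, thread roles and names
are just stand-ins I made up for illustration). The "waiter" plays the
descheduling task flipping its own state without a lock, and the "waker"
plays try_to_wake_up() sampling a possibly stale state:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { STATE_RUNNING, STATE_INTERRUPTIBLE };

static _Atomic int task_state = STATE_RUNNING;

static void *waiter(void *arg)
{
	(void)arg;
	/* set_current_state(TASK_INTERRUPTIBLE), then the wait condition
	 * turns out to be already satisfied, so we set ourselves back to
	 * RUNNING without taking any lock (only legal for p == current). */
	atomic_store(&task_state, STATE_INTERRUPTIBLE);
	atomic_store(&task_state, STATE_RUNNING);
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	/* try_to_wake_up(): it may sample the state in the window between
	 * the two stores above and report a "wakeup" of a task that is
	 * already running -- the inaccurate sched:sched_wakeup event. */
	if (atomic_load(&task_state) == STATE_INTERRUPTIBLE)
		printf("spurious wakeup of an already-running task\n");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, waiter, NULL);
	pthread_create(&b, NULL, waker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}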

Regarding the sched:sched_wakeup event, we can see the bug information
I mentioned in the first mail of this thread:

[root@...-PC linux]# perf sched latency|tail
  bash:25186        |      0.410 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max at:   0.000000 s
  bash:25174        |      0.407 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max at:   0.000000 s
  bash:25268        |      0.367 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max at:   0.000000 s
  bash:25279        |      0.358 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max at:   0.000000 s
  bash:25238        |      0.286 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max at:   0.000000 s
 -----------------------------------------------------------------------------------------
  TOTAL:            |   1344.539 ms |     9604 |
 ---------------------------------------------------
  *INFO: 0.557% state machine bugs (51 out of 9161)*

It is caused by the issue we discussed in this thread: ttwu_do_wakeup()
can be called to wake up a task which is already on the run queue.

There are two solutions in my mind:
* Add a new trace event, such as sched:sched_enqueue. Then we can trace
it and get the timestamp when a task is enqueued and starts waiting for
a CPU. In this way, we can calculate the latency as
(sched_in_time - sched_enqueue_time) in `perf sched latency` (see the
small sketch after this list).

* Move the current trace point from ttwu_do_wakeup() to
ttwu_activate(). Currently sched:sched_wakeup tells the user very little.
When we get a sched:sched_wakeup:
a) We cannot say the task was just inserted into the run queue: the event
is also emitted for a task which is already on_rq and only has its
task->state changed to TASK_RUNNING.
b) We cannot say task->state changed from TASK_(UN)INTERRUPTIBLE to
TASK_RUNNING: sometimes task->state has already been set to TASK_RUNNING
by another cpu.
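As a tiny illustration of the first option (assuming the hypothetical
sched:sched_enqueue event existed; the structure and names below are made
up for illustration, not actual perf code), the per-wakeup wait latency
would simply be the difference of the two timestamps:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for one wait period, built from a sched_enqueue
 * timestamp and the matching sched_switch-in timestamp (nanoseconds). */
struct wait_sample {
	uint64_t enqueue_ns;	/* task put on the run queue, starts waiting */
	uint64_t sched_in_ns;	/* task actually gets the CPU */
};

static uint64_t wait_latency_ns(const struct wait_sample *s)
{
	/* latency = sched_in_time - sched_enqueue_time */
	return s->sched_in_ns - s->enqueue_ns;
}

int main(void)
{
	struct wait_sample s = { .enqueue_ns = 1000000, .sched_in_ns = 1410000 };

	printf("wait latency: %.3f ms\n", wait_latency_ns(&s) / 1e6);
	return 0;
}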

I prefer the second one; either way, the current sched_wakeup tells the
user almost nothing. A simplified model of what the second option would
mean is sketched below.
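This is a very simplified userspace model of the wakeup path (not the real
kernel functions; the control flow and the model_* names are only a rough
approximation I made up) to show why moving the trace point changes its
meaning:

#include <stdbool.h>
#include <stdio.h>

enum { STATE_RUNNING, STATE_INTERRUPTIBLE };

struct task {
	bool on_rq;
	int state;
};

/* Rough stand-in for ttwu_activate(): the task is put on the run queue. */
static void model_ttwu_activate(struct task *p)
{
	p->on_rq = true;
	/* Proposal: emit the trace event here, so it always means
	 * "this task was just enqueued and starts waiting for a CPU". */
	printf("sched_wakeup (enqueue)\n");
}

/* Rough stand-in for ttwu_do_wakeup(): the state becomes RUNNING. */
static void model_ttwu_do_wakeup(struct task *p)
{
	/* Today the trace event fires here, even when the task was already
	 * on the run queue and only its state flips to RUNNING. */
	printf("sched_wakeup (state change)\n");
	p->state = STATE_RUNNING;
}

/* Rough stand-in for try_to_wake_up(). */
static void model_try_to_wake_up(struct task *p)
{
	if (!p->on_rq)
		model_ttwu_activate(p);
	model_ttwu_do_wakeup(p);
}

int main(void)
{
	/* A task that is already on_rq: with the current placement we still
	 * report a wakeup, which is the noise discussed in this thread. */
	struct task t = { .on_rq = true, .state = STATE_INTERRUPTIBLE };

	model_try_to_wake_up(&t);
	return 0;
}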

>> Heck, even if the ttwu is
>> _necessary_ because p is currently trying to take rq->lock to
>> deschedule, you won't get a matching sched_switch event, because the
>> ttwu is running before schedule is.
>>
>> You could sorta fix this I guess by tracking every write to p->state
>> with trace events, but that would be a somewhat different change, and
>> might be considered too expensive for all I know (and the trace events
>> could /still/ be resolved in a different order across cpus compared to
>> p->state's memory).
> Ah, you're saying that a second task could try a spurious wakeup between
> set_current_state() and schedule(). Yes, that'll trigger this indeed.

