Message-ID: <ed628a920910121420k5f01892dsc9cfe0b56eb2c29e@mail.gmail.com>
Date: Mon, 12 Oct 2009 14:20:53 -0700
From: Paul Turner <pjt@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...e.hu, linux-kernel@...r.kernel.org
Subject: Re: sched: race between deactivate and switch sched_info accounting?
On Fri, Oct 9, 2009 at 11:37 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, 2009-10-09 at 19:40 -0700, Paul Turner wrote:
>
> This looks very funny, I would expect that whoever does activate() on
> that task to do the sched_info*() muck?
They do their part; the problem stems from there being two halves of
sched_info muck to deal with:
 - the enqueue/dequeue tracking (which is called out of the activate path)
 - the arrive/depart accounting (which is hinged off the task switch in schedule())
These two halves share state in last_queued; we account the
enqueue/dequeue properly, but because prev == next we skip the second
half of the accounting and end up out of sync.
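
To make the shared state concrete, here's a small compile-and-run toy
model of the two halves (purely illustrative: the struct, field and
helper names below are made up, this is not the kernel code). Skipping
the second half after the re-activation leaves last_queued stale, so the
next arrive charges the whole running interval to run_delay:

#include <stdio.h>

/*
 * Toy model of the two halves of sched_info accounting (made-up names,
 * not the real kernel implementation).
 */
struct toy_sched_info {
	unsigned long long last_queued;	/* set on enqueue, cleared on arrive */
	unsigned long long run_delay;	/* time spent runnable but not running */
};

/* first half: called from the activate/enqueue path */
static void toy_info_queued(struct toy_sched_info *si, unsigned long long now)
{
	if (!si->last_queued)
		si->last_queued = now;
}

/* second half: called from the task-switch path in schedule() */
static void toy_info_arrive(struct toy_sched_info *si, unsigned long long now)
{
	if (si->last_queued) {
		si->run_delay += now - si->last_queued;
		si->last_queued = 0;
	}
}

int main(void)
{
	struct toy_sched_info si = { 0, 0 };

	/* normal case: queued at t=100, switched in at t=110 */
	toy_info_queued(&si, 100);
	toy_info_arrive(&si, 110);
	printf("normal: run_delay=%llu\n", si.run_delay);	/* 10 */

	/*
	 * racy case: re-queued at t=200 by the remote wake-up, but the
	 * arrive half is skipped because prev == next; last_queued is
	 * left stale, so the next arrive (say at t=1000) charges the
	 * whole interval to run_delay even though the task was running.
	 */
	toy_info_queued(&si, 200);
	/* toy_info_arrive() skipped here -- this is the race */
	toy_info_arrive(&si, 1000);
	printf("racy: run_delay=%llu\n", si.run_delay);	/* 10 + 800 = 810 */

	return 0;
}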
>
> The below patch looks very asymmetric in that regard.
>
The idea below is to fix things up by 'inserting' the task->idle->task
switch we would have had if we hadn't raced with the wake-up.
This approach might look more natural if the first info_switch were
just pulled above the actual switch logic with this check; another
option would be to force a last_queued reset on depart (since it's
really only the run_delay skew that's significant).
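
In terms of the toy model above, the 'reset last_queued on depart'
option would amount to something like the following (again a sketch
against made-up names, not the actual sched_info_depart() code):

/*
 * third hook, called when the task is switched out: drop any stale
 * queued timestamp so a skipped arrive can't inflate run_delay; a
 * later re-enqueue will set a fresh last_queued.
 */
static void toy_info_depart(struct toy_sched_info *si)
{
	si->last_queued = 0;
}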
>> It's possible for our previously de-activated task to be re-activated by a
>> remote cpu during lock balancing. We have to account for this manually
>> since prev == next, yet the task just went through dequeue accounting.
>>
>> Signed-off-by: Paul Turner <pjt@...gle.com>
>> ---
>> kernel/sched.c | 15 ++++++++++++---
>> 1 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index ee61f45..6445d9d 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -5381,7 +5381,7 @@ asmlinkage void __sched schedule(void)
>> struct task_struct *prev, *next;
>> unsigned long *switch_count;
>> struct rq *rq;
>> - int cpu;
>> + int cpu, deactivated_prev = 0;
>>
>> need_resched:
>> preempt_disable();
>> @@ -5406,8 +5406,10 @@ need_resched_nonpreemptible:
>> if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
>> if (unlikely(signal_pending_state(prev->state, prev)))
>> prev->state = TASK_RUNNING;
>> - else
>> + else {
>> deactivate_task(rq, prev, 1);
>> + deactivated_prev = 1;
>> + }
>> switch_count = &prev->nvcsw;
>> }
>>
>> @@ -5434,8 +5436,15 @@ need_resched_nonpreemptible:
>> */
>> cpu = smp_processor_id();
>> rq = cpu_rq(cpu);
>> - } else
>> + } else {
>> + /*
>> + * account for our previous task being re-activated by a
>> + * remote cpu.
>> + */
>> + if (unlikely(deactivated_prev))
>> + sched_info_switch(prev, prev);
>> spin_unlock_irq(&rq->lock);
>> + }
>>
>> post_schedule(rq);
>>
>