linux-kernel - Re: [PATCH] sched: next buddy hint on sleep and preempt path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <AANLkTim66NfZJSMUwhWOG80A==wEnvVB4ZS4TQT2TpK8@mail.gmail.com>
Date:	Wed, 2 Mar 2011 11:22:04 -0800
From:	Venkatesh Pallipadi <venki@...gle.com>
To:	Paul Turner <pjt@...gle.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org, Mike Galbraith <efault@....de>,
	Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH] sched: next buddy hint on sleep and preempt path

On Tue, Mar 1, 2011 at 9:43 PM, Paul Turner <pjt@...gle.com> wrote:
> On Tue, Mar 1, 2011 at 3:33 PM, Venkatesh Pallipadi <venki@...gle.com> wrote:
>> When a task in a taskgroup sleeps, pick_next_task starts all the way back at
>> the root and picks the task/taskgroup with the min vruntime across all
>> runnable tasks. But, when there are many frequently sleeping tasks
>> across different taskgroups, it makes better sense to stay with same taskgroup
>> for its slice period (or until all tasks in the taskgroup sleeps) instead of
>> switching cross taskgroup on each sleep after a short runtime.
>> This helps specifically where taskgroups corresponds to a process with
>> multiple threads. The change reduces the number of CR3 switches in this case.

<snip>

>> ---
>>  kernel/sched_fair.c |   20 ++++++++++++++++++--
>>  1 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index 3a88dee..36e8f02 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -1339,6 +1339,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>        hrtick_update(rq);
>>  }
>>
>> +static void set_next_buddy(struct sched_entity *se);
>> +
>>  /*
>>  * The dequeue_task method is called before nr_running is
>>  * decreased. We remove the task from the rbtree and
>> @@ -1348,14 +1350,22 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>  {
>>        struct cfs_rq *cfs_rq;
>>        struct sched_entity *se = &p->se;
>> +       int task_flags = flags;
>
> simpler: int voluntary = flags & DEQUEUE_SLEEP;

Agree. This looks cleaner. Will change.

>>
>>        for_each_sched_entity(se) {
>>                cfs_rq = cfs_rq_of(se);
>>                dequeue_entity(cfs_rq, se, flags);
>>
>>                /* Don't dequeue parent if it has other entities besides us */
>> -               if (cfs_rq->load.weight)
>> +               if (cfs_rq->load.weight) {
>> +                       /*
>> +                        * Bias pick_next to pick a task from this cfs_rq, as
>> +                        * p is sleeping when it is within its sched_slice.
>> +                        */
>> +                       if (task_flags & DEQUEUE_SLEEP && se->parent)
>> +                               set_next_buddy(se->parent);
>
> re-using the last_buddy would seem like a more natural fit here; also
> doesn't have a clobber race with a wakeup

Yes. Using of next_buddy will be racy. There will be races with
yield_to and preempt as well. But, as long as we use it only as hint,
I thought occasional clobber would be OK.

>
>>                        break;
>> +               }
>>                flags |= DEQUEUE_SLEEP;
>>        }
>>
>> @@ -1887,8 +1897,14 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>>        update_curr(cfs_rq);
>>        find_matching_se(&se, &pse);
>>        BUG_ON(!pse);
>> -       if (wakeup_preempt_entity(se, pse) == 1)
>> +       if (wakeup_preempt_entity(se, pse) == 1) {
>> +               /*
>> +                * Bias pick_next to pick the sched entity that is
>> +                * triggering this preemption.
>> +                */
>> +               set_next_buddy(pse);
>
> this probably wants some sort of unification with the scale-based next
> buddy above
>

Yes. I can skip this if it is already set by scale based next buddy above.

Thanks,
Venki

>>                goto preempt;
>> +       }
>>
>>        return;
>>
>> --
>> 1.7.3.1
>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/