Message-ID: <f88838ff-2024-ca32-069e-f7a4c0465961@didichuxing.com>
Date:   Fri, 17 Feb 2023 16:35:24 +0800
From:   Honglei Wang <wanghonglei@...ichuxing.com>
To:     Abel Wu <wuyun.abel@...edance.com>, Chen Yu <yu.c.chen@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>
CC:     Mel Gorman <mgorman@...hsingularity.net>,
        Tim Chen <tim.c.chen@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        Tianchen Ding <dtcccc@...ux.alibaba.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Don <joshdon@...gle.com>, Hillf Danton <hdanton@...a.com>,
        <linux-kernel@...r.kernel.org>,
        kernel test robot <yujie.liu@...el.com>
Subject: Re: [PATCH v5 2/2] sched/fair: Introduce SIS_SHORT to wake up short
 task on current CPU



On 2023/2/16 20:55, Abel Wu wrote:
> Hi Chen,
> 
> I've tested this patchset (with a modification) on our Redis proxy
> servers, and the results seem promising.
> 
> On 2/3/23 1:18 PM, Chen Yu wrote:
>> ...
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index aa16611c7263..d50097e5fcc1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6489,6 +6489,20 @@ static int wake_wide(struct task_struct *p)
>>       return 1;
>>   }
>> +/*
>> + * If a task switches in and then voluntarily relinquishes the
>> + * CPU quickly, it is regarded as a short duration task.
>> + *
>> + * SIS_SHORT tries to wake up the short wakee on current CPU. This
>> + * aims to avoid race condition among CPUs due to frequent context
>> + * switch.
>> + */
>> +static inline int is_short_task(struct task_struct *p)
>> +{
>> +    return sched_feat(SIS_SHORT) && p->se.dur_avg &&
>> +           ((p->se.dur_avg * 8) < sysctl_sched_min_granularity);
>> +}
> 
> I changed the factor to fit the shape of the tasks in question.
> 
>      static inline int is_short_task(struct task_struct *p)
>      {
>          u64 dur = sysctl_sched_min_granularity / 8;
> 
>          if (!sched_feat(SIS_SHORT) || !p->se.dur_avg)
>              return false;
> 
>          /*
>           * Bare tracepoint to allow dynamically changing
>           * the threshold.
>           */
>          trace_sched_short_task_tp(p, &dur);
> 
>          return p->se.dur_avg < dur;
>      }
> 
> I'm not sure this is the right way to provide such flexibility, but
> the definition of 'short' can be workload specific.
> 
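For reference, a bare tracepoint like trace_sched_short_task_tp() above
would presumably be declared with DECLARE_TRACE() and then hooked from a
small module that adjusts the threshold at runtime. This is only a rough
sketch: the module, its knob name and the default value below are made
up and not part of any posted patch.

    /* include/trace/events/sched.h: bare tracepoint, no TRACE_EVENT() */
    DECLARE_TRACE(sched_short_task_tp,
            TP_PROTO(struct task_struct *p, u64 *dur),
            TP_ARGS(p, dur));
    /* plus EXPORT_TRACEPOINT_SYMBOL_GPL(sched_short_task_tp) in fair.c
     * so a module can attach to it */

    /* hypothetical out-of-tree module overriding the 'short' threshold */
    #include <linux/module.h>
    #include <trace/events/sched.h>

    static unsigned long long short_dur_ns = 500000; /* made-up default: 0.5ms */
    module_param(short_dur_ns, ullong, 0644);

    static void probe_short_dur(void *data, struct task_struct *p, u64 *dur)
    {
            *dur = short_dur_ns;    /* override the in-kernel default */
    }

    static int __init short_dur_init(void)
    {
            return register_trace_sched_short_task_tp(probe_short_dur, NULL);
    }
    module_init(short_dur_init);

    static void __exit short_dur_exit(void)
    {
            unregister_trace_sched_short_task_tp(probe_short_dur, NULL);
            tracepoint_synchronize_unregister();
    }
    module_exit(short_dur_exit);

    MODULE_LICENSE("GPL");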
>> +
>>   /*
>>    * The purpose of wake_affine() is to quickly determine on which CPU we can run
>>    * soonest. For the purpose of speed we only consider the waking and previous
>> @@ -6525,6 +6539,11 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>>       if (available_idle_cpu(prev_cpu))
>>           return prev_cpu;
>> +    /* The only running task is a short duration one. */
>> +    if (cpu_rq(this_cpu)->nr_running == 1 &&
>> +        is_short_task(rcu_dereference(cpu_curr(this_cpu))))
>> +        return this_cpu;
> 
> Since the proxy server handles simple data delivery, the tasks are
> generally short-running ones and hate task stacking, which may
> introduce scheduling latency (even when there are only 2 short tasks
> competing with each other). So this part brings a slight regression
> in the proxy case. But I still think this is good for most cases.
> 
> Speaking of task stacking, I found wake_affine_weight() can be
> much more dangerous. It chooses the less loaded of the prev & this
> cpu as a candidate, so 'small' tasks can easily be stacked on this
> cpu when several tasks are woken up at one time while this cpu is
> lightly loaded. This really hurts if the 'small' tasks are
> latency-sensitive, although wake_affine_weight() does the right
> thing from the point of view of 'load'.
> 
> The following change greatly reduced the p99 latency of the Redis
> service from 150ms to 0.9ms, at exactly the same throughput (QPS).
> 
> @@ -5763,6 +5787,9 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
>      s64 this_eff_load, prev_eff_load;
>      unsigned long task_load;
> 
> +    if (is_short_task(p))
> +        return nr_cpumask_bits;
> +
>      this_eff_load = cpu_load(cpu_rq(this_cpu));
> 
>      if (sync) {
> 
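(Note: returning nr_cpumask_bits from wake_affine_weight() makes the
wake_affine() caller fall back to prev_cpu, which is what keeps the
short task off this_cpu. Paraphrased below for reference, not verbatim
mainline code:)

    static int wake_affine(struct sched_domain *sd, struct task_struct *p,
                           int this_cpu, int prev_cpu, int sync)
    {
            int target = nr_cpumask_bits;

            if (sched_feat(WA_IDLE))
                    target = wake_affine_idle(this_cpu, prev_cpu, sync);

            if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
                    target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

            /* anything other than this_cpu means: stay on prev_cpu */
            if (target != this_cpu)
                    return prev_cpu;

            return target;
    }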
> I know that 'short' tasks are not necessarily 'small' tasks, e.g.
> their sleeping duration is small or they have large weights, but this
> works really well for this case. This is partly because delivering
> data is memory-bandwidth intensive and hence prefers cache-hot cpus.
> And I think this is also applicable to general purposes: do NOT let
> short-running tasks suffer from cache misses caused by migration.
> 

Redis is a bit special. It runs quickly and is really sensitive to
scheduling latency. The purpose of Yu's 'short task' feature is to
mitigate migration and tend to place the waking task on the local cpu,
which is somewhat at odds with a workload such as Redis. The changes
you made remind me of the latency-prio stuff. Maybe we can do something
based on both 'short task' and 'latency-prio' to make your changes more
general. Thoughts?
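To make that concrete, one very rough direction could be to only take
the SIS_SHORT shortcut for wakees that are not latency-sensitive. The
sketch below is illustrative only: p->latency_prio comes from the
out-of-tree latency-nice series, not mainline, and the DEFAULT_PRIO
cutoff is arbitrary.

    static inline int is_short_task(struct task_struct *p)
    {
            if (!sched_feat(SIS_SHORT) || !p->se.dur_avg)
                    return false;

            /* Hypothetical: latency-sensitive wakees would rather pay
             * for a migration than risk being stacked behind curr. */
            if (p->latency_prio < DEFAULT_PRIO)
                    return false;

            return (p->se.dur_avg * 8) < sysctl_sched_min_granularity;
    }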

Thanks,
Honglei

> Best regards,
>      Abel
