lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b02204ea-0683-2879-5843-4cfb31d44d27@linux.alibaba.com>
Date:   Wed, 5 Jan 2022 19:33:18 +0800
From:   cruzzhao <cruzzhao@...ux.alibaba.com>
To:     Josh Don <joshdon@...gle.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Benjamin Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 1/2] sched/core: Cookied forceidle accounting per cpu



在 2022/1/5 上午9:48, Josh Don 写道:
> Hi Cruz,
> 
Firstly, attributing forced idle time to the specific cpus it happens on
can help us measure the effect of steal_cookie_task(). We found out that
steal_cookie_task() conflicts with load balance sometimes, for example,
a cookie'd task is stolen by steal_cookie_task(), but it'll be migrated
to another core by load balance soon. Secondly, a more convenient way of
summing forced idle instead of iterating cookie'd task is indeed what we
need. In the multi-rent scenario, it'll be complex to maintain the list
of cookie'd task and it'll cost a lot to iterate it.

> Could you add a bit more background to help me understand what case
> this patch solves? Is your issue that you want to be able to
> attribute forced idle time to the specific cpus it happens on, or do
> you simply want a more convenient way of summing forced idle without
> iterating your cookie'd tasks and summing the schedstat manually?
> 
>> @@ -190,6 +202,9 @@ static int show_stat(struct seq_file *p, void *v)
>>                 seq_put_decimal_ull(p, " ", nsec_to_clock_t(steal));
>>                 seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest));
>>                 seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
>> +#ifdef CONFIG_SCHED_CORE
>> +               seq_put_decimal_ull(p, " ", nsec_to_clock_t(cookied_forceidle));
>> +#endif
> 
I'll put this in /proc/schedstat and fix the problem that accounting
simply idle as force idle in the next version.

Many thanks for suggestions.

Best,
Cruz Zhao
> IMO it would be better to always print this stat, otherwise it sets a
> weird precedent for new stats added in the future (much more difficult
> for userspace to reason about which column corresponds with which
> field, since it would depend on kernel config).
> 
> Also, did you intend to put this in /proc/stat instead of
> /proc/schedstat (the latter of which would be more attractive to
> prevent calculation of these stats unless schestat was enabled)?
> 
>> diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
>> @@ -260,6 +261,21 @@ void __sched_core_account_forceidle(struct rq *rq)
>>
>>         rq->core->core_forceidle_start = now;
>>
>> +       for_each_cpu(i, smt_mask) {
>> +               rq_i = cpu_rq(i);
>> +               p = rq_i->core_pick ?: rq_i->curr;
>> +
>> +               if (!rq->core->core_cookie)
>> +                       continue;
> 
> I see this is temporary given your other patch, but just a note that
> if your other patch is dropped, this check can be pulled outside the
> loop.
> 
>> +               if (p == rq_i->idle && rq_i->nr_running) {
>> +                       cpustat = kcpustat_cpu(i).cpustat;
>> +                       cpustat[CPUTIME_COOKIED_FORCEIDLE] += delta;
>> +               }
>> +       }
> 
> I don't think this is right. If a cpu was simply idle while some other
> SMT sibling on its core was forced idle, and then a task happens to
> wake on the idle cpu, that cpu will now be charged the full delta here
> as forced idle (when actually it was never forced idle, we just
> haven't been through pick_next_task yet). One workaround would be to
> add a boolean to struct rq to cache whether the rq was in forced idle
> state.
> 
> Best,
> Josh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ