linux-kernel - Re: [PATCH 1/2] sched/core: Cookied forceidle accounting per cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABk29NvPJ3S1xq5xm+52OoUGDyuMSxGOLJbopPa3+-QmLnVYeQ@mail.gmail.com>
Date:   Tue, 4 Jan 2022 17:48:08 -0800
From:   Josh Don <joshdon@...gle.com>
To:     Cruz Zhao <CruzZhao@...ux.alibaba.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Benjamin Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 1/2] sched/core: Cookied forceidle accounting per cpu

Hi Cruz,

Could you add a bit more background to help me understand what case
this patch solves? Is your issue that you want to be able to
attribute forced idle time to the specific cpus it happens on, or do
you simply want a more convenient way of summing forced idle without
iterating your cookie'd tasks and summing the schedstat manually?

> @@ -190,6 +202,9 @@ static int show_stat(struct seq_file *p, void *v)
>                 seq_put_decimal_ull(p, " ", nsec_to_clock_t(steal));
>                 seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest));
>                 seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
> +#ifdef CONFIG_SCHED_CORE
> +               seq_put_decimal_ull(p, " ", nsec_to_clock_t(cookied_forceidle));
> +#endif

IMO it would be better to always print this stat, otherwise it sets a
weird precedent for new stats added in the future (much more difficult
for userspace to reason about which column corresponds with which
field, since it would depend on kernel config).

Also, did you intend to put this in /proc/stat instead of
/proc/schedstat (the latter of which would be more attractive to
prevent calculation of these stats unless schestat was enabled)?

> diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
> @@ -260,6 +261,21 @@ void __sched_core_account_forceidle(struct rq *rq)
>
>         rq->core->core_forceidle_start = now;
>
> +       for_each_cpu(i, smt_mask) {
> +               rq_i = cpu_rq(i);
> +               p = rq_i->core_pick ?: rq_i->curr;
> +
> +               if (!rq->core->core_cookie)
> +                       continue;

I see this is temporary given your other patch, but just a note that
if your other patch is dropped, this check can be pulled outside the
loop.

> +               if (p == rq_i->idle && rq_i->nr_running) {
> +                       cpustat = kcpustat_cpu(i).cpustat;
> +                       cpustat[CPUTIME_COOKIED_FORCEIDLE] += delta;
> +               }
> +       }

I don't think this is right. If a cpu was simply idle while some other
SMT sibling on its core was forced idle, and then a task happens to
wake on the idle cpu, that cpu will now be charged the full delta here
as forced idle (when actually it was never forced idle, we just
haven't been through pick_next_task yet). One workaround would be to
add a boolean to struct rq to cache whether the rq was in forced idle
state.

Best,
Josh