Message-ID: <CAJZ5v0h_uoja87NUZb-tWmG_6Fb+1bATB03VRO0Foi5nFadiVQ@mail.gmail.com>
Date:   Mon, 5 Dec 2022 13:39:52 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Qais Yousef <qyousef@...alina.io>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
        Lukasz Luba <lukasz.luba@....com>, Wei Wang <wvw@...gle.com>,
        Xuewen Yan <xuewen.yan94@...il.com>,
        Hank <han.lin@...iatek.com>,
        Jonathan JMChen <Jonathan.JMChen@...iatek.com>
Subject: Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect
 capacity inversion

On Sat, Dec 3, 2022 at 3:32 PM Qais Yousef <qyousef@...alina.io> wrote:
>
> On 11/30/22 19:27, Rafael J. Wysocki wrote:
>
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 7c0dd57e562a..4bbbca85134b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8856,23 +8856,20 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
> > >          *   * Thermal pressure will impact all cpus in this perf domain
> > >          *     equally.
> > >          */
> > > -       if (sched_energy_enabled()) {
> > > +       if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> > >                 unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
> > > -               struct perf_domain *pd = rcu_dereference(rq->rd->pd);
> > > +               struct cpufreq_policy *policy, __maybe_unused *policy_n;
> > >
> > >                 rq->cpu_capacity_inverted = 0;
> > >
> > > -               SCHED_WARN_ON(!rcu_read_lock_held());
> > > -
> > > -               for (; pd; pd = pd->next) {
> > > -                       struct cpumask *pd_span = perf_domain_span(pd);
> > > +               for_each_active_policy_safe(policy, policy_n) {
> >
> > 1. Is the "safe" part sufficient for protection against concurrent
> > deletion and freeing of list entries?  cpufreq driver removal can do
> > that AFAICS.
>
> The freeing part is probably not safe.

Well, I don't even think that the traversal part is safe against
concurrent removal of list entries (it is safe against removal of list
entries in the loop itself).

list_for_each_entry_safe() assumes that n will always point to a valid
list entry, but what if the entry pointed to by it is deleted by a
concurrent thread?
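
(Rough sketch only, not code from the patch -- "head" below stands for
whatever list the policies live on -- showing why the "safe" iterator
only covers deletions made by the loop body itself:)

#include <linux/list.h>
#include <linux/cpufreq.h>

static void traverse_policies_unsafely(struct list_head *head)
{
	struct cpufreq_policy *policy, *policy_n;

	/*
	 * list_for_each_entry_safe() pre-loads the next entry into
	 * policy_n before each iteration, so it only tolerates
	 * list_del() calls made by this loop body itself.  If another
	 * CPU concurrently removes and frees the entry that policy_n
	 * already points to, the next iteration dereferences freed
	 * memory.
	 */
	list_for_each_entry_safe(policy, policy_n, head, policy_list) {
		/* inspect policy; deleting it *here* would be fine */
	}
}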

> I need to research this more. Do you
> have any objection in principle to exporting this traversal?
>
> Switching them to be RCU protected could be the best safe option -- anything
> against that too?

Not really, it just occurred to me that you may end up dealing with a
corrupted list here.
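
For completeness, an RCU conversion along those lines might look
roughly like this (sketch only; cpufreq_policy_list is internal to
cpufreq.c and not RCU-managed today, and the writer side would need
list_del_rcu() plus kfree_rcu()/synchronize_rcu()):

static void traverse_policies_rcu(void)
{
	struct cpufreq_policy *policy;

	rcu_read_lock();
	list_for_each_entry_rcu(policy, &cpufreq_policy_list, policy_list) {
		/*
		 * The entry cannot be freed until rcu_read_unlock(),
		 * provided writers remove it with list_del_rcu() and
		 * defer freeing via kfree_rcu()/synchronize_rcu().
		 */
	}
	rcu_read_unlock();
}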

> I might not end up needing that. I need to dig more.
>
> > 2. For a casual reader of this code it may not be clear why cpufreq
> > policies matter here.
>
> I'm looking for a way to traverse the list of capacities of the system and
> know their related CPUs.

So why don't you mention this in a comment, for instance?

> AFAICT this information already exists in the performance domains and in
> cpufreq_policy. Performance domains are conditional on the energy model and
> schedutil, so I'm trying to switch to cpufreq_policy.
>
> Assuming this question wasn't a request to add a comment :-)

Yes, it was. :-)
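
(Roughly the kind of thing such a comment could spell out -- a sketch
only, assuming the for_each_active_policy_safe() macro from this series
is available to the scheduler and that all CPUs covered by a policy
share one original capacity:)

static void walk_policy_capacities(void)
{
	struct cpufreq_policy *policy, *policy_n;

	for_each_active_policy_safe(policy, policy_n) {
		int cpu = cpumask_first(policy->related_cpus);
		unsigned long cap = arch_scale_cpu_capacity(cpu);

		/* one capacity level per policy; policy->related_cpus
		 * tells us which CPUs it applies to */
		pr_debug("policy %*pbl: capacity_orig %lu\n",
			 cpumask_pr_args(policy->related_cpus), cap);
	}
}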

That said though, this change makes the scheduler kind of depend on
cpufreq, which feels a bit like a corner cut TBH.

I do realize that cpufreq happens to be maintaining a data structure
that turns out to be useful here, but the reason why it is there has
nothing to do with this code AFAICS.
