linux-kernel - Re: [PATCH v1 0/3] arch_topology: Correct CPU capacity scaling

Message-ID: <20220315145935.GA168726@leoy-ThinkPad-X240s>
Date:   Tue, 15 Mar 2022 22:59:35 +0800
From:   Leo Yan <leo.yan@...aro.org>
To:     Sudeep Holla <sudeep.holla@....com>
Cc:     Ionela Voinescu <ionela.voinescu@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 0/3] arch_topology: Correct CPU capacity scaling

Hi Sudeep,

On Tue, Mar 15, 2022 at 10:08:28AM +0000, Sudeep Holla wrote:

[...]

> > > In my opinion it's difficult to handle absent "capacity-dmips-mhz"
> > > properties, as they can be a result of 3 potential scenarios:
> > >  1. bug in DT
> > >  2. unwillingness to fill this information in DT
> > >  3. suggestion that we're dealing with CPUs with same u-arch
> > >     (same capacity-dmips-mhz)
> > 
> > For absent "capacity-dmips-mhz" properties, I think we could divide into
> > two sub classes:
> > 
> > If all CPU nodes lack the "capacity-dmips-mhz" property, it's
> > likely that all CPUs have the same micro-architecture, so developers
> > do not necessarily need to set the property explicitly.
> >
> 
> I completely disagree, and NACK dealing with the absence of the property in DT.
> The binding clearly states:
> 
> "CPU capacity is a number that provides the scheduler information about CPUs
> heterogeneity. Such heterogeneity can come from micro-architectural differences
> (e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
> (e.g., SMP systems with multiple frequency domains). Heterogeneity in this
> context is about differing performance characteristics; this binding tries to
> capture a first-order approximation of the relative performance of CPUs."
> 
> So it is clear that using the same uarch can't be an excuse to omit this property.
> So if you need the scheduler to be aware of this heterogeneity, better update
> the DT with the property. Absence will always mean the scheduler need not be
> aware of this heterogeneity.

Okay, I understand your point and I respect that.

> > If only some CPUs lack the "capacity-dmips-mhz" property, this is a
> > usage issue in the DT, and the kernel should handle it as an error and
> > report it.
> >
> 
> That makes sense. As I mentioned in my earlier email, we can always flag
> up error in the kernel, but it would be good to catch these much earlier
> in DT via schema if possible.
> 
> > > I'm not sure it's up to us to interpret suggestions in the code so I
> > > believe treating missing information as error is the right choice, which
> > > is how we're handling this now.
> > 
> > Yes, the current kernel treats missing info as an error, regardless of
> > whether all CPUs or only some CPUs lack the "capacity-dmips-mhz" property.
> >
> 
> OK, so no change needed? I am confused as to what is missing today.

The difference in understanding between us is about the case where all CPUs
lack the "capacity-dmips-mhz" property; it seems to me we can treat it the
same as all CPUs having "capacity-dmips-mhz" = 1024.

Maybe I am a bit obsessive about this :)

> > > For 3. (and patch 03), isn't it easier to populate capacity-dmips-mhz to
> > > the same value (say 1024) in DT? That is a clear message that we're
> > > dealing with CPUs with the same u-arch.
> >
> > "capacity-dmips-mhz" is defined as an _optional_ property in the DT
> > document (see devicetree/bindings/arm/cpu-capacity.txt).
> 
> That means that the kernel can operate without the info and nothing more
> than that. We are not providing guarantee that the same performance is
> possible with or without this optional property.
> 
> > The current kernel falls back to a raw capacity of 1024 for every CPU if
> > the DT doesn't provide "capacity-dmips-mhz" properties. Given that many SoCs
> > have CPUs with the same u-arch, this is the right thing to do; but here I
> > think the kernel should go on to scale CPU capacity with its maximum frequency.
> 
> As stated above, I completely disagree and once again NACK.
> 
> > When I worked on a platform with a fast and a slow cluster (the two clusters
> > have different max frequencies but the same CPU u-arch), it was a bit
> > puzzling to see that all CPUs' capacities were always 1024.  In this case
> > the platform has no CPU capacity modeling, and the "capacity-dmips-mhz"
> > property does not need to be populated in DT, but in the end the kernel
> > should still reflect the scaled CPU capacity correctly.
> >
> 
> Fix the broken DT with respect to this feature. I mean the DT is not broken,
> but if one needs this feature then they should teach the kernel the hardware
> difference with this property.
> 
> Another possible issue I can see, if this is dealt with in the kernel, is if
> on some platform for thermal or any valid hardware errata reasons, one set
> of CPUs can run at max one frequency while the other is restricted at a
> suitable lower frequency, it may not be good idea to mark that as difference
> in cpu capacity as they are SMP CPUs just in different perf domains with
> different limits. I assume the scale invariance must deal with that.
> I may be wrong here but that's my understanding, happy to be corrected.

After looking at the code a bit, the short answer is that we don't need to
adjust "capacity-dmips-mhz" for any thermal capping or CPU frequency
limit.

Since the unit of "capacity-dmips-mhz" is DMIPS/MHz, it's a modeling value
(e.g. generated by using Dhrystone, sysbench, etc).  This is why, for
CPUs with the same micro-architecture, we don't need to do any profiling
and it would be fine to directly set it to 1024 for all CPUs (no matter
the maximum frequency).

In the kernel there are two kinds of scale invariance: one is CPU capacity
invariance, which as I understand it allows us to compare capacity
across CPUs; the other is CPU frequency invariance, which is used to scale
capacity for the different OPPs on a CPU.
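
To show how I picture the two factors working together, here is a rough
Python sketch (not kernel code). The function names, the 1024 fixed-point
scale and the example numbers are my own assumptions for illustration:
running time is first discounted by the current OPP relative to the maximum
frequency, then by the CPU's capacity relative to the biggest CPU.

```python
SCHED_CAPACITY_SCALE = 1024  # assumed fixed-point unit for both factors


def freq_scale(cur_freq_khz, max_freq_khz):
    """Frequency invariance factor: current OPP relative to the CPU's max."""
    return cur_freq_khz * SCHED_CAPACITY_SCALE // max_freq_khz


def scale_time(delta_us, cur_freq_khz, max_freq_khz, cpu_scale):
    """Scale a running time by both factors (hypothetical helper)."""
    # Frequency invariance: time spent at a lower OPP counts for less.
    delta = delta_us * freq_scale(cur_freq_khz, max_freq_khz) // SCHED_CAPACITY_SCALE
    # Capacity invariance: time spent on a smaller CPU counts for less.
    return delta * cpu_scale // SCHED_CAPACITY_SCALE


# 1000 us on a CPU with capacity 512, running at half of its max frequency.
small_half_speed = scale_time(1000, 1_000_000, 2_000_000, 512)
# 1000 us on the biggest CPU (capacity 1024) at its max frequency.
big_full_speed = scale_time(1000, 2_000_000, 2_000_000, 1024)
print(small_half_speed, big_full_speed)
```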

So "capacity-dmips-mhz" is used to calculate the CPU capacity invariance,
the formula is:

  cpu_scale(cpu) = capacity-dmips-mhz(cpu) * policy(cpu)->cpuinfo.max_freq

policy(cpu)->cpuinfo.max_freq is the maximum frequency set when the OPP
table is registered; it is not affected by thermal capping or CPU frequency
limits.
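
To make the formula concrete, here is a small Python sketch (not kernel
code). The helper name and the example frequencies are hypothetical, and I
am assuming the result is normalized so that the biggest CPU ends up at
1024, as the binding document describes. It models a platform with two
clusters of the same u-arch but different maximum frequencies.

```python
SCHED_CAPACITY_SCALE = 1024  # capacity given to the biggest CPU


def cpu_scales(dmips_mhz, max_freq_khz):
    """Return normalized capacities from two parallel per-CPU lists."""
    # Raw capacity: capacity-dmips-mhz(cpu) * cpuinfo.max_freq(cpu)
    raw = [d * f for d, f in zip(dmips_mhz, max_freq_khz)]
    top = max(raw)
    # Normalize so that the CPU with the largest raw value gets 1024.
    return [r * SCHED_CAPACITY_SCALE // top for r in raw]


# Hypothetical platform: same u-arch everywhere (1024), slow cluster at
# 1 GHz and fast cluster at 2 GHz.
scales = cpu_scales([1024, 1024, 1024, 1024],
                    [1_000_000, 1_000_000, 2_000_000, 2_000_000])
print(scales)  # [512, 512, 1024, 1024]
```

With identical "capacity-dmips-mhz" values, the difference in capacity
comes only from the maximum frequency of each cluster.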

Thanks,
Leo