Message-ID: <SJ1PR11MB6083B06D21C6C348A3FAE6AEFC1AA@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Tue, 30 Sep 2025 16:08:50 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "Chen, Yu C" <yu.c.chen@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Chatre,
Reinette" <reinette.chatre@...el.com>, James Morse <james.morse@....com>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
"x86@...nel.org" <x86@...nel.org>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, Dave Martin <Dave.Martin@....com>
Subject: RE: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
per-arch
> >>> This seems to be applicable as it introduces the new interface
> >>> while preserving forward compatibility.
> >>>
> >>> One minor question is that, according to "Figure 6-5. MBA Optimal
> >>> Bandwidth Register" in the latest RDT specification, the maximum
> >>> value ranges from 1 to 511.
> >>> Additionally, this bandwidth field is located at bits 48 to 56 in
> >>> the MBA Optimal Bandwidth Register, and the range for
> >>> this segment could be 1 to 8191. I just wonder if it would be
> >
> > 48..56 is still 9 bits, so max value is 511.
> >
>
> Ah I see, I overlooked this.
>
> >>> possible that the current maximum value of 512 may be extended
> >>> in the future? Perhaps we could explore a method to query the maximum upper
> >>> limit from the ACPI table or register, or use CPUID to distinguish between
> >>> platforms rather than hardcoding it. Reinette also mentioned this in another
> >>> thread.
> >
> > I think 511 was chosen as "bigger than we expect to ever need" and 9-bits
> > allocated in the registers based on that.
> >
>
> OK, got it.
>
> > Initial implementation may use 255 as the maximum - though I'm pushing on
> > that a bit as the throttle graph at the early stage is fairly linear from
> > "1" to some value < 255, when bandwidth hits maximum, then flat up to 255.
> > If things stay that way, I'm arguing that the "Q" value enumerated in the
> > ACPI table should be the value where peak bandwidth is hit
>
> I see. If I understand correctly, the BIOS needs to pre-train the system to
> find this Q. However, if the BIOS cannot provide this Q, would it be feasible
> for the user to provide it? For example, the user could saturate the memory
> bandwidth, gradually increase MB_MAX, and finally find the Q_max where the
> memory bandwidth no longer increases. The user could then adjust the max
> field in the info file.
>
> > (though this is complicated
> > because workloads with different mixes of read/write access have different
> > throttle graphs).
> >
>
> Does this mean read and write operations have different Q values to saturate
> the memory bandwidth? For example, if the workload is all reads, there is a
> Q_r; if the workload is all writes, there is another Q_w. In that case, maybe
> we could choose the maximum of Q_r and Q_w (max(Q_r, Q_w)).

If the BIOS doesn't provide a good enough number, then users might well do
some tuning based on the workloads they plan to run and ignore the value in
the info file in favor of one tuned specifically for their workloads. But it
is too early to start guessing at workarounds for problems that may not even
exist.
-Tony