Message-ID: <f9e69ebafe7e767eeb1fba250af9c9f7629bae5e.camel@intel.com>
Date: Fri, 1 Dec 2023 17:41:19 +0000
From: "Zhang, Rui" <rui.zhang@...el.com>
To: "linux@...ck-us.net" <linux@...ck-us.net>,
"jdelvare@...e.com" <jdelvare@...e.com>
CC: "Yu, Fenghua" <fenghua.yu@...el.com>,
"linux-hwmon@...r.kernel.org" <linux-hwmon@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Fix core count limitation
On Thu, 2023-11-30 at 19:06 -0800, Guenter Roeck wrote:
> On 11/27/23 05:16, Zhang Rui wrote:
> > Currently, the coretemp driver only supports 128 cores per package.
> > This loses some core temperature information on systems that have
> > more than 128 cores per package.
> > [ 58.685033] coretemp coretemp.0: Adding Core 128 failed
> > [ 58.692009] coretemp coretemp.0: Adding Core 129 failed
> >
> > Fix the problem by using a per package list to maintain the per
> > core
> > temp_data instead of the fixed length pdata->core_data[] array.
> >
> > Signed-off-by: Zhang Rui <rui.zhang@...el.com>
> > ---
> >  drivers/hwmon/coretemp.c | 110 ++++++++++++++++++---------------------
> >  1 file changed, 52 insertions(+), 58 deletions(-)
> >
> > diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> > index cef43fedbd58..1bb1a6e4b07b 100644
> > --- a/drivers/hwmon/coretemp.c
> > +++ b/drivers/hwmon/coretemp.c
> > @@ -39,11 +39,7 @@ static int force_tjmax;
> > module_param_named(tjmax, force_tjmax, int, 0444);
> > MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
> >
> > -#define PKG_SYSFS_ATTR_NO	1	/* Sysfs attribute for package temp */
> > -#define BASE_SYSFS_ATTR_NO	2	/* Sysfs Base attr no for coretemp */
> > -#define NUM_REAL_CORES		128	/* Number of Real cores per cpu */
> >  #define CORETEMP_NAME_LENGTH	28	/* String Length of attrs */
> > -#define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
> >
> > enum coretemp_attr_index {
> > ATTR_LABEL,
> > @@ -90,17 +86,17 @@ struct temp_data {
> > struct attribute *attrs[TOTAL_ATTRS + 1];
> > struct attribute_group attr_group;
> > struct mutex update_lock;
> > + struct list_head node;
> > };
> >
> > /* Platform Data per Physical CPU */
> > struct platform_data {
> > struct device *hwmon_dev;
> > u16 pkg_id;
> > - u16 cpu_map[NUM_REAL_CORES];
> > - struct ida ida;
> > struct cpumask cpumask;
> > - struct temp_data *core_data[MAX_CORE_DATA];
> > struct device_attribute name_attr;
> > + struct mutex core_data_lock;
> > + struct list_head core_data_list;
> > };
> >
> > struct tjmax_pci {
> > @@ -491,6 +487,23 @@ static struct temp_data *init_temp_data(unsigned int cpu, int pkg_flag)
> > return tdata;
> > }
> >
> > +static struct temp_data *get_tdata(struct platform_data *pdata, int cpu)
> > +{
> > + struct temp_data *tdata;
> > +
> > + mutex_lock(&pdata->core_data_lock);
> > + list_for_each_entry(tdata, &pdata->core_data_list, node) {
> > + if (cpu >= 0 && !tdata->is_pkg_data && tdata->cpu_core_id == topology_core_id(cpu))
> > + goto found;
> > + if (cpu < 0 && tdata->is_pkg_data)
> > + goto found;
> > + }
>
> I really don't like this. With 128+ cores, it gets terribly
> expensive.
I think this is only invoked from the CPU online/offline code, which is
not a hot path.
>
> How about calculating the number of cores in the probe function and
> allocating cpu_map[] and core_data[] instead?
The problem is that the number of cores is not available here, due to
limitations in the current CPU topology code.
Thomas has a patch series to fix this, but it has not been merged yet
and we don't know when it will be.
A simpler workaround for this issue is to increase NUM_REAL_CORES to a
bigger value and switch to dynamic memory allocation for now. We can
then change the code to use the real number of cores later, once that
information becomes available.
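
Something along these lines (just a rough, untested sketch to show the
idea; the 512 value and the exact allocation spot in coretemp_probe()
are assumptions on my side, not a final proposal):

	#define NUM_REAL_CORES		512	/* assumed larger limit, not a final number */
	#define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)

	/* Platform Data per Physical CPU */
	struct platform_data {
		struct device		*hwmon_dev;
		u16			pkg_id;
		u16			*cpu_map;	/* was u16 cpu_map[NUM_REAL_CORES] */
		struct ida		ida;
		struct cpumask		cpumask;
		struct temp_data	**core_data;	/* was a fixed-size array */
		struct device_attribute	name_attr;
	};

	static int coretemp_probe(struct platform_device *pdev)
	{
		struct device *dev = &pdev->dev;
		struct platform_data *pdata;
		...
		/*
		 * Allocate the per-core slots once at probe time instead of
		 * embedding fixed-size arrays in struct platform_data.
		 */
		pdata->cpu_map = devm_kcalloc(dev, NUM_REAL_CORES,
					      sizeof(*pdata->cpu_map), GFP_KERNEL);
		pdata->core_data = devm_kcalloc(dev, MAX_CORE_DATA,
						sizeof(*pdata->core_data), GFP_KERNEL);
		if (!pdata->cpu_map || !pdata->core_data)
			return -ENOMEM;
		...
	}

The rest of the driver can keep indexing core_data[] by the existing
sysfs attr number, so the lookup stays O(1) and only the upper bound
changes.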
thanks,
rui