linux-kernel - Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <89c8c847-87e7-a849-e05d-d32b787e9305@linux.intel.com>
Date:   Thu, 12 Apr 2018 10:09:51 -0700
From:   Jae Hyun Yoo <jae.hyun.yoo@...ux.intel.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Alan Cox <alan@...ux.intel.com>, Andrew Jeffery <andrew@...id.au>,
        Andrew Lunn <andrew@...n.ch>,
        Andy Shevchenko <andriy.shevchenko@...el.com>,
        Arnd Bergmann <arnd@...db.de>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Fengguang Wu <fengguang.wu@...el.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Haiyue Wang <haiyue.wang@...ux.intel.com>,
        James Feist <james.feist@...ux.intel.com>,
        Jason M Biils <jason.m.bills@...ux.intel.com>,
        Jean Delvare <jdelvare@...e.com>,
        Joel Stanley <joel@....id.au>,
        Julia Cartwright <juliac@....teric.us>,
        Miguel Ojeda <miguel.ojeda.sandonis@...il.com>,
        Milton Miller II <miltonm@...ibm.com>,
        Pavel Machek <pavel@....cz>,
        Randy Dunlap <rdunlap@...radead.org>,
        Stef van Os <stef.van.os@...drive-technologies.com>,
        Sumeet R Pawnikar <sumeet.r.pawnikar@...el.com>,
        Vernon Mauery <vernon.mauery@...ux.intel.com>,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        devicetree@...r.kernel.org, linux-hwmon@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, openbmc@...ts.ozlabs.org
Subject: Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

On 4/11/2018 8:40 PM, Guenter Roeck wrote:
> On 04/11/2018 07:51 PM, Jae Hyun Yoo wrote:
>> On 4/11/2018 5:34 PM, Guenter Roeck wrote:
>>> On 04/11/2018 02:59 PM, Jae Hyun Yoo wrote:
>>>> Hi Guenter,
>>>>
>>>> Thanks a lot for sharing your time. Please see my inline answers.
>>>>
>>>> On 4/10/2018 3:28 PM, Guenter Roeck wrote:
>>>>> On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote:
>>>>>> This commit adds PECI cputemp and dimmtemp hwmon drivers.
>>>>>>
>>>>>> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@...ux.intel.com>
>>>>>> Reviewed-by: Haiyue Wang <haiyue.wang@...ux.intel.com>
>>>>>> Reviewed-by: James Feist <james.feist@...ux.intel.com>
>>>>>> Reviewed-by: Vernon Mauery <vernon.mauery@...ux.intel.com>
>>>>>> Cc: Alan Cox <alan@...ux.intel.com>
>>>>>> Cc: Andrew Jeffery <andrew@...id.au>
>>>>>> Cc: Andrew Lunn <andrew@...n.ch>
>>>>>> Cc: Andy Shevchenko <andriy.shevchenko@...el.com>
>>>>>> Cc: Arnd Bergmann <arnd@...db.de>
>>>>>> Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>
>>>>>> Cc: Fengguang Wu <fengguang.wu@...el.com>
>>>>>> Cc: Greg KH <gregkh@...uxfoundation.org>
>>>>>> Cc: Guenter Roeck <linux@...ck-us.net>
>>>>>> Cc: Jason M Biils <jason.m.bills@...ux.intel.com>
>>>>>> Cc: Jean Delvare <jdelvare@...e.com>
>>>>>> Cc: Joel Stanley <joel@....id.au>
>>>>>> Cc: Julia Cartwright <juliac@....teric.us>
>>>>>> Cc: Miguel Ojeda <miguel.ojeda.sandonis@...il.com>
>>>>>> Cc: Milton Miller II <miltonm@...ibm.com>
>>>>>> Cc: Pavel Machek <pavel@....cz>
>>>>>> Cc: Randy Dunlap <rdunlap@...radead.org>
>>>>>> Cc: Stef van Os <stef.van.os@...drive-technologies.com>
>>>>>> Cc: Sumeet R Pawnikar <sumeet.r.pawnikar@...el.com>
>>>>>> ---
>>>>>>   drivers/hwmon/Kconfig         |  28 ++
>>>>>>   drivers/hwmon/Makefile        |   2 +
>>>>>>   drivers/hwmon/peci-cputemp.c  | 783 
>>>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>>>   drivers/hwmon/peci-dimmtemp.c | 432 +++++++++++++++++++++++
>>>>>>   4 files changed, 1245 insertions(+)
>>>>>>   create mode 100644 drivers/hwmon/peci-cputemp.c
>>>>>>   create mode 100644 drivers/hwmon/peci-dimmtemp.c
>>>>>>
>>>>>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>>>>>> index f249a4428458..c52f610f81d0 100644
>>>>>> --- a/drivers/hwmon/Kconfig
>>>>>> +++ b/drivers/hwmon/Kconfig
>>>>>> @@ -1259,6 +1259,34 @@ config SENSORS_NCT7904
>>>>>>         This driver can also be built as a module.  If so, the module
>>>>>>         will be called nct7904.
>>>>>> +config SENSORS_PECI_CPUTEMP
>>>>>> +    tristate "PECI CPU temperature monitoring support"
>>>>>> +    depends on OF
>>>>>> +    depends on PECI
>>>>>> +    help
>>>>>> +      If you say yes here you get support for the generic Intel PECI
>>>>>> +      cputemp driver which provides Digital Thermal Sensor (DTS) 
>>>>>> thermal
>>>>>> +      readings of the CPU package and CPU cores that are 
>>>>>> accessible using
>>>>>> +      the PECI Client Command Suite via the processor PECI client.
>>>>>> +      Check Documentation/hwmon/peci-cputemp for details.
>>>>>> +
>>>>>> +      This driver can also be built as a module.  If so, the module
>>>>>> +      will be called peci-cputemp.
>>>>>> +
>>>>>> +config SENSORS_PECI_DIMMTEMP
>>>>>> +    tristate "PECI DIMM temperature monitoring support"
>>>>>> +    depends on OF
>>>>>> +    depends on PECI
>>>>>> +    help
>>>>>> +      If you say yes here you get support for the generic Intel 
>>>>>> PECI hwmon
>>>>>> +      driver which provides Digital Thermal Sensor (DTS) thermal 
>>>>>> readings of
>>>>>> +      DIMM components that are accessible using the PECI Client 
>>>>>> Command
>>>>>> +      Suite via the processor PECI client.
>>>>>> +      Check Documentation/hwmon/peci-dimmtemp for details.
>>>>>> +
>>>>>> +      This driver can also be built as a module.  If so, the module
>>>>>> +      will be called peci-dimmtemp.
>>>>>> +
>>>>>>   config SENSORS_NSA320
>>>>>>       tristate "ZyXEL NSA320 and compatible fan speed and 
>>>>>> temperature sensors"
>>>>>>       depends on GPIOLIB && OF
>>>>>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>>>>>> index e7d52a36e6c4..48d9598fcd3a 100644
>>>>>> --- a/drivers/hwmon/Makefile
>>>>>> +++ b/drivers/hwmon/Makefile
>>>>>> @@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802)    += nct7802.o
>>>>>>   obj-$(CONFIG_SENSORS_NCT7904)    += nct7904.o
>>>>>>   obj-$(CONFIG_SENSORS_NSA320)    += nsa320-hwmon.o
>>>>>>   obj-$(CONFIG_SENSORS_NTC_THERMISTOR)    += ntc_thermistor.o
>>>>>> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP)    += peci-cputemp.o
>>>>>> +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP)    += peci-dimmtemp.o
>>>>>>   obj-$(CONFIG_SENSORS_PC87360)    += pc87360.o
>>>>>>   obj-$(CONFIG_SENSORS_PC87427)    += pc87427.o
>>>>>>   obj-$(CONFIG_SENSORS_PCF8591)    += pcf8591.o
>>>>>> diff --git a/drivers/hwmon/peci-cputemp.c 
>>>>>> b/drivers/hwmon/peci-cputemp.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..f0bc92687512
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/hwmon/peci-cputemp.c
>>>>>> @@ -0,0 +1,783 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>> +// Copyright (c) 2018 Intel Corporation
>>>>>> +
>>>>>> +#include <linux/delay.h>
>>>>>> +#include <linux/hwmon.h>
>>>>>> +#include <linux/hwmon-sysfs.h>
>>>>>
>>>>> Is this include needed ?
>>>>>
>>>>
>>>> No it isn't. Will drop the line.
>>>>
>>>>>> +#include <linux/jiffies.h>
>>>>>> +#include <linux/module.h>
>>>>>> +#include <linux/of_device.h>
>>>>>> +#include <linux/peci.h>
>>>>>> +
>>>>>> +#define TEMP_TYPE_PECI        6  /* Sensor type 6: Intel PECI */
>>>>>> +
>>>>>> +#define CORE_MAX_ON_HSX       18 /* Max number of cores on 
>>>>>> Haswell */
>>>>>> +#define CORE_MAX_ON_BDX       24 /* Max number of cores on 
>>>>>> Broadwell */
>>>>>> +#define CORE_MAX_ON_SKX       28 /* Max number of cores on 
>>>>>> Skylake */
>>>>>> +
>>>>>> +#define DEFAULT_CHANNEL_NUMS  5
>>>>>> +#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX
>>>>>> +#define CPUTEMP_CHANNEL_NUMS  (DEFAULT_CHANNEL_NUMS + 
>>>>>> CORETEMP_CHANNEL_NUMS)
>>>>>> +
>>>>>> +#define CLIENT_CPU_ID_MASK    0xf0ff0  /* Mask for Family / Model 
>>>>>> info */
>>>>>> +
>>>>>> +#define UPDATE_INTERVAL_MIN   HZ
>>>>>> +
>>>>>> +enum cpu_gens {
>>>>>> +    CPU_GEN_HSX, /* Haswell Xeon */
>>>>>> +    CPU_GEN_BRX, /* Broadwell Xeon */
>>>>>> +    CPU_GEN_SKX, /* Skylake Xeon */
>>>>>> +    CPU_GEN_MAX
>>>>>> +};
>>>>>> +
>>>>>> +struct cpu_gen_info {
>>>>>> +    u32 type;
>>>>>> +    u32 cpu_id;
>>>>>> +    u32 core_max;
>>>>>> +};
>>>>>> +
>>>>>> +struct temp_data {
>>>>>> +    bool valid;
>>>>>> +    s32  value;
>>>>>> +    unsigned long last_updated;
>>>>>> +};
>>>>>> +
>>>>>> +struct temp_group {
>>>>>> +    struct temp_data die;
>>>>>> +    struct temp_data dts_margin;
>>>>>> +    struct temp_data tcontrol;
>>>>>> +    struct temp_data tthrottle;
>>>>>> +    struct temp_data tjmax;
>>>>>> +    struct temp_data core[CORETEMP_CHANNEL_NUMS];
>>>>>> +};
>>>>>> +
>>>>>> +struct peci_cputemp {
>>>>>> +    struct peci_client *client;
>>>>>> +    struct device *dev;
>>>>>> +    char name[PECI_NAME_SIZE];
>>>>>> +    struct temp_group temp;
>>>>>> +    u8 addr;
>>>>>> +    uint cpu_no;
>>>>>> +    const struct cpu_gen_info *gen_info;
>>>>>> +    u32 core_mask;
>>>>>> +    u32 temp_config[CPUTEMP_CHANNEL_NUMS + 1];
>>>>>> +    uint config_idx;
>>>>>> +    struct hwmon_channel_info temp_info;
>>>>>> +    const struct hwmon_channel_info *info[2];
>>>>>> +    struct hwmon_chip_info chip;
>>>>>> +};
>>>>>> +
>>>>>> +enum cputemp_channels {
>>>>>> +    channel_die,
>>>>>> +    channel_dts_mrgn,
>>>>>> +    channel_tcontrol,
>>>>>> +    channel_tthrottle,
>>>>>> +    channel_tjmax,
>>>>>> +    channel_core,
>>>>>> +};
>>>>>> +
>>>>>> +static const struct cpu_gen_info cpu_gen_info_table[] = {
>>>>>> +    { .type = CPU_GEN_HSX,
>>>>>> +      .cpu_id = 0x306f0, /* Family code: 6, Model number: 63 
>>>>>> (0x3f) */
>>>>>> +      .core_max = CORE_MAX_ON_HSX },
>>>>>> +    { .type = CPU_GEN_BRX,
>>>>>> +      .cpu_id = 0x406f0, /* Family code: 6, Model number: 79 
>>>>>> (0x4f) */
>>>>>> +      .core_max = CORE_MAX_ON_BDX },
>>>>>> +    { .type = CPU_GEN_SKX,
>>>>>> +      .cpu_id = 0x50650, /* Family code: 6, Model number: 85 
>>>>>> (0x55) */
>>>>>> +      .core_max = CORE_MAX_ON_SKX },
>>>>>> +};
>>>>>> +
>>>>>> +static const u32 config_table[DEFAULT_CHANNEL_NUMS + 1] = {
>>>>>> +    /* Die temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
>>>>>> +    HWMON_T_CRIT_HYST,
>>>>>> +
>>>>>> +    /* DTS margin temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MIN | HWMON_T_LCRIT,
>>>>>> +
>>>>>> +    /* Tcontrol temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
>>>>>> +
>>>>>> +    /* Tthrottle temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT,
>>>>>> +
>>>>>> +    /* Tjmax temperature */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT,
>>>>>> +
>>>>>> +    /* Core temperature - for all core channels */
>>>>>> +    HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
>>>>>> +    HWMON_T_CRIT_HYST,
>>>>>> +};
>>>>>> +
>>>>>> +static const char *cputemp_label[CPUTEMP_CHANNEL_NUMS] = {
>>>>>> +    "Die",
>>>>>> +    "DTS margin",
>>>>>> +    "Tcontrol",
>>>>>> +    "Tthrottle",
>>>>>> +    "Tjmax",
>>>>>> +    "Core 0", "Core 1", "Core 2", "Core 3",
>>>>>> +    "Core 4", "Core 5", "Core 6", "Core 7",
>>>>>> +    "Core 8", "Core 9", "Core 10", "Core 11",
>>>>>> +    "Core 12", "Core 13", "Core 14", "Core 15",
>>>>>> +    "Core 16", "Core 17", "Core 18", "Core 19",
>>>>>> +    "Core 20", "Core 21", "Core 22", "Core 23",
>>>>>> +};
>>>>>> +
>>>>>> +static int send_peci_cmd(struct peci_cputemp *priv,
>>>>>> +             enum peci_cmd cmd,
>>>>>> +             void *msg)
>>>>>> +{
>>>>>> +    return peci_command(priv->client->adapter, cmd, msg);
>>>>>> +}
>>>>>> +
>>>>>> +static int need_update(struct temp_data *temp)
>>>>>
>>>>> Please use bool.
>>>>>
>>>>
>>>> Okay. I'll use bool instead of int.
>>>>
>>>>>> +{
>>>>>> +    if (temp->valid &&
>>>>>> +        time_before(jiffies, temp->last_updated + 
>>>>>> UPDATE_INTERVAL_MIN))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return 1;
>>>>>> +}
>>>>>> +
>>>>>> +static void mark_updated(struct temp_data *temp)
>>>>>> +{
>>>>>> +    temp->valid = true;
>>>>>> +    temp->last_updated = jiffies;
>>>>>> +}
>>>>>> +
>>>>>> +static s32 ten_dot_six_to_millidegree(s32 val)
>>>>>> +{
>>>>>> +    return ((val ^ 0x8000) - 0x8000) * 1000 / 64;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_tjmax(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!priv->temp.tjmax.valid) {
>>>>>> +        msg.addr = priv->addr;
>>>>>> +        msg.index = MBX_INDEX_TEMP_TARGET;
>>>>>> +        msg.param = 0;
>>>>>> +        msg.rx_len = 4;
>>>>>> +
>>>>>> +        rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        priv->temp.tjmax.value = (s32)msg.pkg_config[2] * 1000;
>>>>>> +        priv->temp.tjmax.valid = true;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_tcontrol(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 tcontrol_margin;
>>>>>> +    s32 tthrottle_offset;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.tcontrol))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_TEMP_TARGET;
>>>>>> +    msg.param = 0;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    tcontrol_margin = msg.pkg_config[1];
>>>>>> +    tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>>>>>> +    priv->temp.tcontrol.value = priv->temp.tjmax.value - 
>>>>>> tcontrol_margin;
>>>>>> +
>>>>>> +    tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>>>>>> +    priv->temp.tthrottle.value = priv->temp.tjmax.value - 
>>>>>> tthrottle_offset;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.tcontrol);
>>>>>> +    mark_updated(&priv->temp.tthrottle);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_tthrottle(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 tcontrol_margin;
>>>>>> +    s32 tthrottle_offset;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.tthrottle))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_TEMP_TARGET;
>>>>>> +    msg.param = 0;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    tthrottle_offset = (msg.pkg_config[3] & 0x2f) * 1000;
>>>>>> +    priv->temp.tthrottle.value = priv->temp.tjmax.value - 
>>>>>> tthrottle_offset;
>>>>>> +
>>>>>> +    tcontrol_margin = msg.pkg_config[1];
>>>>>> +    tcontrol_margin = ((tcontrol_margin ^ 0x80) - 0x80) * 1000;
>>>>>> +    priv->temp.tcontrol.value = priv->temp.tjmax.value - 
>>>>>> tcontrol_margin;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.tthrottle);
>>>>>> +    mark_updated(&priv->temp.tcontrol);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>
>>>>> I am quite completely missing how the two functions above are 
>>>>> different.
>>>>>
>>>>
>>>> The two above functions are slightly different but uses the same 
>>>> PECI command which provides both Tthrottle and Tcontrol values in 
>>>> pkg_config array so it updates the values to reduce duplicate PECI 
>>>> transactions. Probably, combining these two functions into 
>>>> get_ttrottle_and_tcontrol() would look better. I'll rewrite it.
>>>>
>>>>>> +
>>>>>> +static int get_die_temp(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_get_temp_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.die))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_GET_TEMP, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    priv->temp.die.value = priv->temp.tjmax.value +
>>>>>> +                   ((s32)msg.temp_raw * 1000 / 64);
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.die);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_dts_margin(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 dts_margin;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.dts_margin))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_DTS_MARGIN;
>>>>>> +    msg.param = 0;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>>>>>> +
>>>>>> +    /**
>>>>>> +     * Processors return a value of DTS reading in 10.6 format
>>>>>> +     * (10 bits signed decimal, 6 bits fractional).
>>>>>> +     * Error codes:
>>>>>> +     *   0x8000: General sensor error
>>>>>> +     *   0x8001: Reserved
>>>>>> +     *   0x8002: Underflow on reading value
>>>>>> +     *   0x8003-0x81ff: Reserved
>>>>>> +     */
>>>>>> +    if (dts_margin >= 0x8000 && dts_margin <= 0x81ff)
>>>>>> +        return -EIO;
>>>>>> +
>>>>>> +    dts_margin = ten_dot_six_to_millidegree(dts_margin);
>>>>>> +
>>>>>> +    priv->temp.dts_margin.value = dts_margin;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.dts_margin);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int get_core_temp(struct peci_cputemp *priv, int core_index)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    s32 core_dts_margin;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp.core[core_index]))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    rc = get_tjmax(priv);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_PER_CORE_DTS_TEMP;
>>>>>> +    msg.param = core_index;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    core_dts_margin = (msg.pkg_config[1] << 8) | msg.pkg_config[0];
>>>>>> +
>>>>>> +    /**
>>>>>> +     * Processors return a value of the core DTS reading in 10.6 
>>>>>> format
>>>>>> +     * (10 bits signed decimal, 6 bits fractional).
>>>>>> +     * Error codes:
>>>>>> +     *   0x8000: General sensor error
>>>>>> +     *   0x8001: Reserved
>>>>>> +     *   0x8002: Underflow on reading value
>>>>>> +     *   0x8003-0x81ff: Reserved
>>>>>> +     */
>>>>>> +    if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>>>>>> +        return -EIO;
>>>>>> +
>>>>>> +    core_dts_margin = ten_dot_six_to_millidegree(core_dts_margin);
>>>>>> +
>>>>>> +    priv->temp.core[core_index].value = priv->temp.tjmax.value +
>>>>>> +                        core_dts_margin;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp.core[core_index]);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>
>>>>> There is a lot of duplication in those functions. Would it be possible
>>>>> to find common code and use functions for it instead of duplicating
>>>>> everything several times ?
>>>>>
>>>>
>>>> Are you pointing out this code?
>>>> /**
>>>>   * Processors return a value of the core DTS reading in 10.6 format
>>>>   * (10 bits signed decimal, 6 bits fractional).
>>>>   * Error codes:
>>>>   *   0x8000: General sensor error
>>>>   *   0x8001: Reserved
>>>>   *   0x8002: Underflow on reading value
>>>>   *   0x8003-0x81ff: Reserved
>>>>   */
>>>> if (core_dts_margin >= 0x8000 && core_dts_margin <= 0x81ff)
>>>>      return -EIO;
>>>>
>>>> Then I'll rewrite it as a function. If not, please point out the 
>>>> duplication.
>>>>
>>>
>>> There is lots of other duplication.
>>>
>>
>> Sorry but can you point out the duplication?
>>
> write a python script to do a semantic comparison.
> 

Okay. I'll try to simplify this code again.

>>>>>> +static int find_core_index(struct peci_cputemp *priv, int channel)
>>>>>> +{
>>>>>> +    int core_channel = channel - DEFAULT_CHANNEL_NUMS;
>>>>>> +    int idx, found = 0;
>>>>>> +
>>>>>> +    for (idx = 0; idx < priv->gen_info->core_max; idx++) {
>>>>>> +        if (priv->core_mask & BIT(idx)) {
>>>>>> +            if (core_channel == found)
>>>>>> +                break;
>>>>>> +
>>>>>> +            found++;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return idx;
>>>>>
>>>>> What if nothing is found ?
>>>>>
>>>>
>>>> Core temperature group will be registered only when it detects at 
>>>> least one core checked by check_resolved_cores(), so 
>>>> find_core_index() can be called only when priv->core_mask has a 
>>>> non-zero value. The 'nothing is found' case will not happen.
>>>>
>>> That doesn't guarantee a match. If what you are saying is correct 
>>> there should always be
>>> a well defined match of channel -> idx, and the search should be 
>>> unnecessary.
>>>
>>
>> There could be some disabled cores in the resolved core mask bit 
>> sequence also it should remove indexing gap in channel numbering so it 
>> is the reason why this search function is needed. Well defined match 
>> of channel -> idx would not be always satisfied.
>>
> Are you saying that each call to the function, with the same parameters,
> can return a different result ?
> 

No, the result will be consistent. After reading the priv->core_mask 
once in check_resolved_cores(), the value will not be changed. I'm 
saying about this case, for example if core number 2 is unresolved in 
total 4 cores, then the idx order will be '0, 1, 3' but channel order 
will be '5, 6, 7' without making any indexing gap.

>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_string(struct device *dev,
>>>>>> +                   enum hwmon_sensor_types type,
>>>>>> +                   u32 attr, int channel, const char **str)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int core_index;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_label:
>>>>>> +        if (channel < DEFAULT_CHANNEL_NUMS) {
>>>>>> +            *str = cputemp_label[channel];
>>>>>> +        } else {
>>>>>> +            core_index = find_core_index(priv, channel);
>>>>>
>>>>> FWIW, it might be better to pass channel - DEFAULT_CHANNEL_NUMS
>>>>> as parameter.
>>>>>
>>>>
>>>> cputemp_read_string() is mapped to read_string member of hwmon_ops 
>>>> struct, so hwmon susbsystem passes the channel parameter based on 
>>>> the registered channel order. Should I modify hwmon subsystem code?
>>>>
>>>
>>> Huh ? Changing
>>>      f(x) { y = x - const; }
>>> ...
>>>      f(x);
>>>
>>> to
>>>      f(y) { }
>>> ...
>>>      f(x - const);
>>>
>>> requires a hwmon core change ? Really ?
>>>
>>
>> Sorry for my misunderstanding. You are right. I'll change the 
>> parameter passing of find_core_index() from 'channel' to 'channel - 
>> DEFAULT_CHANNEL_NUMS'.
>>
>>>>> What if find_core_index() returns priv->gen_info->core_max, ie
>>>>> if it didn't find a core ?
>>>>>
>>>>
>>>> As explained above, find_core index() returns a correct index always.
>>>>
>>>>>> +            *str = cputemp_label[DEFAULT_CHANNEL_NUMS + core_index];
>>>>>> +        }
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_die(struct device *dev,
>>>>>> +                enum hwmon_sensor_types type,
>>>>>> +                u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_die_temp(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.die.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_max:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit_hyst:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_dts_margin(struct device *dev,
>>>>>> +                   enum hwmon_sensor_types type,
>>>>>> +                   u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_dts_margin(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.dts_margin.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_min:
>>>>>> +        *val = 0;
>>>>>> +        return 0;
>>>>>
>>>>> This attribute should not exist.
>>>>>
>>>>
>>>> This is an attribute of DTS margin temperature which reflects 
>>>> thermal margin to Tcontrol of the CPU package. If it shows '0' means 
>>>> it reached to Tcontrol, the first level of thermal warning. If the 
>>>> CPU keeps getting hot then this DTS margin shows a negative value 
>>>> until it reaches to Tjmax. When the temperature reaches to Tjmax at 
>>>> last then it shows the lower critcal value which lcrit indicates as 
>>>> the second level of thermal warning.
>>>>
>>>
>>> The hwmon ABI reports chip values, not constants. Even though some 
>>> drivers do
>>> it, reporting a constant is always wrong.
>>>
>>>>>> +    case hwmon_temp_lcrit:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value - priv->temp.tjmax.value;
>>>>>
>>>>> lcrit is tcontrol - tjmax, and crit_hyst above is
>>>>> tjmax - tcontrol ? How does this make sense ?
>>>>>
>>>>
>>>> Both Tjmax and Tcontrol have positive values and Tjmax is greater 
>>>> than Tcontrol always. As explained above, lcrit of DTS margin should 
>>>> show a negative value means the margin goes down across '0'. On the 
>>>> other hand, crit_hyst of Die temperature should show absolute 
>>>> hyterisis value between Tcontrol and Tjmax.
>>>>
>>> The hwmon ABI requires reporting of absolute temperatures in 
>>> milli-degrees C.
>>> Your statements make it very clear that this driver does not report
>>> absolute temperatures. This is not acceptable.
>>>
>>
>> Okay. I'll remove the 'DTS margin' temperature. All others are 
>> reporting absolute temperatures.
>>
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_tcontrol(struct device *dev,
>>>>>> +                 enum hwmon_sensor_types type,
>>>>>> +                 u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>
>>>>> Am I missing something, or is the same temperature reported several 
>>>>> times ?
>>>>> tjmax is also reported as temp_crit cputemp_read_die(), for example.
>>>>>
>>>>
>>>> This driver provides multiple channels and each channel has its own 
>>>> supplement attributes. As you mentioned, Die temperature channel and 
>>>> Core temperature channel have their individual crit attributes and 
>>>> they reflect the same value, Tjmax. It is not reporting several 
>>>> times but reporting the same value.
>>>>
>>> Then maybe fold the functions accordingly ?
>>>
>>
>> I'll use a single function for 'Die temperature' and 'Core 
>> temperature' that have the same attributes set. It would simplify this 
>> code a bit.
>>
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_tthrottle(struct device *dev,
>>>>>> +                  enum hwmon_sensor_types type,
>>>>>> +                  u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_tthrottle(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tthrottle.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_tjmax(struct device *dev,
>>>>>> +                  enum hwmon_sensor_types type,
>>>>>> +                  u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int cputemp_read_core(struct device *dev,
>>>>>> +                 enum hwmon_sensor_types type,
>>>>>> +                 u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_cputemp *priv = dev_get_drvdata(dev);
>>>>>> +    int core_index = find_core_index(priv, channel);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_core_temp(priv, core_index);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.core[core_index].value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_max:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit:
>>>>>> +        rc = get_tjmax(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value;
>>>>>> +        return 0;
>>>>>> +    case hwmon_temp_crit_hyst:
>>>>>> +        rc = get_tcontrol(priv);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>
>>>>> There is again a lot of duplication in those functions.
>>>>>
>>>>
>>>> Each function is called from cputemp_read() which is mapped to read 
>>>> function pointer of hwmon_ops struct. Since each channel has 
>>>> different set of attributes so the cputemp_read() calls an 
>>>> individual channel handler after checking the channel type. Of 
>>>> course, we can handle all attributes of all channels in a single 
>>>> function but the way also needs channel type checking code on each 
>>>> attribute.
>>>>
>>>>>> +
>>>>>> +static int cputemp_read(struct device *dev,
>>>>>> +            enum hwmon_sensor_types type,
>>>>>> +            u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    switch (channel) {
>>>>>> +    case channel_die:
>>>>>> +        return cputemp_read_die(dev, type, attr, channel, val);
>>>>>> +    case channel_dts_mrgn:
>>>>>> +        return cputemp_read_dts_margin(dev, type, attr, channel, 
>>>>>> val);
>>>>>> +    case channel_tcontrol:
>>>>>> +        return cputemp_read_tcontrol(dev, type, attr, channel, val);
>>>>>> +    case channel_tthrottle:
>>>>>> +        return cputemp_read_tthrottle(dev, type, attr, channel, 
>>>>>> val);
>>>>>> +    case channel_tjmax:
>>>>>> +        return cputemp_read_tjmax(dev, type, attr, channel, val);
>>>>>> +    default:
>>>>>> +        if (channel < CPUTEMP_CHANNEL_NUMS)
>>>>>> +            return cputemp_read_core(dev, type, attr, channel, val);
>>>>>> +
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static umode_t cputemp_is_visible(const void *data,
>>>>>> +                  enum hwmon_sensor_types type,
>>>>>> +                  u32 attr, int channel)
>>>>>> +{
>>>>>> +    const struct peci_cputemp *priv = data;
>>>>>> +
>>>>>> +    if (priv->temp_config[channel] & BIT(attr))
>>>>>> +        return 0444;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct hwmon_ops cputemp_ops = {
>>>>>> +    .is_visible = cputemp_is_visible,
>>>>>> +    .read_string = cputemp_read_string,
>>>>>> +    .read = cputemp_read,
>>>>>> +};
>>>>>> +
>>>>>> +static int check_resolved_cores(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pci_cfg_local_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!(priv->client->adapter->cmd_mask & 
>>>>>> BIT(PECI_CMD_RD_PCI_CFG_LOCAL)))
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    /* Get the RESOLVED_CORES register value */
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.bus = 1;
>>>>>> +    msg.device = 30;
>>>>>> +    msg.function = 3;
>>>>>> +    msg.reg = 0xB4;
>>>>>
>>>>> Can this be made less magic with some defines ?
>>>>>
>>>>
>>>> Sure, will use defines instead.
>>>>
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PCI_CFG_LOCAL, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    priv->core_mask = msg.pci_config[3] << 24 |
>>>>>> +              msg.pci_config[2] << 16 |
>>>>>> +              msg.pci_config[1] << 8 |
>>>>>> +              msg.pci_config[0];
>>>>>> +
>>>>>> +    if (!priv->core_mask)
>>>>>> +        return -EAGAIN;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "Scanned resolved cores: 0x%x\n", 
>>>>>> priv->core_mask);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int create_core_temp_info(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    int rc, i;
>>>>>> +
>>>>>> +    rc = check_resolved_cores(priv);
>>>>>> +    if (!rc) {
>>>>>> +        for (i = 0; i < priv->gen_info->core_max; i++) {
>>>>>> +            if (priv->core_mask & BIT(i)) {
>>>>>> +                priv->temp_config[priv->config_idx++] =
>>>>>> +                             config_table[channel_core];
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static int check_cpu_id(struct peci_cputemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    u32 cpu_id;
>>>>>> +    int i, rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_CPU_ID;
>>>>>> +    msg.param = PKG_ID_CPU_ID;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    cpu_id = ((msg.pkg_config[2] << 16) | (msg.pkg_config[1] << 8) |
>>>>>> +          msg.pkg_config[0]) & CLIENT_CPU_ID_MASK;
>>>>>> +
>>>>>> +    for (i = 0; i < CPU_GEN_MAX; i++) {
>>>>>> +        if (cpu_id == cpu_gen_info_table[i].cpu_id) {
>>>>>> +            priv->gen_info = &cpu_gen_info_table[i];
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!priv->gen_info)
>>>>>> +        return -ENODEV;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "CPU_ID: 0x%x\n", cpu_id);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int peci_cputemp_probe(struct peci_client *client)
>>>>>> +{
>>>>>> +    struct device *dev = &client->dev;
>>>>>> +    struct peci_cputemp *priv;
>>>>>> +    struct device *hwmon_dev;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if ((client->adapter->cmd_mask &
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>>>>>> +        dev_err(dev, "Client doesn't support temperature 
>>>>>> monitoring\n");
>>>>>> +        return -EINVAL;
>>>>>
>>>>> Does this mean there will be an error message for each 
>>>>> non-supported CPU ?
>>>>> Why ?
>>>>>
>>>>
>>>> For proper operation of this driver, PECI_CMD_GET_TEMP and 
>>>> PECI_CMD_RD_PKG_CFG have to be supported by a client CPU. 
>>>> PECI_CMD_GET_TEMP is provided as a default command but 
>>>> PECI_CMD_RD_PKG_CFG depends on PECI minor revision of a CPU package 
>>>> so this checking is needed.
>>>>
>>>
>>> I do not question the check. I question the error message and error 
>>> return value.
>>> Why is it an _error_ if the CPU does not support the functionality, 
>>> and why does
>>> it have to be reported in the kernel log ?
>>>
>>
>> Got it. I'll change that to dev_dbg.
>>
>>>>>> +    }
>>>>>> +
>>>>>> +    priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>>>>> +    if (!priv)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    dev_set_drvdata(dev, priv);
>>>>>> +    priv->client = client;
>>>>>> +    priv->dev = dev;
>>>>>> +    priv->addr = client->addr;
>>>>>> +    priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>>>>>> +
>>>>>> +    snprintf(priv->name, PECI_NAME_SIZE, "peci_cputemp.cpu%d",
>>>>>> +         priv->cpu_no);
>>>>>> +
>>>>>> +    rc = check_cpu_id(priv);
>>>>>> +    if (rc) {
>>>>>> +        dev_err(dev, "Client CPU is not supported\n");
>>>>>
>>>>> -ENODEV is not an error, and should not result in an error message.
>>>>> Besides, the error can also be propagated from peci core code,
>>>>> and may well be something else.
>>>>>
>>>>
>>>> Got it. I'll remove the error message and will add a proper handling 
>>>> code into PECI core.
>>>>
>>>>>> +        return rc;
>>>>>> +    }
>>>>>> +
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_die];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_dts_mrgn];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_tcontrol];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_tthrottle];
>>>>>> +    priv->temp_config[priv->config_idx++] = 
>>>>>> config_table[channel_tjmax];
>>>>>> +
>>>>>> +    rc = create_core_temp_info(priv);
>>>>>> +    if (rc)
>>>>>> +        dev_dbg(dev, "Failed to create core temp info\n");
>>>>>
>>>>> Then what ? Shouldn't this result in probe deferral or something 
>>>>> more useful
>>>>> instead of just being ignored ?
>>>>>
>>>>
>>>> This driver can't support core temperature monitoring if a CPU 
>>>> doesn't support PECI_CMD_RD_PCI_CFG_LOCAL command. In that case, it 
>>>> skips core temperature group creation and supports only basic 
>>>> temperature monitoring of Die, DTS margin and etc. I'll add this 
>>>> description as a comment.
>>>>
>>>
>>> The message says "Failed to ...". It does not say "This CPU does not 
>>> support ...".
>>>
>>
>> Got it. Will correct the message.
>>
>>>>>> +
>>>>>> +    priv->chip.ops = &cputemp_ops;
>>>>>> +    priv->chip.info = priv->info;
>>>>>> +
>>>>>> +    priv->info[0] = &priv->temp_info;
>>>>>> +
>>>>>> +    priv->temp_info.type = hwmon_temp;
>>>>>> +    priv->temp_info.config = priv->temp_config;
>>>>>> +
>>>>>> +    hwmon_dev = devm_hwmon_device_register_with_info(priv->dev,
>>>>>> +                             priv->name,
>>>>>> +                             priv,
>>>>>> +                             &priv->chip,
>>>>>> +                             NULL);
>>>>>> +
>>>>>> +    if (IS_ERR(hwmon_dev))
>>>>>> +        return PTR_ERR(hwmon_dev);
>>>>>> +
>>>>>> +    dev_dbg(dev, "%s: sensor '%s'\n", dev_name(hwmon_dev), 
>>>>>> priv->name);
>>>>>> +
>>>
>>> Why does this message display the device name twice ?
>>>
>>
>> For an example, dev_name(hwmon_dev) shows 'hwmon5' and priv->name 
>> shows 'peci-cputemp0'.
>>
> And dev_dbg() shows another device name. So you'll have something like
> 
> peci-cputemp0: hwmon5: sensor 'peci-cputemp0'
> 

Practically it shows like

peci-cputemp 0-30:00: hwmon10: sensor 'peci_cputemp.cpu0'

where 0-30:00 is assigned by peci core.

>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct of_device_id peci_cputemp_of_table[] = {
>>>>>> +    { .compatible = "intel,peci-cputemp" },
>>>>>> +    { }
>>>>>> +};
>>>>>> +MODULE_DEVICE_TABLE(of, peci_cputemp_of_table);
>>>>>> +
>>>>>> +static struct peci_driver peci_cputemp_driver = {
>>>>>> +    .probe  = peci_cputemp_probe,
>>>>>> +    .driver = {
>>>>>> +        .name           = "peci-cputemp",
>>>>>> +        .of_match_table = of_match_ptr(peci_cputemp_of_table),
>>>>>> +    },
>>>>>> +};
>>>>>> +module_peci_driver(peci_cputemp_driver);
>>>>>> +
>>>>>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@...ux.intel.com>");
>>>>>> +MODULE_DESCRIPTION("PECI cputemp driver");
>>>>>> +MODULE_LICENSE("GPL v2");
>>>>>> diff --git a/drivers/hwmon/peci-dimmtemp.c 
>>>>>> b/drivers/hwmon/peci-dimmtemp.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..78bf29cb2c4c
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/hwmon/peci-dimmtemp.c
>>>>>
>>>>> FWIW, this should be two separate patches.
>>>>>
>>>>
>>>> Should I split out hwmon documents and dt bindings too?
>>>>
>>>>>> @@ -0,0 +1,432 @@
>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>> +// Copyright (c) 2018 Intel Corporation
>>>>>> +
>>>>>> +#include <linux/delay.h>
>>>>>> +#include <linux/hwmon.h>
>>>>>> +#include <linux/hwmon-sysfs.h>
>>>>>
>>>>> Needed ?
>>>>>
>>>>
>>>> No. Will drop the line.
>>>>
>>>>>> +#include <linux/jiffies.h>
>>>>>> +#include <linux/module.h>
>>>>>> +#include <linux/of_device.h>
>>>>>> +#include <linux/peci.h>
>>>>>> +#include <linux/workqueue.h>
>>>>>> +
>>>>>> +#define TEMP_TYPE_PECI       6  /* Sensor type 6: Intel PECI */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX_ON_HSX 8  /* Max number of channel ranks on 
>>>>>> Haswell */
>>>>>> +#define DIMM_IDX_MAX_ON_HSX  3  /* Max DIMM index per channel on 
>>>>>> Haswell */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX_ON_BDX 4  /* Max number of channel ranks on 
>>>>>> Broadwell */
>>>>>> +#define DIMM_IDX_MAX_ON_BDX  3  /* Max DIMM index per channel on 
>>>>>> Broadwell */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX_ON_SKX 6  /* Max number of channel ranks on 
>>>>>> Skylake */
>>>>>> +#define DIMM_IDX_MAX_ON_SKX  2  /* Max DIMM index per channel on 
>>>>>> Skylake */
>>>>>> +
>>>>>> +#define CHAN_RANK_MAX        CHAN_RANK_MAX_ON_HSX
>>>>>> +#define DIMM_IDX_MAX         DIMM_IDX_MAX_ON_HSX
>>>>>> +
>>>>>> +#define DIMM_NUMS_MAX        (CHAN_RANK_MAX * DIMM_IDX_MAX)
>>>>>> +
>>>>>> +#define CLIENT_CPU_ID_MASK   0xf0ff0  /* Mask for Family / Model 
>>>>>> info */
>>>>>> +
>>>>>> +#define UPDATE_INTERVAL_MIN  HZ
>>>>>> +
>>>>>> +#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
>>>>>> +#define DIMM_MASK_CHECK_RETRY_MAX     60 /* 60 x 5 secs = 5 
>>>>>> minutes */
>>>>>> +
>>>>>> +enum cpu_gens {
>>>>>> +    CPU_GEN_HSX, /* Haswell Xeon */
>>>>>> +    CPU_GEN_BRX, /* Broadwell Xeon */
>>>>>> +    CPU_GEN_SKX, /* Skylake Xeon */
>>>>>> +    CPU_GEN_MAX
>>>>>> +};
>>>>>> +
>>>>>> +struct cpu_gen_info {
>>>>>> +    u32 type;
>>>>>> +    u32 cpu_id;
>>>>>> +    u32 chan_rank_max;
>>>>>> +    u32 dimm_idx_max;
>>>>>> +};
>>>>>> +
>>>>>> +struct temp_data {
>>>>>> +    bool valid;
>>>>>> +    s32  value;
>>>>>> +    unsigned long last_updated;
>>>>>> +};
>>>>>> +
>>>>>> +struct peci_dimmtemp {
>>>>>> +    struct peci_client *client;
>>>>>> +    struct device *dev;
>>>>>> +    struct workqueue_struct *work_queue;
>>>>>> +    struct delayed_work work_handler;
>>>>>> +    char name[PECI_NAME_SIZE];
>>>>>> +    struct temp_data temp[DIMM_NUMS_MAX];
>>>>>> +    u8 addr;
>>>>>> +    uint cpu_no;
>>>>>> +    const struct cpu_gen_info *gen_info;
>>>>>> +    u32 dimm_mask;
>>>>>> +    int retry_count;
>>>>>> +    int channels;
>>>>>> +    u32 temp_config[DIMM_NUMS_MAX + 1];
>>>>>> +    struct hwmon_channel_info temp_info;
>>>>>> +    const struct hwmon_channel_info *info[2];
>>>>>> +    struct hwmon_chip_info chip;
>>>>>> +};
>>>>>> +
>>>>>> +static const struct cpu_gen_info cpu_gen_info_table[] = {
>>>>>> +    { .type  = CPU_GEN_HSX,
>>>>>> +      .cpu_id = 0x306f0, /* Family code: 6, Model number: 63 
>>>>>> (0x3f) */
>>>>>> +      .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
>>>>>> +      .dimm_idx_max  = DIMM_IDX_MAX_ON_HSX },
>>>>>> +    { .type  = CPU_GEN_BRX,
>>>>>> +      .cpu_id = 0x406f0, /* Family code: 6, Model number: 79 
>>>>>> (0x4f) */
>>>>>> +      .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
>>>>>> +      .dimm_idx_max  = DIMM_IDX_MAX_ON_BDX },
>>>>>> +    { .type  = CPU_GEN_SKX,
>>>>>> +      .cpu_id = 0x50650, /* Family code: 6, Model number: 85 
>>>>>> (0x55) */
>>>>>> +      .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
>>>>>> +      .dimm_idx_max  = DIMM_IDX_MAX_ON_SKX },
>>>>>> +};
>>>>>> +
>>>>>> +static const char *dimmtemp_label[CHAN_RANK_MAX][DIMM_IDX_MAX] = {
>>>>>> +    { "DIMM A0", "DIMM A1", "DIMM A2" },
>>>>>> +    { "DIMM B0", "DIMM B1", "DIMM B2" },
>>>>>> +    { "DIMM C0", "DIMM C1", "DIMM C2" },
>>>>>> +    { "DIMM D0", "DIMM D1", "DIMM D2" },
>>>>>> +    { "DIMM E0", "DIMM E1", "DIMM E2" },
>>>>>> +    { "DIMM F0", "DIMM F1", "DIMM F2" },
>>>>>> +    { "DIMM G0", "DIMM G1", "DIMM G2" },
>>>>>> +    { "DIMM H0", "DIMM H1", "DIMM H2" },
>>>>>> +};
>>>>>> +
>>>>>> +static int send_peci_cmd(struct peci_dimmtemp *priv, enum 
>>>>>> peci_cmd cmd,
>>>>>> +             void *msg)
>>>>>> +{
>>>>>> +    return peci_command(priv->client->adapter, cmd, msg);
>>>>>> +}
>>>>>> +
>>>>>> +static int need_update(struct temp_data *temp)
>>>>>> +{
>>>>>> +    if (temp->valid &&
>>>>>> +        time_before(jiffies, temp->last_updated + 
>>>>>> UPDATE_INTERVAL_MIN))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return 1;
>>>>>> +}
>>>>>> +
>>>>>> +static void mark_updated(struct temp_data *temp)
>>>>>> +{
>>>>>> +    temp->valid = true;
>>>>>> +    temp->last_updated = jiffies;
>>>>>> +}
>>>>>
>>>>> It might make sense to provide the duplicate functions in a core file.
>>>>>
>>>>
>>>> It is temperature monitoring specific function and it touches module 
>>>> specific variables. Do you really think that this non-generic 
>>>> function should be moved to PECI core?
>>>>
>>>>>> +
>>>>>> +static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
>>>>>> +{
>>>>>> +    int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
>>>>>> +    int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if (!need_update(&priv->temp[dimm_no]))
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>>>>>> +    msg.param = chan_rank;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    priv->temp[dimm_no].value = msg.pkg_config[dimm_order] * 1000;
>>>>>> +
>>>>>> +    mark_updated(&priv->temp[dimm_no]);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int find_dimm_number(struct peci_dimmtemp *priv, int channel)
>>>>>> +{
>>>>>> +    int dimm_nums_max = priv->gen_info->chan_rank_max *
>>>>>> +                priv->gen_info->dimm_idx_max;
>>>>>> +    int idx, found = 0;
>>>>>> +
>>>>>> +    for (idx = 0; idx < dimm_nums_max; idx++) {
>>>>>> +        if (priv->dimm_mask & BIT(idx)) {
>>>>>> +            if (channel == found)
>>>>>> +                break;
>>>>>> +
>>>>>> +            found++;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return idx;
>>>>>> +}
>>>>>
>>>>> This again looks like duplicate code.
>>>>>
>>>>
>>>> find_dimm_number()? I'm sure it isn't.
>>>>
>>>>>> +
>>>>>> +static int dimmtemp_read_string(struct device *dev,
>>>>>> +                enum hwmon_sensor_types type,
>>>>>> +                u32 attr, int channel, const char **str)
>>>>>> +{
>>>>>> +    struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>>>>>> +    u32 dimm_idx_max = priv->gen_info->dimm_idx_max;
>>>>>> +    int dimm_no, chan_rank, dimm_idx;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_label:
>>>>>> +        dimm_no = find_dimm_number(priv, channel);
>>>>>> +        chan_rank = dimm_no / dimm_idx_max;
>>>>>> +        dimm_idx = dimm_no % dimm_idx_max;
>>>>>> +        *str = dimmtemp_label[chan_rank][dimm_idx];
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static int dimmtemp_read(struct device *dev, enum 
>>>>>> hwmon_sensor_types type,
>>>>>> +             u32 attr, int channel, long *val)
>>>>>> +{
>>>>>> +    struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>>>>>> +    int dimm_no = find_dimm_number(priv, channel);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_input:
>>>>>> +        rc = get_dimm_temp(priv, dimm_no);
>>>>>> +        if (rc)
>>>>>> +            return rc;
>>>>>> +
>>>>>> +        *val = priv->temp[dimm_no].value;
>>>>>> +        return 0;
>>>>>> +    default:
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static umode_t dimmtemp_is_visible(const void *data,
>>>>>> +                   enum hwmon_sensor_types type,
>>>>>> +                   u32 attr, int channel)
>>>>>> +{
>>>>>> +    switch (attr) {
>>>>>> +    case hwmon_temp_label:
>>>>>> +    case hwmon_temp_input:
>>>>>> +        return 0444;
>>>>>> +    default:
>>>>>> +        return 0;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static const struct hwmon_ops dimmtemp_ops = {
>>>>>> +    .is_visible = dimmtemp_is_visible,
>>>>>> +    .read_string = dimmtemp_read_string,
>>>>>> +    .read = dimmtemp_read,
>>>>>> +};
>>>>>> +
>>>>>> +static int check_populated_dimms(struct peci_dimmtemp *priv)
>>>>>> +{
>>>>>> +    u32 chan_rank_max = priv->gen_info->chan_rank_max;
>>>>>> +    u32 dimm_idx_max = priv->gen_info->dimm_idx_max;
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    int chan_rank, dimm_idx;
>>>>>> +    int rc, channels = 0;
>>>>>> +
>>>>>> +    for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
>>>>>> +        msg.addr = priv->addr;
>>>>>> +        msg.index = MBX_INDEX_DDR_DIMM_TEMP;
>>>>>> +        msg.param = chan_rank;
>>>>>> +        msg.rx_len = 4;
>>>>>> +
>>>>>> +        rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +        if (rc) {
>>>>>> +            priv->dimm_mask = 0;
>>>>>> +            return rc;
>>>>>> +        }
>>>>>> +
>>>>>> +        for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++) {
>>>>>> +            if (msg.pkg_config[dimm_idx]) {
>>>>>> +                priv->dimm_mask |= BIT(chan_rank *
>>>>>> +                               chan_rank_max +
>>>>>> +                               dimm_idx);
>>>>>> +                channels++;
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!priv->dimm_mask)
>>>>>> +        return -EAGAIN;
>>>>>> +
>>>>>> +    priv->channels = channels;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "Scanned populated DIMMs: 0x%x\n", 
>>>>>> priv->dimm_mask);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int create_dimm_temp_info(struct peci_dimmtemp *priv)
>>>>>> +{
>>>>>> +    struct device *hwmon_dev;
>>>>>> +    int rc, i;
>>>>>> +
>>>>>> +    rc = check_populated_dimms(priv);
>>>>>> +    if (!rc) {
>>>>>
>>>>> Please handle error cases first.
>>>>>
>>>>
>>>> Sure, I'll rewrite it.
>>>>
>>>>>> +        for (i = 0; i < priv->channels; i++)
>>>>>> +            priv->temp_config[i] = HWMON_T_LABEL | HWMON_T_INPUT;
>>>>>> +
>>>>>> +        priv->chip.ops = &dimmtemp_ops;
>>>>>> +        priv->chip.info = priv->info;
>>>>>> +
>>>>>> +        priv->info[0] = &priv->temp_info;
>>>>>> +
>>>>>> +        priv->temp_info.type = hwmon_temp;
>>>>>> +        priv->temp_info.config = priv->temp_config;
>>>>>> +
>>>>>> +        hwmon_dev = devm_hwmon_device_register_with_info(priv->dev,
>>>>>> +                                 priv->name,
>>>>>> +                                 priv,
>>>>>> +                                 &priv->chip,
>>>>>> +                                 NULL);
>>>>>> +        rc = PTR_ERR_OR_ZERO(hwmon_dev);
>>>>>> +        if (!rc)
>>>>>> +            dev_dbg(priv->dev, "%s: sensor '%s'\n",
>>>>>> +                dev_name(hwmon_dev), priv->name);
>>>>>> +    } else if (rc == -EAGAIN) {
>>>>>> +        if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
>>>>>> +            queue_delayed_work(priv->work_queue,
>>>>>> +                       &priv->work_handler,
>>>>>> +                       DIMM_MASK_CHECK_DELAY_JIFFIES);
>>>>>> +            priv->retry_count++;
>>>>>> +            dev_dbg(priv->dev,
>>>>>> +                "Deferred DIMM temp info creation\n");
>>>>>> +        } else {
>>>>>> +            rc = -ETIMEDOUT;
>>>>>> +            dev_err(priv->dev,
>>>>>> +                "Timeout retrying DIMM temp info creation\n");
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static void create_dimm_temp_info_delayed(struct work_struct *work)
>>>>>> +{
>>>>>> +    struct delayed_work *dwork = to_delayed_work(work);
>>>>>> +    struct peci_dimmtemp *priv = container_of(dwork, struct 
>>>>>> peci_dimmtemp,
>>>>>> +                          work_handler);
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    rc = create_dimm_temp_info(priv);
>>>>>> +    if (rc && rc != -EAGAIN)
>>>>>> +        dev_dbg(priv->dev, "Failed to create DIMM temp info\n");
>>>>>> +}
>>>>>> +
>>>>>> +static int check_cpu_id(struct peci_dimmtemp *priv)
>>>>>> +{
>>>>>> +    struct peci_rd_pkg_cfg_msg msg;
>>>>>> +    u32 cpu_id;
>>>>>> +    int i, rc;
>>>>>> +
>>>>>> +    msg.addr = priv->addr;
>>>>>> +    msg.index = MBX_INDEX_CPU_ID;
>>>>>> +    msg.param = PKG_ID_CPU_ID;
>>>>>> +    msg.rx_len = 4;
>>>>>> +
>>>>>> +    rc = send_peci_cmd(priv, PECI_CMD_RD_PKG_CFG, &msg);
>>>>>> +    if (rc)
>>>>>> +        return rc;
>>>>>> +
>>>>>> +    cpu_id = ((msg.pkg_config[2] << 16) | (msg.pkg_config[1] << 8) |
>>>>>> +          msg.pkg_config[0]) & CLIENT_CPU_ID_MASK;
>>>>>> +
>>>>>> +    for (i = 0; i < CPU_GEN_MAX; i++) {
>>>>>> +        if (cpu_id == cpu_gen_info_table[i].cpu_id) {
>>>>>> +            priv->gen_info = &cpu_gen_info_table[i];
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    if (!priv->gen_info)
>>>>>> +        return -ENODEV;
>>>>>> +
>>>>>> +    dev_dbg(priv->dev, "CPU_ID: 0x%x\n", cpu_id);
>>>>>> +    return 0;
>>>>>> +}
>>>>>
>>>>> More duplicate code.
>>>>>
>>>>
>>>> Okay. In case of check_cpu_id(), it could be used as a generic PECI 
>>>> function. I'll move it into PECI core.
>>>>
>>>>>> +
>>>>>> +static int peci_dimmtemp_probe(struct peci_client *client)
>>>>>> +{
>>>>>> +    struct device *dev = &client->dev;
>>>>>> +    struct peci_dimmtemp *priv;
>>>>>> +    int rc;
>>>>>> +
>>>>>> +    if ((client->adapter->cmd_mask &
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) !=
>>>>>> +        (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG))) {
>>>>>
>>>>> One set of ( ) is unnecessary on each side of the expression.
>>>>>
>>>>
>>>> '&' has a precedence over '!=' but '|' doesn't. I'll rewrite it to:
>>>>
>>>
>>> Actually, that is wrong. You refer to address-of. Bit operations do 
>>> have lower
>>> precedence that comparisons. I stand corrected.
>>>
>>>>      if (client->adapter->cmd_mask &
>>>>          (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG)) !=
>>>>          (BIT(PECI_CMD_GET_TEMP) | BIT(PECI_CMD_RD_PKG_CFG)))
>>>>
>>>>>> +        dev_err(dev, "Client doesn't support temperature 
>>>>>> monitoring\n");
>>>>>> +        return -EINVAL;
>>>>>
>>>>> Why is this "invalid", and why does it warrant an error message ?
>>>>>
>>>>
>>>> Should I use -EPERM? Any suggestion?
>>>>
>>>
>>> Is it an _error_ if the CPU does not support this functionality ?
>>>
>>
>> Actually, it returns from this probe() function without making any 
>> hwmon info creation so I intended to handle this case as an error. Am 
>> I wrong?
>>
> 
> If the functionality or HW supported by the driver isn't available, it 
> is customary
> to return -ENODEV and no error message. Otherwise the kernel log would 
> drown in
> "not supported" error messages. I don't see where it would add any value 
> to handle
> this driver differently.
> 
> EINVAL    Invalid argument
> EPERM    Operation not permitted
> 
> You'll have to work hard to convince me that any of those makes sense, 
> and that
> 
> ENODEV    No such device
> 
> doesn't. More specifically, if EINVAL makes sense, the caller did 
> something wrong,
> meaning there is a problem in the infrastructure which should get fixed.
> The same is true for EPERM.
> 

Now I fully understood what you pointed out. Thanks for the detailed 
explanation. I'll change the error return value to -ENODEV and will use 
dev_dbg for the message printing. Thanks!

>>>>>> +    }
>>>>>> +
>>>>>> +    priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>>>>> +    if (!priv)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    dev_set_drvdata(dev, priv);
>>>>>> +    priv->client = client;
>>>>>> +    priv->dev = dev;
>>>>>> +    priv->addr = client->addr;
>>>>>> +    priv->cpu_no = priv->addr - PECI_BASE_ADDR;
>>>>>
>>>>> Is priv->addr guaranteed to be >= PECI_BASE_ADDR ?
>>>>
>>>> Client address range validation will be done in 
>>>> peci_check_addr_validity() in PECI core before probing a device driver.
>>>>
>>>>>> +
>>>>>> +    snprintf(priv->name, PECI_NAME_SIZE, "peci_dimmtemp.cpu%d",
>>>>>> +         priv->cpu_no);
>>>>>> +
>>>>>> +    rc = check_cpu_id(priv);
>>>>>> +    if (rc) {
>>>>>> +        dev_err(dev, "Client CPU is not supported\n");
>>>>>
>>>>> Or the peci command failed.
>>>>>
>>>>
>>>> I'll remove the error message and will add a proper handling code 
>>>> into PECI core on each error type.
>>>>
>>>>>> +        return rc;
>>>>>> +    }
>>>>>> +
>>>>>> +    priv->work_queue = alloc_ordered_workqueue(priv->name, 0);
>>>>>> +    if (!priv->work_queue)
>>>>>> +        return -ENOMEM;
>>>>>> +
>>>>>> +    INIT_DELAYED_WORK(&priv->work_handler, 
>>>>>> create_dimm_temp_info_delayed);
>>>>>> +
>>>>>> +    rc = create_dimm_temp_info(priv);
>>>>>> +    if (rc && rc != -EAGAIN) {
>>>>>> +        dev_err(dev, "Failed to create DIMM temp info\n");
>>>>>> +        goto err_free_wq;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +
>>>>>> +err_free_wq:
>>>>>> +    destroy_workqueue(priv->work_queue);
>>>>>> +    return rc;
>>>>>> +}
>>>>>> +
>>>>>> +static int peci_dimmtemp_remove(struct peci_client *client)
>>>>>> +{
>>>>>> +    struct peci_dimmtemp *priv = dev_get_drvdata(&client->dev);
>>>>>> +
>>>>>> +    cancel_delayed_work(&priv->work_handler);
>>>>>
>>>>> cancel_delayed_work_sync() ?
>>>>>
>>>>
>>>> Yes, it would be safer. Will fix it.
>>>>
>>>>>> +    destroy_workqueue(priv->work_queue);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct of_device_id peci_dimmtemp_of_table[] = {
>>>>>> +    { .compatible = "intel,peci-dimmtemp" },
>>>>>> +    { }
>>>>>> +};
>>>>>> +MODULE_DEVICE_TABLE(of, peci_dimmtemp_of_table);
>>>>>> +
>>>>>> +static struct peci_driver peci_dimmtemp_driver = {
>>>>>> +    .probe  = peci_dimmtemp_probe,
>>>>>> +    .remove = peci_dimmtemp_remove,
>>>>>> +    .driver = {
>>>>>> +        .name           = "peci-dimmtemp",
>>>>>> +        .of_match_table = of_match_ptr(peci_dimmtemp_of_table),
>>>>>> +    },
>>>>>> +};
>>>>>> +module_peci_driver(peci_dimmtemp_driver);
>>>>>> +
>>>>>> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@...ux.intel.com>");
>>>>>> +MODULE_DESCRIPTION("PECI dimmtemp driver");
>>>>>> +MODULE_LICENSE("GPL v2");
>>>>>> -- 
>>>>>> 2.16.2
>>>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>> linux-hwmon" in
>>>> the body of a message to majordomo@...r.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>