[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e62b6763-0d1b-3359-6d3b-cb31e96bb862@roeck-us.net>
Date: Wed, 30 Oct 2019 19:20:37 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Akinobu Mita <akinobu.mita@...il.com>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>,
LKML <linux-kernel@...r.kernel.org>,
linux-nvme@...ts.infradead.org,
Linux PM <linux-pm@...r.kernel.org>,
Chris Healy <cphealy@...il.com>
Subject: Re: [PATCH v2] nvme: Add hardware monitoring support
On 10/30/19 4:16 AM, Akinobu Mita wrote:
> 2019年10月30日(水) 7:32 Guenter Roeck <linux@...ck-us.net>:
>>
>> nvme devices report temperature information in the controller information
>> (for limits) and in the smart log. Currently, the only means to retrieve
>> this information is the nvme command line interface, which requires
>> super-user privileges.
>>
>> At the same time, it would be desirable to use NVME temperature information
>> for thermal control.
>>
>> This patch adds support to read NVME temperatures from the kernel using the
>> hwmon API and adds temperature zones for NVME drives. The thermal subsystem
>> can use this information to set thermal policies, and userspace can access
>> it using libsensors and/or the "sensors" command.
>>
>> Example output from the "sensors" command:
>>
>> nvme0-pci-0100
>> Adapter: PCI adapter
>> Composite: +39.0°C (high = +85.0°C, crit = +85.0°C)
>> Sensor 1: +39.0°C
>> Sensor 2: +41.0°C
>>
>> Signed-off-by: Guenter Roeck <linux@...ck-us.net>
>> ---
>> v2: Use devm_kfree() to release memory in error path
>>
>> drivers/nvme/host/Kconfig | 10 ++
>> drivers/nvme/host/Makefile | 1 +
>> drivers/nvme/host/core.c | 5 +
>> drivers/nvme/host/nvme-hwmon.c | 163 +++++++++++++++++++++++++++++++++
>> drivers/nvme/host/nvme.h | 8 ++
>> 5 files changed, 187 insertions(+)
>> create mode 100644 drivers/nvme/host/nvme-hwmon.c
>>
>> diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
>> index 2b36f052bfb9..aeb49e16e386 100644
>> --- a/drivers/nvme/host/Kconfig
>> +++ b/drivers/nvme/host/Kconfig
>> @@ -23,6 +23,16 @@ config NVME_MULTIPATH
>> /dev/nvmeXnY device will show up for each NVMe namespaces,
>> even if it is accessible through multiple controllers.
>>
>> +config NVME_HWMON
>> + bool "NVME hardware monitoring"
>> + depends on (NVME_CORE=y && HWMON=y) || (NVME_CORE=m && HWMON)
>> + help
>> + This provides support for NVME hardware monitoring. If enabled,
>> + a hardware monitoring device will be created for each NVME drive
>> + in the system.
>> +
>> + If unsure, say N.
>> +
>> config NVME_FABRICS
>> tristate
>>
>> diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
>> index 8a4b671c5f0c..03de4797a877 100644
>> --- a/drivers/nvme/host/Makefile
>> +++ b/drivers/nvme/host/Makefile
>> @@ -14,6 +14,7 @@ nvme-core-$(CONFIG_TRACING) += trace.o
>> nvme-core-$(CONFIG_NVME_MULTIPATH) += multipath.o
>> nvme-core-$(CONFIG_NVM) += lightnvm.o
>> nvme-core-$(CONFIG_FAULT_INJECTION_DEBUG_FS) += fault_inject.o
>> +nvme-core-$(CONFIG_NVME_HWMON) += nvme-hwmon.o
>>
>> nvme-y += pci.o
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index fa7ba09dca77..fc1d4b146717 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -2796,6 +2796,9 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>> ctrl->oncs = le16_to_cpu(id->oncs);
>> ctrl->mtfa = le16_to_cpu(id->mtfa);
>> ctrl->oaes = le32_to_cpu(id->oaes);
>> + ctrl->wctemp = le16_to_cpu(id->wctemp);
>> + ctrl->cctemp = le16_to_cpu(id->cctemp);
>> +
>> atomic_set(&ctrl->abort_limit, id->acl + 1);
>> ctrl->vwc = id->vwc;
>> if (id->mdts)
>> @@ -2897,6 +2900,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>>
>> ctrl->identified = true;
>>
>> + nvme_hwmon_init(ctrl);
>> +
>> return 0;
>>
>> out_free:
>
> The nvme_init_identify() can be called multiple time in nvme ctrl's
> lifetime (e.g 'nvme reset /dev/nvme*' or suspend/resume paths), so
> should we need to prevent nvme_hwmon_init() from registering hwmon
> device more than twice?
>
> In the nvme thermal zone patchset[1], thernal zone is registered in
> nvme_init_identify and unregistered in nvme_stop_ctrl().
>
Doesn't that mean that the initialization should happen in nvme_start_ctrl()
and not here ?
Either case, good point. Reason for calling the init function from here
is that I wanted to ensure that it is called after controller
identification. But then it is really undesirable to re-instantiate
the driver on each device reset. I'll have to think about that.
> [1] https://lore.kernel.org/linux-devicetree/1561990354-4084-2-git-send-email-akinobu.mita@gmail.com/
>
>> diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
>> new file mode 100644
>> index 000000000000..af5eda326ec6
>> --- /dev/null
>> +++ b/drivers/nvme/host/nvme-hwmon.c
>> @@ -0,0 +1,163 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * NVM Express hardware monitoring support
>> + * Copyright (c) 2019, Guenter Roeck
>> + */
>> + >> +#include <linux/hwmon.h>
>> +
>> +#include "nvme.h"
>> +
>> +struct nvme_hwmon_data {
>> + struct nvme_ctrl *ctrl;
>> + struct nvme_smart_log log;
>> +};
>> +
>> +static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
>> +{
>> + return nvme_get_log(data->ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
>> + &data->log, sizeof(data->log), 0);
>> +}
>
> The 'data->log' is allocated per nvme_ctrl, so are there any locks to
> prevent multiple callers of nvme_hwmon_get_smart_log() from breaking
> the log buffer?
>
Good point. This needs either local memory like in your patch, or
I'll need a lock. I prefer a lock, though I am open to suggestions.
>> +
>> +static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
>> + u32 attr, int channel, long *val)
>> +{
>> + struct nvme_hwmon_data *data = dev_get_drvdata(dev);
>> + struct nvme_smart_log *log = &data->log;
>> + int err;
>> + int temp;
>> +
>> + err = nvme_hwmon_get_smart_log(data);
>> + if (err)
>> + return err < 0 ? err : -EPROTO;
>> +
>> + switch (attr) {
>> + case hwmon_temp_max:
>> + *val = (data->ctrl->wctemp - 273) * 1000;
>> + break;
>> + case hwmon_temp_crit:
>> + *val = (data->ctrl->cctemp - 273) * 1000;
>> + break;
>
> When this function is called with 'hwmon_temp_max' or 'hwmon_temp_crit',
> we don't need to call nvme_hwmon_get_smart_log() at all, do we?
>
Another good point.
Thanks,
Guenter
>> + case hwmon_temp_input:
>> + if (!channel)
>> + temp = le16_to_cpup((__le16 *)log->temperature);
>> + else
>> + temp = le16_to_cpu(log->temp_sensor[channel - 1]);
>> + *val = (temp - 273) * 1000;
>> + break;
>> + case hwmon_temp_crit_alarm:
>> + *val = !!(log->critical_warning & NVME_SMART_CRIT_TEMPERATURE);
>> + break;
>> + default:
>> + err = -EOPNOTSUPP;
>> + break;
>> + }
>> + return err;
>> +}
>
Powered by blists - more mailing lists