[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aPdf9lyY9ysq2mPx@pluto>
Date: Tue, 21 Oct 2025 11:27:02 +0100
From: Cristian Marussi <cristian.marussi@....com>
To: Jonathan Cameron <jonathan.cameron@...wei.com>
Cc: Cristian Marussi <cristian.marussi@....com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
arm-scmi@...r.kernel.org, sudeep.holla@....com,
james.quinlan@...adcom.com, f.fainelli@...il.com,
vincent.guittot@...aro.org, etienne.carriere@...com,
peng.fan@....nxp.com, michal.simek@....com, quic_sibis@...cinc.com,
dan.carpenter@...aro.org, d-gole@...com, souvik.chakravarty@....com
Subject: Re: [PATCH 06/10] firmware: arm_scmi: Add System Telemetry driver
On Mon, Oct 20, 2025 at 05:23:28PM +0100, Jonathan Cameron wrote:
> On Thu, 25 Sep 2025 21:35:50 +0100
> Cristian Marussi <cristian.marussi@....com> wrote:
>
> > Add a new SCMI System Telemetry driver which gathers platform Telemetry
> > data through the new the SCMI Telemetry protocol and expose all of the
> > discovered Telemetry data events on a dedicated pseudo-filesystem that
> > can be used to interactively configure SCMI Telemetry and access its
> > provided data.
>
Hi,
> I'm not a fan of providing yet another filesystem but you didn't
> lay out reasoning in the cover letter.
Sorry, I dont understand..you mean here that I did NOT provide enough reasons
why I am adopting a new FS approach ? ... or I misunderstood the English ?
.. because I did provide a lot of reasons (for my point-of-view) to go
for a new FS in the cover-letter...
>
> One non trivial issue is that you'll have to get filesystem review on this.
> My review is rather superficial but a few things stood out.
Well yes I would have expected that, but now the FS implementation
internals of this series is definetely immature and to be reworked (to
the extent of using a well-know deprecated FS mount api at first..)
So I posted this V1 to lay-out the ideas and the effective FS API layout
but I was planning to extend the review audience once I have reworked fully
the series FS bits in the next V2...
>
> >
> > Signed-off-by: Cristian Marussi <cristian.marussi@....com>
>
> > diff --git a/drivers/firmware/arm_scmi/scmi_system_telemetry.c b/drivers/firmware/arm_scmi/scmi_system_telemetry.c
> > new file mode 100644
> > index 000000000000..2fec465b0f33
> > --- /dev/null
> > +++ b/drivers/firmware/arm_scmi/scmi_system_telemetry.c
>
>
> > +static ssize_t
> > +scmi_tlm_update_interval_write(struct file *filp, const char __user *buf,
> > + size_t count, loff_t *ppos)
> > +{
> > + struct scmi_tlm_inode *tlmi = to_tlm_inode(file_inode(filp));
> > + const struct scmi_tlm_setup *tsp = tlmi->tsp;
> > + bool is_group = IS_GROUP(tlmi->cls->flags);
> > + unsigned int update_interval_ms = 0, secs = 0;
> > + int ret, grp_id, exp = -3;
> > + char *kbuf, *p, *token;
> > +
> > + kbuf = memdup_user_nul(buf, count);
>
> I'd use a __free(kfree) as then you can directly return in error paths.
> Will keep the buffer around a little longer than strictly necessary but
> I'm not seeing where that will cause problems.
>
Ok.
> > + if (IS_ERR(kbuf))
> > + return PTR_ERR(kbuf);
> > +
> > + p = kbuf;
> > + token = strsep(&p, " ");
> > + if (!token) {
> > + /* At least one token must exist to be a valid input */
> > + ret = -EINVAL;
> > + goto err;
> > + }
> > +
> > + ret = kstrtouint(token, 0, &secs);
> > + if (ret)
> > + goto err;
> > +
> > + token = strsep(&p, " ");
> > + if (token) {
> > + ret = kstrtoint(token, 0, &exp);
> > + if (ret)
> > + goto err;
> > + }
> > +
> > + kfree(kbuf);
> > +
> > + update_interval_ms = SCMI_TLM_BUILD_UPDATE_INTERVAL(secs, exp);
> > +
> > + grp_id = !is_group ? SCMI_TLM_GRP_INVALID : tlmi->grp->info->id;
> > + ret = tsp->ops->collection_configure(tsp->ph, grp_id, !is_group, NULL,
> > + &update_interval_ms, NULL);
> > + if (ret)
> > + return ret;
> > +
> > + return count;
> > +
> > +err:
> > + kfree(kbuf);
> > + return ret;
> > +}
>
>
> > +
> > +static int scmi_tlm_bulk_buffer_allocate_and_fill(struct scmi_tlm_inode *tlmi,
> > + struct scmi_tlm_priv *tp)
> > +{
> > + const struct scmi_tlm_setup *tsp = tlmi->tsp;
> > + const struct scmi_tlm_class *cls = tlmi->cls;
> > + struct scmi_telemetry_de_sample *samples;
> > + bool is_group = IS_GROUP(cls->flags);
> > + int ret, num_samples, res_id;
> > +
> > + num_samples = !is_group ? tlmi->info->base.num_des :
> > + tlmi->grp->info->num_des;
> > + tp->buf_sz = num_samples * MAX_BULK_LINE_CHAR_LENGTH;
> > + tp->buf = kzalloc(tp->buf_sz, GFP_KERNEL);
> > + if (!tp->buf)
> > + return -ENOMEM;
> > +
> > + res_id = is_group ? tlmi->grp->info->id : SCMI_TLM_GRP_INVALID;
> > + samples = kcalloc(num_samples, sizeof(*samples), GFP_KERNEL);
> > + if (!samples) {
> > + kfree(tp->buf);
> > + return -ENOMEM;
> > + }
> > +
> > + ret = tp->bulk_retrieve(tsp, res_id, &num_samples, samples);
> > + if (ret) {
> > + kfree(samples);
> Free them in reverse of allocation. Makes it easier to review.
>
Ok.
> > + kfree(tp->buf);
> > + return ret;
> > + }
> > +
> > + ret = scmi_tlm_buffer_fill(tsp->dev, tp->buf, tp->buf_sz, &tp->buf_len,
> > + num_samples, samples);
> I'm a little surprised by lifetime of tp->buf if this return an error.
> Perhaps add a comment on that.
Ok.
>
> > + kfree(samples);
> > +
> > + return ret;
> > +}
>
>
> > +
> > +static struct scmi_tlm_instance *scmi_tlm_init(struct scmi_tlm_setup *tsp,
> > + int instance_id)
> > +{
> > + struct device *dev = tsp->dev;
> > + struct scmi_tlm_instance *ti;
> > + int ret;
> > +
> > + ti = devm_kzalloc(dev, sizeof(*ti), GFP_KERNEL);
> Given use of devm I'm guessing this will only be called from probe().
> With that in mind...
Yes...
> > + if (!ti)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + ti->info = tsp->ops->info_get(tsp->ph);
> > + if (!ti->info) {
> > + dev_err(dev, "invalid Telemetry info !\n");
> > + return ERR_PTR(-EINVAL);
>
> return dev_err_probe()
>
Indeed.
> > + }
> > +
> > + ti->id = instance_id;
> > + ti->tsp = tsp;
> > +
> > + ret = scmi_tlm_root_instance_initialize(dev, ti);
> > + if (ret)
> > + return ERR_PTR(ret);
> > +
> > + ret = scmi_telemetry_des_initialize(dev, ti);
> > + if (ret)
> > + return ERR_PTR(ret);
> > +
> > + ret = scmi_telemetry_groups_initialize(dev, ti);
> > + if (ret)
> > + return ERR_PTR(ret);
> > +
> > + return ti;
> > +}
> > +
> > +static int scmi_telemetry_probe(struct scmi_device *sdev)
> > +{
> > + const struct scmi_handle *handle = sdev->handle;
> > + struct scmi_protocol_handle *ph;
> > + struct device *dev = &sdev->dev;
> > + struct scmi_tlm_instance *ti;
> > + struct scmi_tlm_setup *tsp;
> > + const void *ops;
> > +
> > + if (!handle)
> > + return -ENODEV;
> > +
> > + ops = handle->devm_protocol_get(sdev, sdev->protocol_id, &ph);
> > + if (IS_ERR(ops))
> > + return dev_err_probe(dev, PTR_ERR(ops),
> > + "Cannot access protocol:0x%X\n",
> > + sdev->protocol_id);
> > +
> > + tsp = devm_kzalloc(dev, sizeof(*tsp), GFP_KERNEL);
> > + if (!tsp)
> > + return -ENOMEM;
> > +
> > + tsp->dev = dev;
> > + tsp->ops = ops;
> > + tsp->ph = ph;
> > +
> > + ti = scmi_tlm_init(tsp, atomic_fetch_inc(&scmi_tlm_instance_count));
> > + if (IS_ERR(ti))
> > + return PTR_ERR(ti);
> > +
> > + mutex_lock(&scmi_tlm_mtx);
> > + list_add(&ti->node, &scmi_telemetry_instances);
> > + if (scmi_tlm_sb) {
> > + int ret;
> > +
> > + /*
> > + * If the file system was already mounted by the time this
> > + * instance was probed, register explicitly, since the list
> > + * has been scanned already.
> > + */
> > + mutex_unlock(&scmi_tlm_mtx);
> > + ret = scmi_telemetry_instance_register(scmi_tlm_sb, ti);
> > + if (ret)
> > + return ret;
> > + mutex_lock(&scmi_tlm_mtx);
> I guess this will make sense in later patches. Right now it looks like you should
> just check scmi_tlb_sb after releasing the lock.
> E.g.
> mutex_lock(&scmi_tlm_mtx); //I'd spell out mutex
> list_add(&ti->ode, &scam_telemetry_instances);
> mutex_unlock(&scmi_tlm_mtx);
> if (scmi_tlm_sb) {
> ret = scmi....
>
> > + }
> If you really have to check if (scmi_tlb_sb) under the lock just take a copy into a local
> variable and use that after releasing the lock.
>
Yes a lot to review/rework here...
> > + mutex_unlock(&scmi_tlm_mtx);
> > +
> > + dev_set_drvdata(&sdev->dev, ti);
> > +
> > + return 0;
> > +}
>
> > +static const struct scmi_device_id scmi_id_table[] = {
> > + { SCMI_PROTOCOL_TELEMETRY, "telemetry" },
> > + { },
>
> Drop that trailing comma. Only thing it does is make
> it easy to introduce a bug by putting something after it.
>
I'll do.
> > +};
> > +MODULE_DEVICE_TABLE(scmi, scmi_id_table);
>
>
> > +
> > +static int __init scmi_telemetry_init(void)
> > +{
> > + int ret;
> > +
> > + ret = scmi_register(&scmi_telemetry_driver);
> Why do this first? My immediate assumption is this allows
> for drivers to register in parallel with the rest of init
> happening. Feels like it should be the last thin in init.
>
> > + if (ret)
> > + return ret;
> > +
> > + ret = sysfs_create_mount_point(fs_kobj, TLM_FS_MNT);
> > + if (ret && ret != -EEXIST) {
> > + scmi_unregister(&scmi_telemetry_driver);
>
> Given the classic pattern of building up more things to undo.
> Use gotos and an error handling block as that's what we normally
> expect to see in the kernel.
Ok.
>
> > + return ret;
> > + }
> > +
> > + ret = register_filesystem(&scmi_telemetry_fs);
> > + if (ret) {
> > + sysfs_remove_mount_point(fs_kobj, TLM_FS_MNT);
> > + scmi_unregister(&scmi_telemetry_driver);
> > + }
> > +
> > + return ret;
> > +}
> > +module_init(scmi_telemetry_init);
> > +
> > +static void __exit scmi_telemetry_exit(void)
> > +{
> > + int ret;
> > +
> > + ret = unregister_filesystem(&scmi_telemetry_fs);
>
> Documentation says this only fails if the filesystem isn't found.
> How can that happen if the init above succeeded?
>
> I noted from a quick scan that most filesystems don't check the
> return value. The only one that does uses it to print a message
> and otherwise continues as if it succeeded.
Ok.
>
>
>
> > + if (!ret)
> > + sysfs_remove_mount_point(fs_kobj, TLM_FS_MNT);
> > + else
> > + pr_err("Failed to unregister %s\n", TLM_FS_NAME);
> Given 1 out of 100s of file systems bothers to do this. I'm suspecting
> it's a thing that won't happen.
>
> I'd just call the sysfs_remove_mount_point() unconditionally.
>
Ok.
I wil rework heavily in V2 the FS layer and post also to the related FS
mailing lists...
Thanks,
Cristian
Powered by blists - more mailing lists