lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGETcx8m48cy-EzP6_uoGN7KWsQw=CfZWQ-hNUzz_7LZ0voG8A@mail.gmail.com>
Date: Tue, 13 Aug 2024 23:51:24 -0700
From: Saravana Kannan <saravanak@...gle.com>
To: Peng Fan <peng.fan@....com>
Cc: Cristian Marussi <cristian.marussi@....com>, Dhruva Gole <d-gole@...com>, 
	"Peng Fan (OSS)" <peng.fan@....nxp.com>, "sudeep.holla@....com" <sudeep.holla@....com>, 
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"arm-scmi@...r.kernel.org" <arm-scmi@...r.kernel.org>, Android Kernel Team <kernel-team@...roid.com>
Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set fwnode for scmi cpufreq

On Tue, Aug 13, 2024 at 8:36 PM Peng Fan <peng.fan@....com> wrote:
>
> > Subject: RE: [PATCH V2] firmware: arm_scmi: bus: bypass set fwnode
> > for scmi cpufreq
> >
> > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set fwnode
> > for
> > > scmi cpufreq
> > >
> > > On Tue, Aug 13, 2024 at 01:52:30PM +0000, Peng Fan wrote:
> > > > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set
> > > fwnode
> > > > > for scmi cpufreq
> > > > >
> > > > > On Tue, Aug 13, 2024 at 10:25:31AM +0000, Peng Fan wrote:
> > > > > > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set
> > > > > fwnode
> > > > > > > for scmi cpufreq
> > > > > > >
> > > > > > > On Jul 29, 2024 at 15:03:25 +0800, Peng Fan (OSS) wrote:
> > > > > > > > From: Peng Fan <peng.fan@....com>
> > > > > > > >
> > > > > > > > Two drivers scmi_cpufreq.c and scmi_perf_domain.c both
> > use
> > > > > > > > SCMI_PROTCOL_PERF protocol, but with different name, so
> > > two
> > > > > scmi
> > > > > > > > devices will be created. But the fwnode->dev could only
> > > > > > > > point to one
> > > > > > > device.
> > > > > > > >
> > > > > > > > If scmi cpufreq device created earlier, the fwnode->dev will
> > > > > > > > point to the scmi cpufreq device. Then the fw_devlink will
> > > > > > > > link performance domain user device(consumer) to the scmi
> > > > > > > > cpufreq
> > > > > device(supplier).
> > > > > > > > But actually the performance domain user device, such as
> > > > > > > > GPU,
> > > > > > > should
> > > > > > > > use the scmi perf device as supplier. Also if 'cpufreq.off=1'
> > > > > > > > in bootargs, the GPU driver will defer probe always, because
> > > > > > > > of the scmi cpufreq
> > > > > > >
> > > > > > > The commit message itself seems very specific to some
> > platform
> > > > > > > to
> > > > > me.
> > > > > > > What about platforms that don't atall have a GPU? Why
> > would
> > > > > they
> > > > > > > care about this?
> > > > > >
> > > > > > It is a generic issue if a platform has performance domain to
> > > > > > serve scmi cpufreq and device performance level.
> > > > > >
> > > > > > >
> > > > > > > > device not ready.
> > > > > > > >
> > > > > > > > Because for cpufreq, no need use fw_devlink. So bypass
> > > setting
> > > > > > > fwnode
> > > > > > > > for scmi cpufreq device.
> > > > > > > >
> > > > > > > > Fixes: 96da4a99ce50 ("firmware: arm_scmi: Set fwnode for
> > > the
> > > > > > > > scmi_device")
> > > > > > > > Signed-off-by: Peng Fan <peng.fan@....com>
> > > > > > > > ---
> > > > > > > >
> > > > > > > > V2:
> > > > > > > >  Use A!=B to replace !(A == B)  Add fixes tag  This might be
> > > > > > > > a workaround, but since this is a fix, it is simple for
> > > > > > > > backporting.
> > > > > > >
> > > > > > > More than a workaround, it feels like a HACK to me.
> > > > > > >
> > > > > > > >
> > > > > > > > V1:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >  drivers/firmware/arm_scmi/bus.c | 3 ++-
> > > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/firmware/arm_scmi/bus.c
> > > > > > > > b/drivers/firmware/arm_scmi/bus.c index
> > > > > > > 96b2e5f9a8ef..be91a82e0cda
> > > > > > > > 100644
> > > > > > > > --- a/drivers/firmware/arm_scmi/bus.c
> > > > > > > > +++ b/drivers/firmware/arm_scmi/bus.c
> > > > > > > > @@ -395,7 +395,8 @@ __scmi_device_create(struct
> > > > > device_node
> > > > > > > *np, struct device *parent,
> > > > > > > >   scmi_dev->id = id;
> > > > > > > >   scmi_dev->protocol_id = protocol;
> > > > > > > >   scmi_dev->dev.parent = parent;
> > > > > > > > - device_set_node(&scmi_dev->dev,
> > > of_fwnode_handle(np));
> > > > > > > > + if ((protocol != SCMI_PROTOCOL_PERF) ||
> > > strcmp(name,
> > > > > > > "cpufreq"))
> > > > > > > > +         device_set_node(&scmi_dev->dev,
> > > > > > > of_fwnode_handle(np));
> > > > > > >
> > > > > > > I kind of disagree with the idea here to be specific about the
> > > > > > > PROTOCOL_PERF or cpufreq. This is a generic arm scmi bus
> > > driver
> > > > > right?
> > > > > > > Why bring in specific code into a bus driver? We will never
> > > > > > > fix the actual root cause of the issue this way.
> > > > > >
> > > > > > The root cause is fwnode devlink only supports one fwnode, one
> > > > > device.
> > > > > > 1:1 match. But current arm scmi driver use one fwnode for two
> > > > > devices.
> > > > > >
> > > > > > If your platform has scmi cpufreq and scmi device performance
> > > > > domain,
> > > > > > you might see that some devices are consumer of scmi cpufreq,
> > > but
> > > > > > actually they should be consumer of scmi device performance
> > > > > domain.
> > > > > >
> > > > > > I not have a good idea that this is fw devlink design that only
> > > > > > allows
> > > > > > 1 fwnode has 1 device or not. If yes, that arm scmi should be
> > fixed.
> > > > > > If not, fw devlink should be updated.
> > > > > >
> > > > > > The current patch is the simplest method for stable tree fixes
> > > > > > as I could work out.
> > > > >
> > > > > So this is the same root cause at the end of the issues you had
> > > > > with IMX pinctrl coexistence...i.e. the SCMI stack creates scmi
> > > > > devices that embeds the protocol node, BUT since you can have
> > > > > multiple device/drivers doing different things on different
> > > > > resources within the same protocol you can end with 2 devices
> > > > > having the same embedded device_node, since we dont really
> > have
> > > > > anything else to
> > > use
> > > > > as device_node, I rigth ?
> > > >
> > > > I think, yes. And you remind me that with PINCTRL_SCMI and
> > > > CONFIG_PINCTRL_IMX_SCMI both enabled, the scmi pinctrl node
> > will
> > > only
> > > > take one to set the fwnode device pointer depends on the order to
> > > run
> > > > __scmi_device_create.
> > > >
> > > > So not only perf, pinctrl also has issue here, fwnode devlink will
> > > > not work properly for pinctrl/perf.
> > >
> > > ...mmm ... the standard generic Pinctrl driver and the IMX Pintrcl are
> > > mutually exclusive in the code (@probe time) itself as far as I can
> > > remember about what you did, so why devlink should have that issue
> > > there ?
> > > Have you seen any issue in this regards while having loaded
> > > pinctrl_scmi and pinctrl_imx_scmi ?
> >
> > No. it works well in my setup. I am just worried that there might be
> > issues because fwnode only has one dev pointer, see device_add.
> >
> > >
> > > I want to have a better look next days about this devlink issue that
> > > you reported...it still not clear to me...from device_link_add() docs,
> > > it seems indeed that it will return the old existing link if a link
> > > exist already between that same supplier/consumer devices
> > pair....but
> > > from the code (at first sight) it seems that the check is made agains
> > > the devices not their embeded device_nodes, but here (and in
> > > pinctrl/imx case) you will have 2 distinct consumer devices (with
> > same
> > > device_node)...I may have missed something in your exaplanation....
> >
> > It might be false alarm for pinctrl, but it is true issue for perf.
>
> Just give a recheck. Pinctrl has same issue. With PINCTRL_SCMI
> and PINCTRL_IMX_SCMI both enabled, two scmi devices
> will be created. So it depends on device creation order, 1st
> created device will be used as supplier.
>
> On i.MX, there is no issue with both enabled, it is because
> the imx scmi device is created first. If the generic pinctrl scmi
> device created first, imx will have issue.
>
> In the end, this is generic fw_devlink limitation or arm scmi driver
> issue.

+1 to what Cristian and Dhruva said in other parts of this thread.

This patch itself is definitely a hack.

The problem isn't so much that fw_devlink doesn't want to support
multiple devices getting instantiated from one DT node. The problem is
that there's no way to know which of the multiple devices is the real
supplier just by looking at the information in devicetree/firmware
(the fw in fw_devlink). And keep in mind that one of the main
requirements of fw_devlink is to work before any driver is loaded and
not depend on drivers for correctness of the dependency information
because it needs to work on a fully modular kernel too. So, fw_devlink
just picks the first device that's instantiated from a DT node.

I really hate folks creating multiple devices from one DT node. One IP
block can support multiple things, there's no need to instantiate
multiple devices for it. The same driver could have just as easily
registered with multiple frameworks. So, ideally I'd want us to fix
this issue in the SCMI framework code. In the case where the same SCMI
node is creating two devices, can they both probe successfully? If
yes, why are we not using a child node or a separate node for this
second device? If it's always one or the other, why are we creating
two devices? Can you please point to specific upstream DT examples for
me to get a better handle on what's going on?

Btw, there is the deferred_probe_timeout command line option that can
be used so that fw_devlink stops enforcing dependencies where there
are no supplier drivers for a device after a timeout. It's not ideal,
but it's something to unblock you.

The best fw_devlink could do is just not enforce any dependencies if
there is more than one device instantiated for a given supplier DT
node.

-Saravana

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ