Message-ID: <10e892a8-2b07-480e-93c1-3083ce31e7e2@ideasonboard.com>
Date: Tue, 29 Oct 2024 13:21:43 +0200
From: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Aradhya Bhatia <aradhya.bhatia@...ux.dev>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
Devarsh Thakkar <devarsht@...com>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>, kernel-team@...roid.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] driver core: fw_devlink: Stop trying to optimize cycle
detection logic
Hi,
On 28/10/2024 22:39, Saravana Kannan wrote:
> On Mon, Oct 28, 2024 at 1:06 AM Tomi Valkeinen
> <tomi.valkeinen@...asonboard.com> wrote:
>>
>> Hi,
>>
>> On 26/10/2024 07:52, Saravana Kannan wrote:
>>> In attempting to optimize fw_devlink runtime, I introduced numerous cycle
>>> detection bugs by foregoing cycle detection logic under specific
>>> conditions. Each fix has further narrowed the conditions for optimization.
>>>
>>> It's time to give up on these optimization attempts and just run the cycle
>>> detection logic every time fw_devlink tries to create a device link.
>>>
>>> The specific bug report that triggered this fix involved a supplier fwnode
>>> that never gets a device created for it. Instead, the supplier fwnode is
>>> represented by the device that corresponds to an ancestor fwnode.
>>>
>>> In this case, fw_devlink didn't do any cycle detection because the cycle
>>> detection logic is only run when a device link is created between the
>>> devices that correspond to the actual consumer and supplier fwnodes.
>>>
>>> With this change, fw_devlink will run cycle detection logic even when
>>> creating SYNC_STATE_ONLY proxy device links from a device that is an
>>> ancestor of a consumer fwnode.
>>>
>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
>>> Closes: https://lore.kernel.org/all/1a1ab663-d068-40fb-8c94-f0715403d276@ideasonboard.com/
>>> Fixes: 6442d79d880c ("driver core: fw_devlink: Improve detection of overlapping cycles")
>>> Signed-off-by: Saravana Kannan <saravanak@...gle.com>
>>> ---
>>> Greg,
>>>
>>> I've tested this on my end and it looks ok and nothing fishy is going
>>> on. You can pick this up once Tomi gives a Tested-by.
>>
>> I tested this on a TI AM62 SK board. It has an LVDS (OLDI) display and an
>> HDMI output, and both displays are connected to the same display
>> subsystem. I tested with OLDI single and dual link cases, with and
>> without HDMI, and in all cases probing works fine.
>>
>> Looks good on that front, so:
>>
>> Tested-by: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
>
> Great! Thanks!
>
>> You also asked for a diff of the devlinks. That part doesn't look so
>> good to me, but probably you can tell if it's normal or not.
>
> TL;DR: All device links in a cycle get marked as
> DL_FLAG_SYNC_STATE_ONLY and DL_FLAG_CYCLE (in addition to other
> flags). All DL_FLAG_SYNC_STATE_ONLY (not all of them are cycles) will
> get deleted after the consumer probes (since they are no longer needed
> after that). My guess on what's going on is that with the patch,
> fw_devlink found and marked more device links as cycles. Ones that in
> the past weren't detected as being part of a cycle but coincidentally
> the "post-init" dependency was the one that was getting ignored/not
> enforced. So the actual links in a cycle getting deleted after all the
> devices have probed is not a problem.
Ok. Yep, it might all be fine. I still don't understand everything that's
going on here, so please take a look at one more case below.
> You can enable the "cycle" logs in drivers/base/core.c so it's easier
> to follow the cycles fw_devlink detected. But the logs are a bit
> cryptic because it tries to print all the multiple cycles that were
> detected using a recursive search.
>
> The non-cycle use for DL_FLAG_SYNC_STATE_ONLY is for parent devices to
> put a "proxy-vote" (Hey supplier, you still have a consumer that
> hasn't bound to a driver yet) for descendant (children, grand
> children) devices that haven't been created yet. So, without the fix
> it's possible some descendant child never got to probe and the
> DL_FLAG_SYNC_STATE_ONLY device link stuck around.
>
> If you can confirm all the deleted device links fall into one of these
> two categories, then there's no issue here. If you find cases that
> aren't clear, then let me know which one and point to specific nodes
> in an upstream DTS file and I can take a look.
>
> Every device link folder has a "sync_state_only" file that says if it
> has the DL_FLAG_SYNC_STATE_ONLY set. But to check for the cycle flag,
> you'll have to extend the debug log in device_link_add() that goes:
> "Linked as a sync state only consumer to......" and print the "flags" param.
I added this print.

I thought I'd test without any non-upstream patches, so this is booting
with the upstream k3-am625-sk.dtb, no overlays. I've attached the boot log
(with this patch applied) and the devlink lists, without and with this patch.

As the OLDI stuff is not upstream, I expected to see a smaller diff, and
that is the case. It's still a somewhat interesting diff:
$ diff devlink-broken.txt devlink-fixed.txt
1d0
< i2c:1-0022--i2c:1-003b
So that's the gpio expander (exp1: gpio@22 in k3-am625-sk.dts) and the
hdmi bridge (sii9022: bridge-hdmi@3b in k3-am62x-sk-common.dtsi). And,
indeed, in the log I can see:
i2c 1-003b: Linked as a sync state only consumer to 1-0022 (flags 0x3c0)
/bus@...00/i2c@...10000/bridge-hdmi@3b Dropping the fwnode link to /bus@...00/i2c@...10000/gpio@22
If I'm not mistaken, the above means that the framework has decided
there's a (possible) probe time cyclic dependency between the gpio
expander and the hdmi bridge, right?
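
For reading the "(flags 0x3c0)" value in the log line above, here is a
minimal userspace sketch that turns a device link flags value into names.
The bit positions are copied by hand from the DL_FLAG_* definitions in
include/linux/device.h and are an assumption about the tree being tested;
the helper name dl_flags_to_string() is hypothetical and not a kernel API.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Assumed to mirror the DL_FLAG_* bits in include/linux/device.h.
 * Verify against your own tree before relying on the output.
 */
struct dl_flag_name {
	unsigned int bit;
	const char *name;
};

static const struct dl_flag_name dl_flag_names[] = {
	{ 1u << 0, "STATELESS" },
	{ 1u << 1, "AUTOREMOVE_CONSUMER" },
	{ 1u << 2, "PM_RUNTIME" },
	{ 1u << 3, "RPM_ACTIVE" },
	{ 1u << 4, "AUTOREMOVE_SUPPLIER" },
	{ 1u << 5, "AUTOPROBE_CONSUMER" },
	{ 1u << 6, "MANAGED" },
	{ 1u << 7, "SYNC_STATE_ONLY" },
	{ 1u << 8, "INFERRED" },
	{ 1u << 9, "CYCLE" },
};

/* Hypothetical helper: write a '|'-separated list of set flags into buf. */
static char *dl_flags_to_string(unsigned int flags, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(dl_flag_names) / sizeof(dl_flag_names[0]); i++) {
		int n;

		if (!(flags & dl_flag_names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", dl_flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Calling dl_flags_to_string(0x3c0, buf, sizeof(buf)) with a 128-byte buffer
should give MANAGED|SYNC_STATE_ONLY|INFERRED|CYCLE under these assumptions,
i.e. the link carries both the sync-state-only and the cycle flag.
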
I don't think that makes sense, and I was trying to understand why the
framework has arrived at such a conclusion, but it's not clear to me.
Also, I can see, e.g.:
/bus@...00/i2c@...10000: cycle: depends on /bus@...00/dss@...00000
So somehow the i2c bus has a dependency on the DSS? The DSS does not
depend on the i2c, but the HDMI does, so I can understand that the DSS
would have a dependency on the i2c. But the other way around?
Tomi
>>
>> $ diff devlink-single-broken.txt devlink-single-fixed.txt
>
> I was hoping you'd give me a line count diff too, to get a sense of
> whether it's wreaking havoc or not. But based on my local testing on
> different hardware, I'm expecting only a very small number of device
> links to be affected.
>
>> 2d1
>> < i2c:1-0022--i2c:1-003b
>> 11d9
>> < platform:44043000.system-controller:clock-controller--platform:20010000.i2c
>> 27d24
>> < platform:44043000.system-controller:clock-controller--platform:601000.gpio
>> 42d38
>> < platform:44043000.system-controller:power-controller--platform:20010000.i2c
>> 58d53
>> < platform:44043000.system-controller:power-controller--platform:601000.gpio
>> 74d68
>> < platform:4d000000.mailbox--platform:44043000.system-controller
>
> I took a quick look at this one in
> arch/arm64/boot/dts/ti/k3-am62-main.dtsi which I assume is part of the
> device you are testing on and I couldn't find a cycle. But with dtsi
> and dts files, it's hard to find these manually. Let me know if
> fw_devlink is thinking there's a cycle where there is none.
>
>> 76d69
>> < platform:601000.gpio--i2c:1-0022
>> 80d72
>> < platform:bus@...00:interrupt-controller@...000--platform:601000.gpio
>> 82d73
>> < platform:f4000.pinctrl--i2c:1-0022
>> 84d74
>> < platform:f4000.pinctrl--platform:20010000.i2c
>>
>> "i2c:1-003b" is the hdmi bridge, "i2c:1-0022" is a gpio expander. So,
>> for example, we lose the devlink between the gpio expander and the hdmi
>> bridge. The expander is used for interrupts. There's an interrupt line
>> from the HDMI bridge to the expander, and from there there's an
>> interrupt line going to the SoC.
>>
>> Also, I noticed the devlinks change if I load the display drivers. The
>> above is before loading. Comparing the loaded/not-loaded:
>
> Yeah, DL_FLAG_SYNC_STATE_ONLY device links vanishing as more devices
> probe is not a problem. That's working as intended.
>
> Thanks,
> Saravana
>
>>
>> $ diff devlink-dual-fixed.txt devlink-dual-fixed-loaded.txt
>> 3d2
>> < i2c:1-003b--platform:30200000.dss
>> 23d21
>> < platform:44043000.system-controller:clock-controller--platform:30200000.dss
>> 52d49
>> < platform:44043000.system-controller:power-controller--platform:30200000.dss
>> 73d69
>> < platform:display--platform:30200000.dss
>> 78d73
>> < platform:f4000.pinctrl--platform:30200000.dss
>> 97a93
>> > regulator:regulator.0--platform:display
>>
>> Tomi
>>
>>
>>> Thanks,
>>> Saravana
>>>
>>> v1 -> v2:
>>> - Removed the RFC tag
>>> - Renamed the subject. v1 is https://lore.kernel.org/all/20241025223721.184998-1-saravanak@google.com/T/#u
>>> - Added a NULL check to avoid NULL pointer deref
>>>
>>> drivers/base/core.c | 46 ++++++++++++++++++++-------------------------
>>> 1 file changed, 20 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>> index 3b13fed1c3e3..f96f2e4c76b4 100644
>>> --- a/drivers/base/core.c
>>> +++ b/drivers/base/core.c
>>> @@ -1990,10 +1990,10 @@ static struct device *fwnode_get_next_parent_dev(const struct fwnode_handle *fwn
>>> *
>>> * Return true if one or more cycles were found. Otherwise, return false.
>>> */
>>> -static bool __fw_devlink_relax_cycles(struct device *con,
>>> +static bool __fw_devlink_relax_cycles(struct fwnode_handle *con_handle,
>>> struct fwnode_handle *sup_handle)
>>> {
>>> - struct device *sup_dev = NULL, *par_dev = NULL;
>>> + struct device *sup_dev = NULL, *par_dev = NULL, *con_dev = NULL;
>>> struct fwnode_link *link;
>>> struct device_link *dev_link;
>>> bool ret = false;
>>> @@ -2010,22 +2010,22 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>>
>>> sup_handle->flags |= FWNODE_FLAG_VISITED;
>>>
>>> - sup_dev = get_dev_from_fwnode(sup_handle);
>>> -
>>> /* Termination condition. */
>>> - if (sup_dev == con) {
>>> + if (sup_handle == con_handle) {
>>> pr_debug("----- cycle: start -----\n");
>>> ret = true;
>>> goto out;
>>> }
>>>
>>> + sup_dev = get_dev_from_fwnode(sup_handle);
>>> + con_dev = get_dev_from_fwnode(con_handle);
>>> /*
>>> * If sup_dev is bound to a driver and @con hasn't started binding to a
>>> * driver, sup_dev can't be a consumer of @con. So, no need to check
>>> * further.
>>> */
>>> if (sup_dev && sup_dev->links.status == DL_DEV_DRIVER_BOUND &&
>>> - con->links.status == DL_DEV_NO_DRIVER) {
>>> + con_dev && con_dev->links.status == DL_DEV_NO_DRIVER) {
>>> ret = false;
>>> goto out;
>>> }
>>> @@ -2034,7 +2034,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>> if (link->flags & FWLINK_FLAG_IGNORE)
>>> continue;
>>>
>>> - if (__fw_devlink_relax_cycles(con, link->supplier)) {
>>> + if (__fw_devlink_relax_cycles(con_handle, link->supplier)) {
>>> __fwnode_link_cycle(link);
>>> ret = true;
>>> }
>>> @@ -2049,7 +2049,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>> else
>>> par_dev = fwnode_get_next_parent_dev(sup_handle);
>>>
>>> - if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode)) {
>>> + if (par_dev && __fw_devlink_relax_cycles(con_handle, par_dev->fwnode)) {
>>> pr_debug("%pfwf: cycle: child of %pfwf\n", sup_handle,
>>> par_dev->fwnode);
>>> ret = true;
>>> @@ -2067,7 +2067,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>> !(dev_link->flags & DL_FLAG_CYCLE))
>>> continue;
>>>
>>> - if (__fw_devlink_relax_cycles(con,
>>> + if (__fw_devlink_relax_cycles(con_handle,
>>> dev_link->supplier->fwnode)) {
>>> pr_debug("%pfwf: cycle: depends on %pfwf\n", sup_handle,
>>> dev_link->supplier->fwnode);
>>> @@ -2140,25 +2140,19 @@ static int fw_devlink_create_devlink(struct device *con,
>>> return -EINVAL;
>>>
>>> /*
>>> - * SYNC_STATE_ONLY device links don't block probing and supports cycles.
>>> - * So, one might expect that cycle detection isn't necessary for them.
>>> - * However, if the device link was marked as SYNC_STATE_ONLY because
>>> - * it's part of a cycle, then we still need to do cycle detection. This
>>> - * is because the consumer and supplier might be part of multiple cycles
>>> - * and we need to detect all those cycles.
>>> + * Don't try to optimize by not calling the cycle detection logic under
>>> + * certain conditions. There's always some corner case that won't get
>>> + * detected.
>>> */
>>> - if (!device_link_flag_is_sync_state_only(flags) ||
>>> - flags & DL_FLAG_CYCLE) {
>>> - device_links_write_lock();
>>> - if (__fw_devlink_relax_cycles(con, sup_handle)) {
>>> - __fwnode_link_cycle(link);
>>> - flags = fw_devlink_get_flags(link->flags);
>>> - pr_debug("----- cycle: end -----\n");
>>> - dev_info(con, "Fixed dependency cycle(s) with %pfwf\n",
>>> - sup_handle);
>>> - }
>>> - device_links_write_unlock();
>>> + device_links_write_lock();
>>> + if (__fw_devlink_relax_cycles(link->consumer, sup_handle)) {
>>> + __fwnode_link_cycle(link);
>>> + flags = fw_devlink_get_flags(link->flags);
>>> + pr_debug("----- cycle: end -----\n");
>>> + pr_info("%pfwf: Fixed dependency cycle(s) with %pfwf\n",
>>> + link->consumer, sup_handle);
>>> }
>>> + device_links_write_unlock();
>>>
>>> if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
>>> sup_dev = fwnode_get_next_parent_dev(sup_handle);
>>
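
To make the shape of the search in the patch above easier to follow, here
is a toy userspace sketch of the same idea: a recursive walk over supplier
links that uses a "visited" flag on the current path, terminates when it
reaches the consumer node itself (comparing nodes, as the patched
termination condition compares fwnode handles), and marks every link on a
path that leads back to the consumer. The struct node and struct link
types, add_link() and relax_cycles() are all made up for illustration;
this is not the kernel implementation and it ignores parent devices,
existing device links and driver binding state.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 4

struct node;

struct link {
	struct node *supplier;
	bool cycle;		/* stands in for FWLINK_FLAG_CYCLE */
};

struct node {
	const char *name;
	bool visited;		/* stands in for FWNODE_FLAG_VISITED */
	struct link links[MAX_LINKS];
	size_t nr_links;
};

/* Record that @con consumes something provided by @sup. */
static void add_link(struct node *con, struct node *sup)
{
	if (con->nr_links < MAX_LINKS)
		con->links[con->nr_links++].supplier = sup;
}

/*
 * Return true if following supplier links from @sup can reach @con.
 * Every link on such a path is marked as part of a cycle.
 */
static bool relax_cycles(struct node *con, struct node *sup)
{
	bool ret = false;
	size_t i;

	if (!sup || sup->visited)
		return false;
	sup->visited = true;

	/* Termination condition: we are back at the consumer node. */
	if (sup == con) {
		ret = true;
		goto out;
	}

	for (i = 0; i < sup->nr_links; i++) {
		if (relax_cycles(con, sup->links[i].supplier)) {
			sup->links[i].cycle = true;
			ret = true;
		}
	}
out:
	/* Clear the flag so later searches start from a clean state. */
	sup->visited = false;
	return ret;
}
```

With three hypothetical nodes where a consumes b, b consumes c and c
consumes a, relax_cycles(&a, &b) returns true and marks the b->c and c->a
links; the caller would then mark the a->b link itself, which is what the
__fwnode_link_cycle(link) call after the search does in the patch.
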
View attachment "boot-fixed.txt" of type "text/plain" (116630 bytes)
View attachment "devlink-broken.txt" of type "text/plain" (6862 bytes)
View attachment "devlink-fixed.txt" of type "text/plain" (6839 bytes)