Open Source and information security mailing list archives
 
Message-ID: <10e892a8-2b07-480e-93c1-3083ce31e7e2@ideasonboard.com>
Date: Tue, 29 Oct 2024 13:21:43 +0200
From: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 "Rafael J. Wysocki" <rafael@...nel.org>,
 Aradhya Bhatia <aradhya.bhatia@...ux.dev>,
 "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
 Devarsh Thakkar <devarsht@...com>,
 Dmitry Baryshkov <dmitry.baryshkov@...aro.org>, kernel-team@...roid.com,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] driver core: fw_devlink: Stop trying to optimize cycle
 detection logic

Hi,

On 28/10/2024 22:39, Saravana Kannan wrote:
> On Mon, Oct 28, 2024 at 1:06 AM Tomi Valkeinen
> <tomi.valkeinen@...asonboard.com> wrote:
>>
>> Hi,
>>
>> On 26/10/2024 07:52, Saravana Kannan wrote:
>>> In attempting to optimize fw_devlink runtime, I introduced numerous cycle
>>> detection bugs by forgoing cycle detection logic under specific
>>> conditions. Each fix has further narrowed the conditions for optimization.
>>>
>>> It's time to give up on these optimization attempts and just run the cycle
>>> detection logic every time fw_devlink tries to create a device link.
>>>
>>> The specific bug report that triggered this fix involved a supplier fwnode
>>> that never gets a device created for it. Instead, the supplier fwnode is
>>> represented by the device that corresponds to an ancestor fwnode.
>>>
>>> In this case, fw_devlink didn't do any cycle detection because the cycle
>>> detection logic is only run when a device link is created between the
>>> devices that correspond to the actual consumer and supplier fwnodes.
>>>
>>> With this change, fw_devlink will run cycle detection logic even when
>>> creating SYNC_STATE_ONLY proxy device links from a device that is an
>>> ancestor of a consumer fwnode.
>>>
>>> Reported-by: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
>>> Closes: https://lore.kernel.org/all/1a1ab663-d068-40fb-8c94-f0715403d276@ideasonboard.com/
>>> Fixes: 6442d79d880c ("driver core: fw_devlink: Improve detection of overlapping cycles")
>>> Signed-off-by: Saravana Kannan <saravanak@...gle.com>
>>> ---
>>> Greg,
>>>
>>> I've tested this on my end and it looks ok and nothing fishy is going
>>> on. You can pick this up once Tomi gives a Tested-by.
>>
>> I tested this on the TI AM62 SK board. It has an LVDS (OLDI) display and an
>> HDMI output, and both displays are connected to the same display
>> subsystem. I tested with OLDI single and dual link cases, with and
>> without HDMI, and in all cases probing works fine.
>>
>> Looks good on that front, so:
>>
>> Tested-by: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
> 
> Great! Thanks!
> 
>> You also asked for a diff of the devlinks. That part doesn't look so
>> good to me, but probably you can tell if it's normal or not.
> 
> TL;DR: All device links in a cycle get marked as
> DL_FLAG_SYNC_STATE_ONLY and DL_FLAG_CYCLE (in addition to other
> flags). All DL_FLAG_SYNC_STATE_ONLY links (not all of which are
> cycles) get deleted after the consumer probes, since they are no
> longer needed after that. My guess is that, with the patch,
> fw_devlink found more device links that are part of cycles and marked
> them as such. In the past, these weren't detected as being part of a
> cycle, but coincidentally the "post-init" dependency was the one that
> was getting ignored/not enforced. So links in a cycle getting deleted
> after all the devices have probed is not a problem.

OK. Yep, it might all be fine. I still don't understand everything that's
going on here, so perhaps you could look at one more case below.

> You can enable the "cycle" logs in drivers/base/core.c so it's easier
> to follow the cycles fw_devlink detected. But the logs are a bit
> cryptic because they try to print all of the cycles that were
> detected by a recursive search.
> 
> The non-cycle use for DL_FLAG_SYNC_STATE_ONLY is for parent devices to
> put a "proxy-vote" (Hey supplier, you still have a consumer that
> hasn't bound to a driver yet) for descendant (children,
> grandchildren) devices that haven't been created yet. So, without the fix
> it's possible some descendant child never got to probe and the
> DL_FLAG_SYNC_STATE_ONLY device link stuck around.
> 
> If you can confirm all the deleted device links fall into one of these
> two categories, then there's no issue here. If you find cases that
> aren't clear, then let me know which one and point to specific nodes
> in an upstream DTS file and I can take a look.
> 
> Every device link folder has a "sync_state_only" file that says whether it
> has DL_FLAG_SYNC_STATE_ONLY set. But to check for the cycle flag,
> you'll have to extend the debug log in device_link_add() that goes:
> "Linked as a sync state only consumer to......" and print the "flags" param.

I added this print.

I thought I'd test without any non-upstream patches, so this boots with
the upstream k3-am625-sk.dtb and no overlays. I've attached the boot log
(with this patch applied) and the devlink lists without and with this patch.

As the OLDI stuff is not upstream, I did expect to see a smaller diff, and
that is the case. It's still a somewhat interesting diff:

$ diff devlink-broken.txt devlink-fixed.txt
1d0
< i2c:1-0022--i2c:1-003b

So that's the gpio expander (exp1: gpio@22 in k3-am625-sk.dts) and the 
hdmi bridge (sii9022: bridge-hdmi@3b in k3-am62x-sk-common.dtsi). And, 
indeed, in the log I can see:

i2c 1-003b: Linked as a sync state only consumer to 1-0022 (flags 0x3c0)
/bus@...00/i2c@...10000/bridge-hdmi@3b Dropping the fwnode link to /bus@...00/i2c@...10000/gpio@22

If I'm not mistaken, the above means that the framework has decided
there's a (possible) probe-time cyclic dependency between the GPIO
expander and the HDMI bridge, right?

I don't think that makes sense, and I was trying to understand why the
framework has arrived at such a conclusion, but it's not clear to me.

Also, I can see, e.g.:

/bus@...00/i2c@...10000: cycle: depends on /bus@...00/dss@...00000

So somehow the i2c bus has a dependency on the DSS? The DSS does not
depend on the i2c, but the HDMI bridge does, so I could understand the
DSS having a dependency on the i2c. But the other way around?

  Tomi

>>
>> $ diff devlink-single-broken.txt devlink-single-fixed.txt
> 
> I was hoping you'd give me a line count for the diff too, to get a
> sense of whether it's wreaking havoc or not. But based on my local
> testing on different hardware, I'm expecting a very small number of
> device links to be affected.
> 
>> 2d1
>> < i2c:1-0022--i2c:1-003b
>> 11d9
>> < platform:44043000.system-controller:clock-controller--platform:20010000.i2c
>> 27d24
>> < platform:44043000.system-controller:clock-controller--platform:601000.gpio
>> 42d38
>> < platform:44043000.system-controller:power-controller--platform:20010000.i2c
>> 58d53
>> < platform:44043000.system-controller:power-controller--platform:601000.gpio
>> 74d68
>> < platform:4d000000.mailbox--platform:44043000.system-controller
> 
> I took a quick look at this one in
> arch/arm64/boot/dts/ti/k3-am62-main.dtsi which I assume is part of the
> device you are testing on and I couldn't find a cycle. But with dtsi
> and dts files, it's hard to find these manually. Let me know if
> fw_devlink is thinking there's a cycle where there is none.
> 
>> 76d69
>> < platform:601000.gpio--i2c:1-0022
>> 80d72
>> < platform:bus@...00:interrupt-controller@...000--platform:601000.gpio
>> 82d73
>> < platform:f4000.pinctrl--i2c:1-0022
>> 84d74
>> < platform:f4000.pinctrl--platform:20010000.i2c
>>
>> "i2c:1-003b" is the hdmi bridge, "i2c:1-0022" is a gpio expander. So,
>> for example, we lose the devlink between the gpio expander and the hdmi
>> bridge. The expander is used for interrupts. There's an interrupt line
>> from the HDMI bridge to the expander, and from there there's an
>> interrupt line going to the SoC.
>>
>> Also, I noticed the devlinks change if I load the display drivers. The
>> above is before loading. Comparing the loaded/not-loaded:
> 
> Yeah, DL_FLAG_SYNC_STATE_ONLY device links vanishing as more devices
> probe is not a problem. That's working as intended.
> 
> Thanks,
> Saravana
> 
>>
>> $ diff devlink-dual-fixed.txt devlink-dual-fixed-loaded.txt
>> 3d2
>> < i2c:1-003b--platform:30200000.dss
>> 23d21
>> < platform:44043000.system-controller:clock-controller--platform:30200000.dss
>> 52d49
>> < platform:44043000.system-controller:power-controller--platform:30200000.dss
>> 73d69
>> < platform:display--platform:30200000.dss
>> 78d73
>> < platform:f4000.pinctrl--platform:30200000.dss
>> 97a93
>> > regulator:regulator.0--platform:display
>>
>>    Tomi
>>
>>
>>> Thanks,
>>> Saravana
>>>
>>> v1 -> v2:
>>> - Removed the RFC tag
>>> - Renamed the subject. v1 is https://lore.kernel.org/all/20241025223721.184998-1-saravanak@google.com/T/#u
>>> - Added a NULL check to avoid NULL pointer deref
>>>
>>>    drivers/base/core.c | 46 ++++++++++++++++++++-------------------------
>>>    1 file changed, 20 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>> index 3b13fed1c3e3..f96f2e4c76b4 100644
>>> --- a/drivers/base/core.c
>>> +++ b/drivers/base/core.c
>>> @@ -1990,10 +1990,10 @@ static struct device *fwnode_get_next_parent_dev(const struct fwnode_handle *fwn
>>>     *
>>>     * Return true if one or more cycles were found. Otherwise, return false.
>>>     */
>>> -static bool __fw_devlink_relax_cycles(struct device *con,
>>> +static bool __fw_devlink_relax_cycles(struct fwnode_handle *con_handle,
>>>                                 struct fwnode_handle *sup_handle)
>>>    {
>>> -     struct device *sup_dev = NULL, *par_dev = NULL;
>>> +     struct device *sup_dev = NULL, *par_dev = NULL, *con_dev = NULL;
>>>        struct fwnode_link *link;
>>>        struct device_link *dev_link;
>>>        bool ret = false;
>>> @@ -2010,22 +2010,22 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>>
>>>        sup_handle->flags |= FWNODE_FLAG_VISITED;
>>>
>>> -     sup_dev = get_dev_from_fwnode(sup_handle);
>>> -
>>>        /* Termination condition. */
>>> -     if (sup_dev == con) {
>>> +     if (sup_handle == con_handle) {
>>>                pr_debug("----- cycle: start -----\n");
>>>                ret = true;
>>>                goto out;
>>>        }
>>>
>>> +     sup_dev = get_dev_from_fwnode(sup_handle);
>>> +     con_dev = get_dev_from_fwnode(con_handle);
>>>        /*
>>>         * If sup_dev is bound to a driver and @con hasn't started binding to a
>>>         * driver, sup_dev can't be a consumer of @con. So, no need to check
>>>         * further.
>>>         */
>>>        if (sup_dev && sup_dev->links.status ==  DL_DEV_DRIVER_BOUND &&
>>> -         con->links.status == DL_DEV_NO_DRIVER) {
>>> +         con_dev && con_dev->links.status == DL_DEV_NO_DRIVER) {
>>>                ret = false;
>>>                goto out;
>>>        }
>>> @@ -2034,7 +2034,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>>                if (link->flags & FWLINK_FLAG_IGNORE)
>>>                        continue;
>>>
>>> -             if (__fw_devlink_relax_cycles(con, link->supplier)) {
>>> +             if (__fw_devlink_relax_cycles(con_handle, link->supplier)) {
>>>                        __fwnode_link_cycle(link);
>>>                        ret = true;
>>>                }
>>> @@ -2049,7 +2049,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>>        else
>>>                par_dev = fwnode_get_next_parent_dev(sup_handle);
>>>
>>> -     if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode)) {
>>> +     if (par_dev && __fw_devlink_relax_cycles(con_handle, par_dev->fwnode)) {
>>>                pr_debug("%pfwf: cycle: child of %pfwf\n", sup_handle,
>>>                         par_dev->fwnode);
>>>                ret = true;
>>> @@ -2067,7 +2067,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
>>>                    !(dev_link->flags & DL_FLAG_CYCLE))
>>>                        continue;
>>>
>>> -             if (__fw_devlink_relax_cycles(con,
>>> +             if (__fw_devlink_relax_cycles(con_handle,
>>>                                              dev_link->supplier->fwnode)) {
>>>                        pr_debug("%pfwf: cycle: depends on %pfwf\n", sup_handle,
>>>                                 dev_link->supplier->fwnode);
>>> @@ -2140,25 +2140,19 @@ static int fw_devlink_create_devlink(struct device *con,
>>>                return -EINVAL;
>>>
>>>        /*
>>> -      * SYNC_STATE_ONLY device links don't block probing and supports cycles.
>>> -      * So, one might expect that cycle detection isn't necessary for them.
>>> -      * However, if the device link was marked as SYNC_STATE_ONLY because
>>> -      * it's part of a cycle, then we still need to do cycle detection. This
>>> -      * is because the consumer and supplier might be part of multiple cycles
>>> -      * and we need to detect all those cycles.
>>> +      * Don't try to optimize by not calling the cycle detection logic under
>>> +      * certain conditions. There's always some corner case that won't get
>>> +      * detected.
>>>         */
>>> -     if (!device_link_flag_is_sync_state_only(flags) ||
>>> -         flags & DL_FLAG_CYCLE) {
>>> -             device_links_write_lock();
>>> -             if (__fw_devlink_relax_cycles(con, sup_handle)) {
>>> -                     __fwnode_link_cycle(link);
>>> -                     flags = fw_devlink_get_flags(link->flags);
>>> -                     pr_debug("----- cycle: end -----\n");
>>> -                     dev_info(con, "Fixed dependency cycle(s) with %pfwf\n",
>>> -                              sup_handle);
>>> -             }
>>> -             device_links_write_unlock();
>>> +     device_links_write_lock();
>>> +     if (__fw_devlink_relax_cycles(link->consumer, sup_handle)) {
>>> +             __fwnode_link_cycle(link);
>>> +             flags = fw_devlink_get_flags(link->flags);
>>> +             pr_debug("----- cycle: end -----\n");
>>> +             pr_info("%pfwf: Fixed dependency cycle(s) with %pfwf\n",
>>> +                     link->consumer, sup_handle);
>>>        }
>>> +     device_links_write_unlock();
>>>
>>>        if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
>>>                sup_dev = fwnode_get_next_parent_dev(sup_handle);
>>

Attachment "boot-fixed.txt" (text/plain, 116630 bytes)

Attachment "devlink-broken.txt" (text/plain, 6862 bytes)

Attachment "devlink-fixed.txt" (text/plain, 6839 bytes)
