lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b8a93c7-ba4a-47b7-9ba9-c94e3c6a7fc5@ideasonboard.com>
Date: Wed, 23 Oct 2024 11:52:33 +0300
From: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 "Rafael J. Wysocki" <rafael@...nel.org>,
 Linux Kernel List <linux-kernel@...r.kernel.org>,
 Aradhya Bhatia <aradhya.bhatia@...ux.dev>,
 "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
 Devarsh Thakkar <devarsht@...com>
Subject: Re: fw_devlinks preventing a panel driver from probing

Hi,

On 22/10/2024 19:07, Saravana Kannan wrote:
> On Tue, Oct 22, 2024 at 12:51 AM Tomi Valkeinen
> <tomi.valkeinen@...asonboard.com> wrote:
>>
>> Hi,
>>
>> On 22/10/2024 02:29, Saravana Kannan wrote:
>>> Hi Tomi,
>>>
>>> Sorry it took a while to get back.
>>>
>>> On Mon, Sep 16, 2024 at 4:52 AM Tomi Valkeinen
>>> <tomi.valkeinen@...asonboard.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We have an issue where two devices have dependencies to each other,
>>>> according to drivers/base/core.c's fw_devlinks, and this prevents them
>>>> from probing. I've been adding debugging to the core.c, but so far I
>>>> don't quite grasp the issue, so I thought to ask. Maybe someone can
>>>> instantly say that this just won't work...
>>>>
>>>> So, we have two devices, DSS (display subsystem) and an LVDS panel. The
>>>> DSS normally outputs parallel video from its video ports (VP), but it
>>>> has an integrated LVDS block (OLDI, Open LVDS Display Interface). The
>>>> OLDI block takes input from DSS's parallel outputs. The OLDI is not
>>>> modeled as a separate device (neither in the DT nor in the Linux device
>>>> model) as it has no register space, and is controlled fully by the DSS.
>>>>
>>>> To support dual-link LVDS, the DSS has two OLDI instances. They both
>>>> take their input from the same parallel video port, but each OLDI sends
>>>> alternate lines forward. So for a dual-link setup the connections would
>>>> be like this:
>>>>
>>>> +-----+-----+         +-------+         +----------+
>>>> |     |     |         |       |         |          |
>>>> |     | VP1 +----+--->| OLDI0 +-------->|          |
>>>> |     |     |    |    |       |         |          |
>>>> | DSS +-----+    |    +-------+         |  Panel   |
>>>> |     |     |    |    |       |         |          |
>>>> |     | VP2 |    +--->| OLDI1 +-------->|          |
>>>> |     |     |         |       |         |          |
>>>> +-----+-----+         +-------+         +----------+
>>>>
>>>> As the OLDI is not a separate device, it also does not have an
>>>> independent device tree node, but rather it's inside DSS's node. The DSS
>>>> parallel outputs are under a normal "ports" node, but OLDI ports are
>>>> under "oldi-txes/ports" (see below for dts to clarify this).
>>>>
>>>> And I think (guess...) this is the root of the issue we're seeing, as it
>>>> means the following, one or both of which might be the reason for this
>>>> issue:
>>>>
>>>> - OLDI fwnodes don't have an associated struct device *. I think the
>>>> reason is that the OLDI media graph ports are one level too deep in the
>>>> hierarchy. So while the DSS ports are associated with the DSS device,
>>>> OLDI ports are not.
>>>
>>> This is the root cause of the issue in some sense. fw_devlink doesn't
>>> know that DSS depends on the VP. In the current DT, it only appears as
>>> if the OLDI depends on VP. See further below for the fix.
>>>
>>>>
>>>> - The VP ports inside the DSS point to OLDI ports, which are also inside
>>>> DSS. So ports from a device point to ports in the same device (and back).
>>>>
>>>> If I understand the fw_devlink code correctly, in a normal case the
>>>> links formed with media graphs are marked as a cycle
>>>> (FWLINK_FLAG_CYCLE), and then ignored as far as probing goes.
>>>>
>>>> What we see here is that when using a single-link OLDI panel, the panel
>>>> driver's probe never gets called, as it depends on the OLDI, and the
>>>> link between the panel and the OLDI is not a cycle.
>>>>
>>>> The DSS driver probes, but the probe fails as it requires all the panel
>>>> devices to have been probed (and thus registered to the DRM framework)
>>>> before it can finish its setup.
>>>>
>>>> With dual-link, probing does happen and the drivers work. But I believe
>>>> this is essentially an accident, in the sense that the first link
>>>> between the panel and the OLDI still blocks the probing, but the second
>>>> links allows the driver core to traverse the devlinks further, causing
>>>> it to mark the links to the panel as FWLINK_FLAG_CYCLE (or maybe it only
>>>> marks one of those links, and that's enough).
>>>>
>>>> If I set fw_devlink=off as a kernel parameter, the probing proceeds
>>>> successfully in both single- and dual-link cases.
>>>>
>>>> Now, my questions is, is this a bug in the driver core, a bug in the DT
>>>> bindings, or something in between (DT is fine-ish, but the structure is
>>>> something that won't be supported by the driver core).
>>>>
>>>> And a follow-up question, regardless of the answer to the first one:
>>>> which direction should I go from here =).
>>>>
>>>> The device tree data (simplified) for this is as follows, first the
>>>> dual-link case, then the single-link case:
>>>>
>>>> /* Dual-link */
>>>>
>>>> dss: dss@...00000 {
>>>>           compatible = "ti,am625-dss";
>>>>
>>>>           oldi-txes {
>>>>                   oldi0: oldi@0 {
>>>>                           oldi0_ports: ports {
>>>>                                   port@0 {
>>>>                                           oldi_0_in: endpoint {
>>>>                                                   remote-endpoint = <&dpi0_out0>;
>>>>                                           };
>>>>                                   };
>>>>
>>>>                                   port@1 {
>>>>                                           oldi_0_out: endpoint {
>>>>                                                   remote-endpoint = <&lcd_in0>;
>>>>                                           };
>>>>                                   };
>>>>                           };
>>>>                   };
>>>>
>>>>                   oldi1: oldi@1 {
>>>>                           oldi1_ports: ports {
>>>>                                   port@0 {
>>>>                                           oldi_1_in: endpoint {
>>>>                                                   remote-endpoint = <&dpi0_out1>;
>>>>                                           };
>>>>                                   };
>>>>
>>>>                                   port@1 {
>>>>                                           oldi_1_out: endpoint {
>>>>                                                   remote-endpoint = <&lcd_in1>;
>>>>                                           };
>>>>                                   };
>>>>                           };
>>>>                   };
>>>>           };
>>>>
>>>>           dss_ports: ports {
>>>>                   port@0 {
>>>>                           dpi0_out0: endpoint@0 {
>>>>                                   remote-endpoint = <&oldi_0_in>;
>>>>                           };
>>>>                           dpi0_out1: endpoint@1 {
>>>>                                   remote-endpoint = <&oldi_1_in>;
>>>>                           };
>>>>                   };
>>>>           };
>>>> };
>>>>
>>>> display {
>>>>           compatible = "microtips,mf-101hiebcaf0", "panel-simple";
>>>
>>> In here, add this new property that I added some time back.
>>>
>>> post-init-providers = <&oldi-txes>;
>>
>> Thanks! This helps:
>>
>> post-init-providers = <&oldi0>;
>>
>> or for dual-link:
>>
>> post-init-providers = <&oldi0>, <&oldi1>;
>>
>>> This tells fw_devlink that VP doesn't depend on this node for
>>> initialization/probing. This property is basically available to break
>>> cycles in DT and mark one of the edges of the cycles as "not a real
>>> init dependency".
>>>
>>> You should do the same for the single link case too.
>>
>> While this helps, it's not very nice... Every new DT overlay that uses
>> OLDI display needs to have these.
> 
> Actually, taking a closer look at the DT and assuming I am visualizing
> it correctly in my head, fw_devlink should notice the cycle between
> oldi-txes and display and shouldn't block display from probing. Can
> you check the cycle detection code and see where it's bailing out
> early and not marking the fwnode link with the CYCLE flag?
> 
> __fw_devlink_relax_cycles() is where you want to look. There are a
> bunch of debug log messages inside it and around where it's called
> from.

I'm not quite sure how to read the debug messages. I have attached three 
kernel logs, with the debug prints enabled in drivers/base/core.c. The 
"fixed" is the one with the post-init-providers.

I load the display drivers as modules after the main boot has happened, 
and at the end of the logs I have the kernel prints when I load the 
modules. The single-link.txt also shows the debugfs/devices_deferred file.

The relevant strings to search are "dss", "oldi" and "display" (display 
is the panel).

So... All devlinks are supplier-consumer links. How are those created 
with an OF media graph, as there's no clear supplier nor consumer. In 
this particular case I see that display is marked as a consumer of oldi, 
but also dss is marked as a consumer of oldi. Is this just, essentially, 
random?

Also, as there's no separate device for OLDI, I don't see oldi at all in 
/sys/class/devlink/. But what I see there is a bit odd...

For dual link I get:

platform:display--platform:30200000.dss

which, I guess, makes sense. But for single link fixed case, I don't 
have anything there...

> Also, can you check debugfs/devices_deferred, or the
> "wait_for_supplier" file under /sys/devices/..../display/ to make sure
> it's actually stuck on oldi-txes? Just to make cure it's not some
> other corner case that's triggered by oldi-txes?
> 
>> I'm still confused about why this is needed. OF graphs are _always_
>> two-way links. Doesn't that mean that OF graphs never can be used for
>> dependencies, as they go both ways?
> 
> Good question :) Yes, they'll always be ignored as cycles. But with
> post-init-providers, it's actually better to add it so that cycles are
> broken and better ordering is enforced. See my talk at LPC referee
> talk about fw_devlink to see all the benefits you get from it. :)

Thanks for the pointer! It was interesting and I now understand the 
whole devlink thing better, although I have to say the details still 
escape me... =)

Also, isn't post-init-providers describing software behavior, not 
hardware? It doesn't sound like something we should have in the DT.

>> If so, shouldn't we just always
>> ignore all OF graphs for dependency checking?
> 
> There are cases when two devices A and B have remote-endpoints between
> them and ALSO have for example a gpio dependency between them. Where
> the gpio is the "post-init-supplier". If we don't parse
> remote-endpoint and mark the cycles, cases like these don't get to
> probe.

I'm sorry, I don't understand this. If we have 1) A and B with a (one 
way) gpio dependency, and 2) A and B with a (one way) gpio dependency 
_and_ two way media graph dependency, shouldn't the cases behave 
identically, as the graph dependency should just be ignored?

Or maybe I don't understand the example case at all... Why would the 
gpio be a post-init-supplier? Isn't gpio something you want at init time?

  Tomi

> 
> -Saravana
> 
>>
>>    Tomi
>>
>>>
>>> Hope that helps. Let me know.
>>>
>>> Thanks,
>>> Saravana
>>>
>>>>
>>>>           ports {
>>>>                   port@0 {
>>>>                           lcd_in0: endpoint {
>>>>                                   remote-endpoint = <&oldi_0_out>;
>>>>                           };
>>>>                   };
>>>>
>>>>                   port@1 {
>>>>                           lcd_in1: endpoint {
>>>>                                   remote-endpoint = <&oldi_1_out>;
>>>>                           };
>>>>                   };
>>>>           };
>>>> };
>>>>
>>>>
>>>> /* Single-link */
>>>>
>>>> dss: dss@...00000 {
>>>>           compatible = "ti,am625-dss";
>>>>
>>>>           oldi-txes {
>>>>                   oldi0: oldi@0 {
>>>>                           oldi0_ports: ports {
>>>>                                   port@0 {
>>>>                                           oldi_0_in: endpoint {
>>>>                                                   remote-endpoint = <&dpi0_out0>;
>>>>                                           };
>>>>                                   };
>>>>
>>>>                                   port@1 {
>>>>                                           oldi_0_out: endpoint {
>>>>                                                   remote-endpoint = <&lcd_in0>;
>>>>                                           };
>>>>                                   };
>>>>                           };
>>>>                   };
>>>>           };
>>>>
>>>>           dss_ports: ports {
>>>>                   port@0 {
>>>>                           dpi0_out0: endpoint@0 {
>>>>                                   remote-endpoint = <&oldi_0_in>;
>>>>                           };
>>>>                   };
>>>>           };
>>>> };
>>>>
>>>> display {
>>>>           compatible = "microtips,mf-101hiebcaf0", "panel-simple";
>>>>
>>>>           ports {
>>>>                   port@0 {
>>>>                           lcd_in0: endpoint {
>>>>                                   remote-endpoint = <&oldi_0_out>;
>>>>                           };
>>>>                   };
>>>>           };
>>>> };
>>>>
>>>>     Tomi
>>>>
>>

View attachment "dual-link.txt" of type "text/plain" (108464 bytes)

View attachment "single-link.txt" of type "text/plain" (107506 bytes)

View attachment "single-link-fixed.txt" of type "text/plain" (107956 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ