lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <da4e5807-712d-4d08-a780-f363cee823b9@notapiano>
Date: Wed, 2 Oct 2024 16:57:16 -0400
From: Nícolas F. R. A. Prado <nfraprado@...labora.com>
To: Saravana Kannan <saravanak@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"Rafael J. Wysocki" <rafael@...nel.org>, kernel@...labora.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] driver core: Don't log intentional skip of device link
 creation as error

On Fri, Aug 09, 2024 at 12:13:25PM -0400, Nícolas F. R. A. Prado wrote:
> On Mon, Jul 29, 2024 at 05:08:48PM -0700, Saravana Kannan wrote:
> > On Mon, Jul 29, 2024 at 2:25 PM Nícolas F. R. A. Prado
> > <nfraprado@...labora.com> wrote:
> > >
> > > On Tue, Jun 25, 2024 at 09:56:07AM -0400, Nícolas F. R. A. Prado wrote:
> > > > On Mon, Jun 24, 2024 at 04:53:30PM -0700, Saravana Kannan wrote:
> > > > > On Mon, Jun 24, 2024 at 8:21 AM Nícolas F. R. A. Prado
> > > > > <nfraprado@...labora.com> wrote:
> > > > > >
> > > > > > Commit ac66c5bbb437 ("driver core: Allow only unprobed consumers for
> > > > > > SYNC_STATE_ONLY device links") introduced an early return in
> > > > > > device_link_add() to prevent useless links from being created. However
> > > > > > the calling function fw_devlink_create_devlink() unconditionally prints
> > > > > > an error if device_link_add() didn't create a link, even in this case
> > > > > > where it is intentionally skipping the link creation.
> > > > > >
> > > > > > Add a check to detect if the link wasn't created intentionally and in
> > > > > > that case don't log an error.
> > > > >
> > > > > Your point is somewhat valid, and I might Ack this. But this really
> > > > > shouldn't be happening a lot. Can you give more context on how you are
> > > > > hitting this?
> > > >
> > > > Of course. I'm seeing this on the mt8195-cherry-tomato-r2 platform.
> > > >
> > > > The following error is printed during boot:
> > > >
> > > >   mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0
> > > >
> > > > It doesn't happen with the upstream defconfig, but with the following config
> > > > change it does:
> > > >
> > > >   -CONFIG_PWM_MTK_DISP=m
> > > >   +CONFIG_PWM_MTK_DISP=y
> > > >
> > > > That probably changes the order in which the MTK DP and the backlight drivers
> > > > probe, resulting in the error.
> > > >
> > > > One peculiarity that comes to mind is that the DP driver calls
> > > > devm_of_dp_aux_populate_bus() to run a callback once the panel has finished
> > > > probing. I'm not sure if this could have something to do with the error.
> > > >
> > > > Full log at https://lava.collabora.dev/scheduler/job/14573149
> > >
> > > Hi Saravana,
> > >
> > > With the given context for where this issue is happening, what do you think
> > > about this patch?
> > 
> > Ah sorry, missed your earlier email.
> > 
> > Couple of points:
> > 1. It looks like the link in question is "SYNC_STATE_ONLY" because
> > it's part of a cycle. Correct me if I'm wrong. You might want to use
> > the new "post-init-providers" property to help fw_devlink break the
> > cycle and enforce the right dependency between the edp-tx and your
> > backlight. And then this error should go away and your device ordering
> > is enforced a bit better.
> 
> I don't see any cycle there. edp-tx points to backlight, but backlight doesn't
> point back (from mt8195-cherry.dtsi): 
> 
>   &edp_tx {
>   	...
>   	aux-bus {
>   		panel {
>   			compatible = "edp-panel";
>   			power-supply = <&pp3300_disp_x>;
>   			backlight = <&backlight_lcd0>;
> 
>   
>   backlight_lcd0: backlight-lcd0 {
>   	compatible = "pwm-backlight";
>   	brightness-levels = <0 1023>;
>   	default-brightness-level = <576>;
>   	enable-gpios = <&pio 82 GPIO_ACTIVE_HIGH>;
>   	num-interpolated-steps = <1023>;
>   	pwms = <&disp_pwm0 0 500000>;
>   	power-supply = <&ppvar_sys>;
>   };
> 
> And DL_FLAG_CYCLE is not set in the flags in the error log: 0x180. (Let me know
> if there's something else that I should be looking at to detect a cycle)

Hi Saravana,

Here are some debug logs to help contextualize the issue:

  [    0.198518] device: 'backlight-lcd0': device_add
  [    0.198655] platform 1c500000.edp-tx: Linked as a sync state only consumer to backlight-lcd0
  
  [   34.971653] platform backlight-lcd0: error -EPROBE_DEFER: supplier 1100e000.pwm not ready
  
  [   35.115480] mediatek-drm-dp 1c500000.edp-tx: driver: 'mediatek-drm-dp': driver_bound: bound to device
  [   35.160115] mediatek-drm-dp 1c500000.edp-tx: Dropping the link to backlight-lcd0
  
  [   53.910433] pwm-backlight backlight-lcd0: driver: 'pwm-backlight': driver_bound: bound to device
  [   53.919213] mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0

So a SYNC_STATE_ONLY device link is created from backlight-lcd0 to edp-tx. When
the edp-tx probes, the link is dropped, since it is SYNC_STATE_ONLY. When the
backlight-lcd0 probes a new devlink is attempted to the consumer edp-tx and
fails, since it is useless, printing the warning.

You mentioned a cycle before. The only cycle I see is between the edp-tx and the
panel, but doesn't involve the backlight:

  [    0.198104] ----- cycle: start -----
  [    0.198105] /soc/edp-tx@...00000/aux-bus/panel: cycle: depends on /soc/edp-tx@...00000
  [    0.198112] ----- cycle: start -----
  [    0.198113] /soc/edp-tx@...00000/aux-bus/panel: cycle: child of /soc/edp-tx@...00000
  [    0.198119] /soc/edp-tx@...00000: cycle: depends on /soc/edp-tx@...00000/aux-bus/panel
  [    0.198125] ----- cycle: end -----
  [    0.198126] platform 1c500000.edp-tx: Fixed dependency cycle(s) with /soc/edp-tx@...00000/aux-bus/panel

Just in case I tried using post-init-providers:

  diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
  index 75d56b2d5a3d..19df138ef043 100644
  --- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
  +++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
  @@ -322,6 +322,7 @@ &edp_tx {
  
          pinctrl-names = "default";
          pinctrl-0 = <&edptx_pins_default>;
  +       post-init-providers = <&panel>;
  
          ports {
                  #address-cells = <1>;
  @@ -344,7 +345,7 @@ edp_out: endpoint {
          };
  
          aux-bus {
  -               panel {
  +               panel: panel {
                          compatible = "edp-panel";
                          power-supply = <&pp3300_disp_x>;
                          backlight = <&backlight_lcd0>;

It broke the cycle, as I no longer see it in the logs, but the failed device
link warning is still there as expected.

It seems to me that the issue comes form the device link being SYNC_STATE_ONLY
in the first place. I see that comes from the 'else' path in

	if (con->fwnode == link->consumer)
		flags = fw_devlink_get_flags(link->flags);
	else
		flags = FW_DEVLINK_FLAGS_PERMISSIVE;

and the value on each side of the comparison is:

con->fwnode: /soc/edp-tx@...00000
link->consumer: /soc/edp-tx@...00000/aux-bus/panel

Could you help me understand what (if anything) is wrong here?

(I also noticed that as per the DT the consumer for backlight-lcd0 should be the
panel, but the devlink has it instead as the edp-tx, I'm guessing that's another
symptom of the same issue)

Thanks,
Nícolas

> 
> Thanks,
> Nícolas
> 
> > 
> > 2. If we take this patch, it might be better to make a "useless device
> > links" helper function and use the same on in device_link_add() and in
> > the "if" condition that's used to decide if a device link should be
> > created. The con == sup is another "useless device link" check. But
> > I'm not really sure if we should mask this error as it help with stuff
> > like (1). con == sup is a bit more clear cut example of a "useless
> > device link" in the context of fw_devlink.
> > 
> > -Saravana

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ