lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fecb4264-217b-464b-9b1c-226898abff7b@notapiano>
Date: Tue, 15 Oct 2024 17:32:14 -0400
From: Nícolas F. R. A. Prado <nfraprado@...labora.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Saravana Kannan <saravanak@...gle.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"Rafael J. Wysocki" <rafael@...nel.org>, kernel@...labora.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] driver core: Don't log intentional skip of device link
 creation as error

On Mon, Oct 14, 2024 at 01:49:56PM +0100, Jon Hunter wrote:
> Hi Nicolas, Saravanna,
> 
> On 02/10/2024 21:57, Nícolas F. R. A. Prado wrote:
> > On Fri, Aug 09, 2024 at 12:13:25PM -0400, Nícolas F. R. A. Prado wrote:
> > > On Mon, Jul 29, 2024 at 05:08:48PM -0700, Saravana Kannan wrote:
> > > > On Mon, Jul 29, 2024 at 2:25 PM Nícolas F. R. A. Prado
> > > > <nfraprado@...labora.com> wrote:
> > > > > 
> > > > > On Tue, Jun 25, 2024 at 09:56:07AM -0400, Nícolas F. R. A. Prado wrote:
> > > > > > On Mon, Jun 24, 2024 at 04:53:30PM -0700, Saravana Kannan wrote:
> > > > > > > On Mon, Jun 24, 2024 at 8:21 AM Nícolas F. R. A. Prado
> > > > > > > <nfraprado@...labora.com> wrote:
> > > > > > > > 
> > > > > > > > Commit ac66c5bbb437 ("driver core: Allow only unprobed consumers for
> > > > > > > > SYNC_STATE_ONLY device links") introduced an early return in
> > > > > > > > device_link_add() to prevent useless links from being created. However
> > > > > > > > the calling function fw_devlink_create_devlink() unconditionally prints
> > > > > > > > an error if device_link_add() didn't create a link, even in this case
> > > > > > > > where it is intentionally skipping the link creation.
> > > > > > > > 
> > > > > > > > Add a check to detect if the link wasn't created intentionally and in
> > > > > > > > that case don't log an error.
> > > > > > > 
> > > > > > > Your point is somewhat valid, and I might Ack this. But this really
> > > > > > > shouldn't be happening a lot. Can you give more context on how you are
> > > > > > > hitting this?
> > > > > > 
> > > > > > Of course. I'm seeing this on the mt8195-cherry-tomato-r2 platform.
> > > > > > 
> > > > > > The following error is printed during boot:
> > > > > > 
> > > > > >    mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0
> > > > > > 
> > > > > > It doesn't happen with the upstream defconfig, but with the following config
> > > > > > change it does:
> > > > > > 
> > > > > >    -CONFIG_PWM_MTK_DISP=m
> > > > > >    +CONFIG_PWM_MTK_DISP=y
> > > > > > 
> > > > > > That probably changes the order in which the MTK DP and the backlight drivers
> > > > > > probe, resulting in the error.
> > > > > > 
> > > > > > One peculiarity that comes to mind is that the DP driver calls
> > > > > > devm_of_dp_aux_populate_bus() to run a callback once the panel has finished
> > > > > > probing. I'm not sure if this could have something to do with the error.
> > > > > > 
> > > > > > Full log at https://lava.collabora.dev/scheduler/job/14573149
> > > > > 
> > > > > Hi Saravana,
> > > > > 
> > > > > With the given context for where this issue is happening, what do you think
> > > > > about this patch?
> > > > 
> > > > Ah sorry, missed your earlier email.
> > > > 
> > > > Couple of points:
> > > > 1. It looks like the link in question is "SYNC_STATE_ONLY" because
> > > > it's part of a cycle. Correct me if I'm wrong. You might want to use
> > > > the new "post-init-providers" property to help fw_devlink break the
> > > > cycle and enforce the right dependency between the edp-tx and your
> > > > backlight. And then this error should go away and your device ordering
> > > > is enforced a bit better.
> > > 
> > > I don't see any cycle there. edp-tx points to backlight, but backlight doesn't
> > > point back (from mt8195-cherry.dtsi):
> > > 
> > >    &edp_tx {
> > >    	...
> > >    	aux-bus {
> > >    		panel {
> > >    			compatible = "edp-panel";
> > >    			power-supply = <&pp3300_disp_x>;
> > >    			backlight = <&backlight_lcd0>;
> > > 
> > >    backlight_lcd0: backlight-lcd0 {
> > >    	compatible = "pwm-backlight";
> > >    	brightness-levels = <0 1023>;
> > >    	default-brightness-level = <576>;
> > >    	enable-gpios = <&pio 82 GPIO_ACTIVE_HIGH>;
> > >    	num-interpolated-steps = <1023>;
> > >    	pwms = <&disp_pwm0 0 500000>;
> > >    	power-supply = <&ppvar_sys>;
> > >    };
> > > 
> > > And DL_FLAG_CYCLE is not set in the flags in the error log: 0x180. (Let me know
> > > if there's something else that I should be looking at to detect a cycle)
> > 
> > Hi Saravana,
> > 
> > Here are some debug logs to help contextualize the issue:
> > 
> >    [    0.198518] device: 'backlight-lcd0': device_add
> >    [    0.198655] platform 1c500000.edp-tx: Linked as a sync state only consumer to backlight-lcd0
> >    [   34.971653] platform backlight-lcd0: error -EPROBE_DEFER: supplier 1100e000.pwm not ready
> >    [   35.115480] mediatek-drm-dp 1c500000.edp-tx: driver: 'mediatek-drm-dp': driver_bound: bound to device
> >    [   35.160115] mediatek-drm-dp 1c500000.edp-tx: Dropping the link to backlight-lcd0
> >    [   53.910433] pwm-backlight backlight-lcd0: driver: 'pwm-backlight': driver_bound: bound to device
> >    [   53.919213] mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0
> > 
> > So a SYNC_STATE_ONLY device link is created from backlight-lcd0 to edp-tx. When
> > the edp-tx probes, the link is dropped, since it is SYNC_STATE_ONLY. When the
> > backlight-lcd0 probes a new devlink is attempted to the consumer edp-tx and
> > fails, since it is useless, printing the warning.
> > 
> > You mentioned a cycle before. The only cycle I see is between the edp-tx and the
> > panel, but doesn't involve the backlight:
> > 
> >    [    0.198104] ----- cycle: start -----
> >    [    0.198105] /soc/edp-tx@...00000/aux-bus/panel: cycle: depends on /soc/edp-tx@...00000
> >    [    0.198112] ----- cycle: start -----
> >    [    0.198113] /soc/edp-tx@...00000/aux-bus/panel: cycle: child of /soc/edp-tx@...00000
> >    [    0.198119] /soc/edp-tx@...00000: cycle: depends on /soc/edp-tx@...00000/aux-bus/panel
> >    [    0.198125] ----- cycle: end -----
> >    [    0.198126] platform 1c500000.edp-tx: Fixed dependency cycle(s) with /soc/edp-tx@...00000/aux-bus/panel
> > 
> > Just in case I tried using post-init-providers:
> > 
> >    diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> >    index 75d56b2d5a3d..19df138ef043 100644
> >    --- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> >    +++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> >    @@ -322,6 +322,7 @@ &edp_tx {
> >            pinctrl-names = "default";
> >            pinctrl-0 = <&edptx_pins_default>;
> >    +       post-init-providers = <&panel>;
> >            ports {
> >                    #address-cells = <1>;
> >    @@ -344,7 +345,7 @@ edp_out: endpoint {
> >            };
> >            aux-bus {
> >    -               panel {
> >    +               panel: panel {
> >                            compatible = "edp-panel";
> >                            power-supply = <&pp3300_disp_x>;
> >                            backlight = <&backlight_lcd0>;
> > 
> > It broke the cycle, as I no longer see it in the logs, but the failed device
> > link warning is still there as expected.
> > 
> > It seems to me that the issue comes form the device link being SYNC_STATE_ONLY
> > in the first place. I see that comes from the 'else' path in
> > 
> > 	if (con->fwnode == link->consumer)
> > 		flags = fw_devlink_get_flags(link->flags);
> > 	else
> > 		flags = FW_DEVLINK_FLAGS_PERMISSIVE;
> > 
> > and the value on each side of the comparison is:
> > 
> > con->fwnode: /soc/edp-tx@...00000
> > link->consumer: /soc/edp-tx@...00000/aux-bus/panel
> > 
> > Could you help me understand what (if anything) is wrong here?
> > 
> > (I also noticed that as per the DT the consumer for backlight-lcd0 should be the
> > panel, but the devlink has it instead as the edp-tx, I'm guessing that's another
> > symptom of the same issue)
> 
> 
> I did not seen any update on this. It would be great to get this fixed.

Since there hasn't been a reply on this, let's postpone this investigation and
move forward in fixing the error log. I've sent v2 of the patch:
https://lore.kernel.org/all/20240624-fwdevlink-probed-no-err-45d21feb05fd-v2@collabora.com

Thanks,
Nícolas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ