[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2df31f2d-8271-d966-158a-27c6e0581d72@quicinc.com>
Date: Tue, 20 Feb 2024 13:19:54 -0800
From: Abhinav Kumar <quic_abhinavk@...cinc.com>
To: Johan Hovold <johan@...nel.org>, Rob Clark <robdclark@...il.com>,
"Dmitry
Baryshkov" <dmitry.baryshkov@...aro.org>,
Kuogee Hsieh
<quic_khsieh@...cinc.com>
CC: Sean Paul <sean@...rly.run>,
Marijn Suijten
<marijn.suijten@...ainline.org>,
David Airlie <airlied@...il.com>, "Daniel
Vetter" <daniel@...ll.ch>,
Bjorn Andersson <quic_bjorande@...cinc.com>,
<quic_jesszhan@...cinc.com>, <quic_sbillaka@...cinc.com>,
<dri-devel@...ts.freedesktop.org>, <freedreno@...ts.freedesktop.org>,
<linux-arm-msm@...r.kernel.org>, <regressions@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>
Subject: Re: drm/msm: Second DisplayPort regression in 6.8-rc1
Hi Johan
On 2/19/2024 2:41 AM, Johan Hovold wrote:
> On Sat, Feb 17, 2024 at 04:14:58PM +0100, Johan Hovold wrote:
>> On Wed, Feb 14, 2024 at 02:52:06PM +0100, Johan Hovold wrote:
>>> On Tue, Feb 13, 2024 at 10:00:13AM -0800, Abhinav Kumar wrote:
>
>> Since Dmitry had trouble reproducing this issue I took a closer look at
>> the DRM aux bridge series that Abhinav pointed and was able to track
>> down the bridge regressions and come up with a reproducer. I just posted
>> a series fixing this here:
>>
>> https://lore.kernel.org/lkml/20240217150228.5788-1-johan+linaro@kernel.org/
>>
>> As I mentioned in the cover letter, I am still seeing intermittent hard
>> resets around the time that the DRM subsystem is initialising, which
>> suggests that we may be dealing with two separate DRM regressions here
>> however.
>>
>> If the hard resets are triggered by something like unclocked hardware,
>> perhaps that bit could this be related to the runtime PM rework?
>
> It seems my initial suspicion that at least some of these regressions
> were related to the runtime PM work was correct. The hard resets happens
> when the DP controller is runtime suspended after being probed:
>
> [ 16.748475] bus: 'platform': __driver_probe_device: matched device ae00000.display-subsystem with driver msm-mdss
> [ 16.759444] msm-mdss ae00000.display-subsystem: Adding to iommu group 21
> [ 16.795226] bus: 'platform': __driver_probe_device: matched device ae01000.display-controller with driver msm_dpu
> [ 16.807542] probe of ae01000.display-controller returned -517 after 3 usecs
> [ 16.821552] bus: 'platform': __driver_probe_device: matched device ae90000.displayport-controller with driver msm-dp-display
> [ 16.837749] probe of ae90000.displayport-controller returned -517 after 1 usecs
> [ OK ] Listening on Load/Save RF Kill Swit[ 16.854659] bus: 'platform': __dch Status /dev/rfkill Watch.
> [ 16.868458] probe of ae98000.displayport-controller returned -517 after 2 usecs
> [ 16.880012] bus: 'platform': __driver_probe_device: matched device aea0000.displayport-controller with driver msm-dp-display
> [ 16.891856] probe of aea0000.displayport-controller returned -517 after 2 usecs
> [ 16.903825] probe of ae00000.display-subsystem returned 0 after 144497 usecs
> [ 16.911636] bus: 'platform': __driver_probe_device: matched device ae01000.display-controller with driver msm_dpu
> [ 16.942092] probe of ae01000.display-controller returned 0 after 19593 usecs
> Starting Load/Save Screen Backligh…rightness[ 16.959146] bus: 'platform': _ of backlight:backlight...
> [ 16.995355] msm-dp-display ae90000.displayport-controller: dp_display_probe - probe tail
> [ 17.004032] probe of ae90000.displayport-controller returned 0 after 30225 usecs
> [ 17.012308] bus: 'platform': __driver_probe_device: matched device ae98000.displayport-controller with driver msm-dp-display
> [ 17.050193] msm-dp-display ae98000.displayport-controller: dp_display_probe - probe tail
> Starting Network Name Resolution...
> [ 17.058925] probe of ae98000.displayport-controller returned 0 after 34774 usecs
> [ 17.074925] bus: 'platform': __driver_probe_device: matched device aea0000.displayport-controller with driver msm-dp-display
> [ Starting Network Time Synchronization...
> [ 17.112000] msm-dp-display aea0000.displayport-controller: dp_display_probe - populate aux bus
> [ 17.125208] msm-dp-display aea0000.displayport-controller: dp_pm_runtime_resume
> Starting Record System Boot/Shutdown in UTMP...
> Starting Virtual Console Setup...
> [ OK ] Finished Load/Save Screen Backlight Brightness of backlight:backlight.
> [ 17.197909] msm-dp-display aea0000.displayport-controller: dp_pm_runtime_suspend
> [ 17.198079] probe of aea0Format: Log Type - Time(microsec) - Message - Optional Info
> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
> S - QC_IMAGE_VERSION_STRING=BOOT.MXF.1.1-00470-MAKENA-1
> S - IMAGE_VARIANT_STRING=SocMakenaWP
> S - OEM_IMAGE_VERSION_STRING=crm-ubuntu92
>
> < machine is reset by hypervisor >
>
> Presumably the reset happens when controller is being shut down while
> still being used by the EFI framebuffer.
>
I am not sure if we can conclude like that. Even if we shut off the
controller when the framebuffer was still being fetched that should only
cause a blank screen and not a reset because we really don't trigger a
new register write / read while its fetching so as such there is no new
hardware access.
One thing I must accept is that there are two differences between
sc8280xp where we are hitting these resets and sc7180/sc7280 chromebooks
where we tested it more thoroughly without any such issues:
1) with the chromebooks we have depthcharge and not the QC UEFI.
If we are suspecting a hand-off issue here, will it help if we try to
disable the display in EFI by using "fastboot oem select-display-panel
none" (assuming this is a fastboot enabled device) and see if you still
hit the reset issue?
2) chromebooks used "internal_hpd" whereas the pmic_glink method used in
the sc8280xp.
I am still checking if there are any code paths in the eDP/DP driver
left exposed due to this difference with pm_runtime which can cause
this. I am wondering if some sort of drm tracing will help to narrow
down the reset point.
> In the cases where the machines survives boot, the controller is never
> suspended.
>
> When investigating this I've also seen intermittent:
>
> [drm:dp_display_probe [msm]] *ERROR* device tree parsing failed
>
So this error I think is because in dp_parser_parse() --->
dp_parser_ctrl_res(), we also have a devm_phy_get().
This can return -EDEFER if the phy driver has not yet probed.
I checked the other things inside dp_parser_parse(), others calls seem
to be purely DT parsing except this one. I think to avoid the confusion,
we should move devm_phy_get() outside of DT parsing into a separate call
or atleast add an error log inside devm_phy_get() failure below to
indicate that it deferred
io->phy = devm_phy_get(&pdev->dev, "dp");
if (IS_ERR(io->phy))
return PTR_ERR(io->phy);
If my hypothesis is correct on this, then this error log (even though
misleading) should be harmless for this issue because if we hit
DRM_ERROR("device tree parsing failed\n"); we will skip the
devm_pm_runtime_enable().
> which also appears to be related to the runtime PM rework:
>
> https://lore.kernel.org/lkml/1701472789-25951-1-git-send-email-quic_khsieh@quicinc.com/
>
> I believe this is enough evidence to conclude that this second
> regression is introduced by commit 5814b8bf086a ("drm/msm/dp:
> incorporate pm_runtime framework into DP driver"):
>
> #regzbot introduced: 5814b8bf086a
>
> Has anyone given some thought to how the framebuffer handover is
> supposed to work? It seems we're currently just relying on luck with
> timing.
>
> Johan
Powered by blists - more mailing lists