[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6c080829-b3c1-8217-37e5-7f515118d840@quicinc.com>
Date: Tue, 23 Aug 2022 15:18:10 -0700
From: Abhinav Kumar <quic_abhinavk@...cinc.com>
To: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Kuogee Hsieh <quic_khsieh@...cinc.com>, <robdclark@...il.com>,
<sean@...rly.run>, <swboyd@...omium.org>, <dianders@...omium.org>,
<vkoul@...nel.org>, <daniel@...ll.ch>, <airlied@...ux.ie>,
<agross@...nel.org>, <bjorn.andersson@...aro.org>
CC: <quic_sbillaka@...cinc.com>, <linux-arm-msm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
<quic_aravindh@...cinc.com>, <freedreno@...ts.freedesktop.org>
Subject: Re: [PATCH] drm/msm/dp: add atomic_check to bridge ops
Sorry missed one response,
On 8/23/2022 3:07 PM, Abhinav Kumar wrote:
>
>
> On 8/22/2022 11:33 AM, Dmitry Baryshkov wrote:
>> On 22/08/2022 20:32, Abhinav Kumar wrote:
>>>
>>>
>>> On 8/22/2022 9:49 AM, Dmitry Baryshkov wrote:
>>>> On 22/08/2022 19:38, Abhinav Kumar wrote:
>>>>> Hi Dmitry
>>>>>
>>>>> On 8/22/2022 9:18 AM, Dmitry Baryshkov wrote:
>>>>>> On 17/08/2022 21:01, Kuogee Hsieh wrote:
>>>>>>> DRM commit_tails() will disable downstream crtc/encoder/bridge if
>>>>>>> both disable crtc is required and crtc->active is set before pushing
>>>>>>> a new frame downstream.
>>>>>>>
>>>>>>> There is a rare case that user space display manager issue an extra
>>>>>>> screen update immediately followed by close DRM device while down
>>>>>>> stream display interface is disabled. This extra screen update will
>>>>>>> timeout due to the downstream interface is disabled but will cause
>>>>>>> crtc->active be set. Hence the followed commit_tails() called by
>>>>>>> drm_release() will pass the disable downstream crtc/encoder/bridge
>>>>>>> conditions checking even downstream interface is disabled.
>>>>>>> This cause the crash to happen at dp_bridge_disable() due to it
>>>>>>> trying
>>>>>>> to access the main link register to push the idle pattern out
>>>>>>> while main
>>>>>>> link clocks is disabled.
>>>>>>>
>>>>>>> This patch adds atomic_check to prevent the extra frame will not
>>>>>>> be pushed down if display interface is down so that crtc->active
>>>>>>> will not be set neither. This will fail the conditions checking
>>>>>>> of disabling down stream crtc/encoder/bridge which prevent
>>>>>>> drm_release() from calling dp_bridge_disable() so that crash
>>>>>>> at dp_bridge_disable() prevented.
>>>>>>
>>>>>> I must admit I had troubles parsing this description. However if I
>>>>>> got you right, I think the check that the main link clock is
>>>>>> running in the dp_bridge_disable() or dp_ctrl_push_idle() would be
>>>>>> a better fix.
>>>>>
>>>>> Originally, thats what was posted
>>>>> https://patchwork.freedesktop.org/patch/496984/.
>>>>
>>>> This patch is also not so correct from my POV. It checks for the hpd
>>>> status, while in reality it should check for main link clocks being
>>>> enabled.
>>>>
>>>
>>> We can push another fix to check for the clk state instead of the hpd
>>> status. But I must say we are again just masking something which the
>>> fwk should have avoided isnt it?
>>>
>>> As per the doc in the include/drm/drm_bridge.h it says,
>>>
>>> "*
>>> * The bridge can assume that the display pipe (i.e. clocks and timing
>>> * signals) feeding it is still running when this callback is called.
>>> *"
>>
>> Yes, that's what I meant about this chunk begging to go to the core.
>> In my opinion, if we are talking about the disconnected sinks, it is
>> the framework who should disallow submitting the frames to the
>> disconnected sinks.
>>
>>>
>>> By adding an extra layers of protection in the driver, we are just
>>> avoiding another issue but the commit should not have been issued in
>>> the first place.
>>>
>>> So shouldnt we do both then? That is add protection to check if clock
>>> is ON and also, reject commits when display is disconnected.
>>>
>>>>>
>>>>> Then it seemed like we were just protecting against an issue in the
>>>>> framework which was allowing the frames to be pushed even after the
>>>>> display was disconnected. The DP driver did send out the disconnect
>>>>> event correctly and as per the logs, this frame came down after
>>>>> that and the DRM fwk did allow it.
>>>>>
>>>>> So after discussing on IRC with Rob, we came up with this approach
>>>>> that
>>>>> if the display is not connected, then atomic_check should fail.
>>>>> That way the commit will not happen.
>>>>>
>>>>> Just seemed a bit cleaner instead of adding all our protections.
>>>>
>>>> The check to fail atomic_check if display is not connected seems out
>>>> of place. In its current way it begs go to the upper layer,
>>>> forbidding using disconnected sinks for all the drivers. There is
>>>> nothing special in the MSM DP driver with respect to the HPD events
>>>> processing and failing atomic_check() based on that.
>>>>
>>>
>>> Why all the drivers? This is only for MSM DP bridge.
>>
>> Yes, we change the MSM DRM driver. But the check is generic enough.
>> I'm not actually insisting on pushing the check to the core, just
>> trying to understand the real cause here.
>>
>>>
>
> I actually wanted to push this to the core and thats what I had
> originally asked on IRC because it does seem to be generic enough that
> it should belong to the core but after discussion with Rob on freedreno,
> he felt this was a better approach because for some of the legacy
> connectors like VGA, this need not belong to the DRM core, hence we went
> with this approach.
>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> SError Interrupt on CPU7, code 0x00000000be000411 -- SError
>>>>>>> CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19
>>>>>>> Hardware name: Google Lazor (rev3 - 8) (DT)
>>>>>>> pstate: a04000c9 (NzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> pc : __cmpxchg_case_acq_32+0x14/0x2c
>>>>>>> lr : do_raw_spin_lock+0xa4/0xdc
>>>>>>> sp : ffffffc01092b6a0
>>>>>>> x29: ffffffc01092b6a0 x28: 0000000000000028 x27: 0000000000000038
>>>>>>> x26: 0000000000000004 x25: ffffffd2973dce48 x24: 0000000000000000
>>>>>>> x23: 00000000ffffffff x22: 00000000ffffffff x21: ffffffd2978d0008
>>>>>>> x20: ffffffd2978d0008 x19: ffffff80ff759fc0 x18: 0000000000000000
>>>>>>> x17: 004800a501260460 x16: 0441043b04600438 x15: 04380000089807d0
>>>>>>> x14: 07b0089807800780 x13: 0000000000000000 x12: 0000000000000000
>>>>>>> x11: 0000000000000438 x10: 00000000000007d0 x9 : ffffffd2973e09e4
>>>>>>> x8 : ffffff8092d53300 x7 : ffffff808902e8b8 x6 : 0000000000000001
>>>>>>> x5 : ffffff808902e880 x4 : 0000000000000000 x3 : ffffff80ff759fc0
>>>>>>> x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffffff80ff759fc0
>>>>>>> Kernel panic - not syncing: Asynchronous SError Interrupt
>>>>>>> CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19
>>>>>>> Hardware name: Google Lazor (rev3 - 8) (DT)
>>>>>>> Call trace:
>>>>>>> dump_backtrace.part.0+0xbc/0xe4
>>>>>>> show_stack+0x24/0x70
>>>>>>> dump_stack_lvl+0x68/0x84
>>>>>>> dump_stack+0x18/0x34
>>>>>>> panic+0x14c/0x32c
>>>>>>> nmi_panic+0x58/0x7c
>>>>>>> arm64_serror_panic+0x78/0x84
>>>>>>> do_serror+0x40/0x64
>>>>>>> el1h_64_error_handler+0x30/0x48
>>>>>>> el1h_64_error+0x68/0x6c
>>>>>>> __cmpxchg_case_acq_32+0x14/0x2c
>>>>>>> _raw_spin_lock_irqsave+0x38/0x4c
>>
>> You know, after re-reading the trace, I could not help but notice that
>> the issue seems to be related to completion/timer/spinlock memory
>> becoming unavailable rather than disabling the main link clock.
>> See, the SError comes in the spin_lock path, not during register read.
>>
>> Thus I think the commit message is a bit misleading.
>>
>
> No, this issue is due to unclocked access. Please check this part of the
> stack:
>
> >>>>>> wait_for_completion_timeout+0x2c/0x54
> >>>>>> dp_ctrl_push_idle+0x40/0x88
> >>>>>> dp_bridge_disable+0x24/0x30
> >>>>>> drm_atomic_bridge_chain_disable+0x90/0xbc
> >>>>>> drm_atomic_helper_commit_modeset_disables+0x198/0x444
> >>>>>> msm_atomic_commit_tail+0x1d0/0x374
> >>>>>> commit_tail+0x80/0x108
> >>>>>> drm_atomic_helper_commit+0x118/0x11c
> >>>>>> drm_atomic_commit+0xb4/0xe0
> >>>>>> drm_client_modeset_commit_atomic+0x184/0x224
> >>>>>> drm_client_modeset_commit_locked+0x58/0x160
> >>>>>> drm_client_modeset_commit+0x3c/0x64
>
>> Can we please get a trace checking which calls were actually made for
>> the dp bridge and if the dp/dp->ctrl memory pointers are correct?
>>
>> I do not see the dp_display_disable() being called. Maybe I just
>> missed the call.
>>
>
> Yes it is called, please refer to the above part of the stack that I
> have pasted.
>
>>
>>>>>>> lock_timer_base+0x40/0x78
>>>>>>> __mod_timer+0xf4/0x25c
>>>>>>> schedule_timeout+0xd4/0xfc
>>>>>>> __wait_for_common+0xac/0x140
>>>>>>> wait_for_completion_timeout+0x2c/0x54
>>>>>>> dp_ctrl_push_idle+0x40/0x88
>>>>>>> dp_bridge_disable+0x24/0x30
>>>>>>> drm_atomic_bridge_chain_disable+0x90/0xbc
>>>>>>> drm_atomic_helper_commit_modeset_disables+0x198/0x444
>>>>>>> msm_atomic_commit_tail+0x1d0/0x374
>>>>>>> commit_tail+0x80/0x108
>>>>>>> drm_atomic_helper_commit+0x118/0x11c
>>>>>>> drm_atomic_commit+0xb4/0xe0
>>>>>>> drm_client_modeset_commit_atomic+0x184/0x224
>>>>>>> drm_client_modeset_commit_locked+0x58/0x160
>>>>>>> drm_client_modeset_commit+0x3c/0x64
>>>>>>> __drm_fb_helper_restore_fbdev_mode_unlocked+0x98/0xac
>>>>>>> drm_fb_helper_set_par+0x74/0x80
>>>>>>> drm_fb_helper_hotplug_event+0xdc/0xe0
>>>>>>> __drm_fb_helper_restore_fbdev_mode_unlocked+0x7c/0xac
>>>>>>> drm_fb_helper_restore_fbdev_mode_unlocked+0x20/0x2c
>>>>>>> drm_fb_helper_lastclose+0x20/0x2c
>>>>>>> drm_lastclose+0x44/0x6c
>>>>>>> drm_release+0x88/0xd4
>>>>>>> __fput+0x104/0x220
>>>>>>> ____fput+0x1c/0x28
>>>>>>> task_work_run+0x8c/0x100
>>>>>>> do_exit+0x450/0x8d0
>>>>>>> do_group_exit+0x40/0xac
>>>>>>> __wake_up_parent+0x0/0x38
>>>>>>> invoke_syscall+0x84/0x11c
>>>>>>> el0_svc_common.constprop.0+0xb8/0xe4
>>>>>>> do_el0_svc+0x8c/0xb8
>>>>>>> el0_svc+0x2c/0x54
>>>>>>> el0t_64_sync_handler+0x120/0x1c0
>>>>>>> el0t_64_sync+0x190/0x194
>>>>>>> SMP: stopping secondary CPUs
>>>>>>> Kernel Offset: 0x128e800000 from 0xffffffc008000000
>>>>>>> PHYS_OFFSET: 0x80000000
>>>>>>> CPU features: 0x800,00c2a015,19801c82
>>>>>>> Memory Limit: none
>>>>>>>
>>>>>>> Fixes: 8a3b4c17f863 ("drm/msm/dp: employ bridge mechanism for
>>>>>>> display enable and disable")
>>>>>>> Reported-by: Leonard Lausen <leonard@...sen.nl>
>>>>>>> Suggested-by: Rob Clark <robdclark@...il.com>
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/17
>>>>>>> Signed-off-by: Kuogee Hsieh <quic_khsieh@...cinc.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/msm/dp/dp_drm.c | 23 +++++++++++++++++++++++
>>>>>>> 1 file changed, 23 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_drm.c
>>>>>>> b/drivers/gpu/drm/msm/dp/dp_drm.c
>>>>>>> index 6df25f7..c682588 100644
>>>>>>> --- a/drivers/gpu/drm/msm/dp/dp_drm.c
>>>>>>> +++ b/drivers/gpu/drm/msm/dp/dp_drm.c
>>>>>>> @@ -31,6 +31,25 @@ static enum drm_connector_status
>>>>>>> dp_bridge_detect(struct drm_bridge *bridge)
>>>>>>> connector_status_disconnected;
>>>>>>> }
>>>>>>> +static int dp_bridge_atomic_check(struct drm_bridge *bridge,
>>>>>>> + struct drm_bridge_state *bridge_state,
>>>>>>> + struct drm_crtc_state *crtc_state,
>>>>>>> + struct drm_connector_state *conn_state)
>>>>>>> +{
>>>>>>> + struct msm_dp *dp;
>>>>>>> +
>>>>>>> + dp = to_dp_bridge(bridge)->dp_display;
>>>>>>> +
>>>>>>> + drm_dbg_dp(dp->drm_dev, "is_connected = %s\n",
>>>>>>> + (dp->is_connected) ? "true" : "false");
>>>>>>> +
>>>>>>> + if (bridge->ops & DRM_BRIDGE_OP_HPD)
>>>>>>> + return (dp->is_connected) ? 0 : -ENOTCONN;
>>>>
>>>> This raises questions if this will work for the configurations when
>>>> other bridge is used for HPD events.
>>>>
>>>> Let's not mix the levels of processing. If we should not disable the
>>>> link because it is already disabled, let's just do so rather than
>>>> failing the atomic_check().
>>>>
>>>
>>> This is only for MSM DP's bridge. If we use another bridge which is
>>> capable of handling its own HPD, then that time MSM DP's bridge
>>> shouldnt set this flag.
>>
>> Not quite. The bridges set the ops to describe the ops that they
>> support themselves. Then the drm_bridge_connectors selects the bridge
>> handling hpd, etc. So the DRM_BRIDGE_OP_HPD is always set for DP
>> sources. But the question is quite the opposite: if we have the next
>> bridge (e.g. the usb-c-connector or the display-connector), will the
>> is_connected field be set correctly?
>>
Ah I got your concern now, I would say Yes because looking at Bjorn's
change https://patchwork.freedesktop.org/patch/496925/, I can see that
the next bridge would call into dp_bridge_hpd_notify() which calls
dp_display_send_hpd_notification() finally setting
dp->dp_display.is_connected to true.
>>>
>>> We can even replace this check with just checking if connector_type
>>> is DP but that would again open the discussion of having DP/eDP
>>> specific checks so we did it this way.
>>>
>>>
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +
>>>>>>> /**
>>>>>>> * dp_bridge_get_modes - callback to add drm modes via
>>>>>>> drm_mode_probed_add()
>>>>>>> * @bridge: Poiner to drm bridge
>>>>>>> @@ -61,6 +80,9 @@ static int dp_bridge_get_modes(struct
>>>>>>> drm_bridge *bridge, struct drm_connector *
>>>>>>> }
>>>>>>> static const struct drm_bridge_funcs dp_bridge_ops = {
>>>>>>> + .atomic_duplicate_state =
>>>>>>> drm_atomic_helper_bridge_duplicate_state,
>>>>>>> + .atomic_destroy_state =
>>>>>>> drm_atomic_helper_bridge_destroy_state,
>>>>>>> + .atomic_reset = drm_atomic_helper_bridge_reset,
>>>>>>> .enable = dp_bridge_enable,
>>>>>>> .disable = dp_bridge_disable,
>>>>>>> .post_disable = dp_bridge_post_disable,
>>>>>>> @@ -68,6 +90,7 @@ static const struct drm_bridge_funcs
>>>>>>> dp_bridge_ops = {
>>>>>>> .mode_valid = dp_bridge_mode_valid,
>>>>>>> .get_modes = dp_bridge_get_modes,
>>>>>>> .detect = dp_bridge_detect,
>>>>>>> + .atomic_check = dp_bridge_atomic_check,
>>>>>>> };
>>>>>>> struct drm_bridge *dp_bridge_init(struct msm_dp *dp_display,
>>>>>>> struct drm_device *dev,
>>>>>>
>>>>
>>
Powered by blists - more mailing lists