Message-ID: <63bba15b-6d8d-5ba8-d99d-8cd2dd763262@quicinc.com>
Date: Wed, 14 Feb 2024 12:35:58 -0800
From: Abhinav Kumar <quic_abhinavk@...cinc.com>
To: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
CC: Rob Clark <robdclark@...il.com>, Sean Paul <sean@...rly.run>,
        Marijn Suijten <marijn.suijten@...ainline.org>,
        David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
        <linux-arm-msm@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
        <freedreno@...ts.freedesktop.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful



On 2/14/2024 11:20 AM, Dmitry Baryshkov wrote:
> On Wed, 14 Feb 2024 at 20:02, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
>>
>>
>>
>> On 2/8/2024 6:50 AM, Dmitry Baryshkov wrote:
>>> We have several reports of vblank timeout messages. However, after some
>>> debugging it was found that they might have different causes.
>>> To allow us to identify the DPU block that gets stuck, include the
>>> actual CTL_FLUSH value into the timeout message and trigger the devcore
>>> snapshot capture.
>>>
>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
>>> ---
>>> Changes in v2:
>>> - Added a call to msm_disp_snapshot_state() to trigger devcore dump
>>>     (Abhinav)
>>> - Link to v1: https://lore.kernel.org/r/20240106-fd-dpu-debug-timeout-v1-1-6d9762884641@linaro.org
>>> ---
>>>    drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>>> index d0f56c5c4cce..a8d6165b3c0a 100644
>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>>> @@ -489,7 +489,8 @@ static int dpu_encoder_phys_vid_wait_for_commit_done(
>>>                (hw_ctl->ops.get_flush_register(hw_ctl) == 0),
>>>                msecs_to_jiffies(50));
>>>        if (ret <= 0) {
>>> -             DPU_ERROR("vblank timeout\n");
>>> +             DPU_ERROR("vblank timeout: %x\n", hw_ctl->ops.get_flush_register(hw_ctl));
>>> +             msm_disp_snapshot_state(phys_enc->parent->dev);
>>
>>
>> Unfortunately there is no rate limiting in this piece of code, so this
>> will trigger a flood of snapshots.
> 
> Well... Yes and no. The devcoredump will destroy other snapshots if
> there is a pending one. So only the console will be flooded, and only
> when MSM_DISP_SNAPSHOT_DUMP_IN_CONSOLE is enabled.
> 

Yes, true, but at the same time this makes it hard to capture a good
dump: potentially every vblank could time out, so this destroy/create
cycle won't end.

>>
>> The short-term solution is to add a vblank_timeout_cnt and reset it
>> in enable(), like other similar error counters.
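
To illustrate the counter idea, here is a rough userspace sketch, not
driver code. Everything prefixed with fake_ is a hypothetical stand-in:
snapshots_taken only counts where msm_disp_snapshot_state() would be
called, and fake_enable() stands for the encoder's enable() path. The
assumption is that only the first timeout after enable() should request
a snapshot, while the message is still printed every time.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-encoder state; not the real driver struct. */
struct fake_phys_enc {
	unsigned int vblank_timeout_cnt; /* timeouts seen since the last enable() */
	unsigned int snapshots_taken;    /* where msm_disp_snapshot_state() would run */
};

/* Reset the counter whenever the encoder is (re)enabled. */
static void fake_enable(struct fake_phys_enc *enc)
{
	assert(enc);
	enc->vblank_timeout_cnt = 0;
}

/*
 * Called on every vblank timeout. Always logs the flush value, but
 * returns true (snapshot requested) only for the first timeout since
 * the last fake_enable().
 */
static bool fake_handle_vblank_timeout(struct fake_phys_enc *enc,
				       unsigned int flush)
{
	assert(enc);
	fprintf(stderr, "vblank timeout: %x\n", flush);

	/* Saturate so a very long error storm cannot wrap back to 1. */
	if (enc->vblank_timeout_cnt < UINT_MAX)
		enc->vblank_timeout_cnt++;

	if (enc->vblank_timeout_cnt != 1)
		return false;

	enc->snapshots_taken++;
	return true;
}
```

With this shape, the first dump after enable() survives, because later
timeouts no longer restart the destroy/create cycle.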
>>
>> The long-term solution is to centralize these error locations into a
>> single dpu_encoder_error_handler() with a single counter. The error
>> handler would print the error code along with taking the snapshot,
>> instead of the snapshot being called from all over the place.
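
A rough userspace sketch of what such a central handler could look like
follows. It is an illustration only: the fake_ names, the error codes
(including "underrun") and the detail argument are all hypothetical, and
snapshots_taken only marks where msm_disp_snapshot_state() would be
called. The assumption is one counter shared by all error sites, reset
on enable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical error codes; the thread does not define any. */
enum fake_enc_error {
	FAKE_ERR_VBLANK_TIMEOUT,
	FAKE_ERR_UNDERRUN,
};

/* Hypothetical stand-in for the encoder state. */
struct fake_encoder {
	unsigned int error_cnt;       /* single counter shared by all error sites */
	unsigned int snapshots_taken; /* where msm_disp_snapshot_state() would run */
};

static const char *fake_err_name(enum fake_enc_error err)
{
	switch (err) {
	case FAKE_ERR_VBLANK_TIMEOUT:
		return "vblank timeout";
	case FAKE_ERR_UNDERRUN:
		return "underrun";
	}
	return "unknown";
}

/* Reset the shared counter when the encoder is (re)enabled. */
static void fake_encoder_enable(struct fake_encoder *enc)
{
	assert(enc);
	enc->error_cnt = 0;
}

/*
 * Single entry point for all error sites: always prints the error code,
 * requests a snapshot only for the first error since the last enable.
 * Returns true if a snapshot was requested.
 */
static bool fake_encoder_error_handler(struct fake_encoder *enc,
				       enum fake_enc_error err,
				       unsigned int detail)
{
	assert(enc);
	fprintf(stderr, "encoder error: %s (%x)\n", fake_err_name(err), detail);

	if (enc->error_cnt != 0)
		return false;

	enc->error_cnt = 1;
	enc->snapshots_taken++;
	return true;
}
```

Callers would then report an error code and a detail value, and would
not call the snapshot function themselves.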
>>
>>
>>
>>>                return -ETIMEDOUT;
>>>        }
>>>
>>>
>>> ---
>>> base-commit: 39676dfe52331dba909c617f213fdb21015c8d10
>>> change-id: 20240106-fd-dpu-debug-timeout-e917f0bc8063
>>>
>>> Best regards,
> 
> 
> 
