lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAA8EJpo3XynBrm0S_BA_SxGOw963WQT9jh=YvLcT1N24FyEUsw@mail.gmail.com>
Date: Wed, 21 Feb 2024 00:42:50 +0200
From: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
To: Abhinav Kumar <quic_abhinavk@...cinc.com>
Cc: Rob Clark <robdclark@...il.com>, Sean Paul <sean@...rly.run>, 
	Marijn Suijten <marijn.suijten@...ainline.org>, David Airlie <airlied@...il.com>, 
	Daniel Vetter <daniel@...ll.ch>, linux-arm-msm@...r.kernel.org, 
	dri-devel@...ts.freedesktop.org, freedreno@...ts.freedesktop.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] drm/msm/dpu: make "vblank timeout" more useful

On Wed, 21 Feb 2024 at 00:40, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
>
>
>
> On 2/19/2024 3:52 AM, Dmitry Baryshkov wrote:
> > On Wed, 14 Feb 2024 at 22:36, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
> >>
> >>
> >>
> >> On 2/14/2024 11:20 AM, Dmitry Baryshkov wrote:
> >>> On Wed, 14 Feb 2024 at 20:02, Abhinav Kumar <quic_abhinavk@...cinc.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2/8/2024 6:50 AM, Dmitry Baryshkov wrote:
> >>>>> We have several reports of vblank timeout messages. However after some
> >>>>> debugging it was found that there might be different causes to that.
> >>>>> To allow us to identify the DPU block that gets stuck, include the
> >>>>> actual CTL_FLUSH value into the timeout message and trigger the devcore
> >>>>> snapshot capture.
> >>>>>
> >>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
> >>>>> ---
> >>>>> Changes in v2:
> >>>>> - Added a call to msm_disp_snapshot_state() to trigger devcore dump
> >>>>>      (Abhinav)
> >>>>> - Link to v1: https://lore.kernel.org/r/20240106-fd-dpu-debug-timeout-v1-1-6d9762884641@linaro.org
> >>>>> ---
> >>>>>     drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c | 3 ++-
> >>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
> >>>>> index d0f56c5c4cce..a8d6165b3c0a 100644
> >>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
> >>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
> >>>>> @@ -489,7 +489,8 @@ static int dpu_encoder_phys_vid_wait_for_commit_done(
> >>>>>                 (hw_ctl->ops.get_flush_register(hw_ctl) == 0),
> >>>>>                 msecs_to_jiffies(50));
> >>>>>         if (ret <= 0) {
> >>>>> -             DPU_ERROR("vblank timeout\n");
> >>>>> +             DPU_ERROR("vblank timeout: %x\n", hw_ctl->ops.get_flush_register(hw_ctl));
> >>>>> +             msm_disp_snapshot_state(phys_enc->parent->dev);
> >>>>
> >>>>
> >>>> There is no rate limiting in this piece of code unfortunately. So this
> >>>> will flood the number of snapshots.
> >>>
> >>> Well... Yes and no. The devcoredump will destroy other snapshots if
> >>> there is a pending one. So only the console will be flooded and only
> >>> in case when MSM_DISP_SNAPSHOT_DUMP_IN_CONSOLE is enabled.
> >>>
> >>
> >> Yes, true but at the same time this makes it hard to capture a good dump
> >> as potentially every vblank you could timeout so this destroy/create
> >> cycle wont end.
> >
> > Excuse me, maybe I miss something. On the first timeout the snapshot
> > is created. It is held by the kernel until it is fully read out from
> > the userspace. Other snapshots will not interfere with this snapshot.
> >
>
> For every new snapshot a new devcoredump device will be created which
> should remain till it has been read. But now this will be created every
> blank. IMO, this is really too much data for no reason.

No-no-no. If there is a devcoredump for a device, the next one will
not be created. See dev_coredumpm().
So all the snapshots will be created and then destroyed immediately.

>
> Subsequent vblank timeouts are not going to give any new information
> compared to the existing snapshot of the first vblank timeout thats why
> we should just create the snapshot when the first error happens and stop.
>
> For other frame done timeouts, infact subsequent timeouts without any
> sort of recovery in between are quite misleading because hardware was
> already not able to fetch the previous frame so it will most likely not
> fetch the next one either till it has recovered. Typically thats why
> these vblank timeouts happen in a flurry as the hardware never really
> recovered from the first timeout.
>
> > Or are you worried that snapshotting takes time, so taking a snapshot
> > will also interfere with the vblank timings for the next vblank?
> >
>
> Yes this is another point.

-- 
With best wishes
Dmitry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ