Message-ID: <20230717094905.7a1ee007@collabora.com>
Date: Mon, 17 Jul 2023 09:49:05 +0200
From: Boris Brezillon <boris.brezillon@...labora.com>
To: Dmitry Osipenko <dmitry.osipenko@...labora.com>
Cc: Rob Herring <robh@...nel.org>, Steven Price <steven.price@....com>,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
kernel@...labora.com
Subject: Re: [PATCH v1] drm/panfrost: Sync IRQ by job's timeout handler
On Mon, 17 Jul 2023 10:20:02 +0300
Dmitry Osipenko <dmitry.osipenko@...labora.com> wrote:
> Hi,
>
> On 7/17/23 10:05, Boris Brezillon wrote:
> > Hi Dmitry,
> >
> > On Mon, 17 Jul 2023 09:52:54 +0300
> > Dmitry Osipenko <dmitry.osipenko@...labora.com> wrote:
> >
> >> Panfrost IRQ handler may stuck for a long time, for example this happens
> >> when there is a bad HDMI connection and HDMI handler takes a long time to
> >> finish processing, holding Panfrost. Make Panfrost's job timeout handler
> >> to sync IRQ before checking fence signal status in order to prevent
> >> spurious job timeouts due to a slow IRQ processing.
> >
> > Feels like the problem should be fixed in the HDMI encoder driver
> > instead, so it doesn't stall the whole system when processing its
> > IRQs (use threaded irqs, maybe). I honestly don't think blocking in the
> > job timeout path to flush IRQs is a good strategy.
>
> The syncing is necessary to have for correctness regardless of whether
> it's HDMI problem or something else, there could be other reasons for
> CPU to delay IRQ processing. It's wrong to say that hw is hung, while
> it's not.
Well, hardware is effectively hung, if not indefinitely, at least
temporarily. All you do here is block in the timeout handler path
waiting for the GPU interrupt handler to finish, a handler that's
probably waiting in the queue because the raw HDMI handler is blocking
it somehow. So, in the end, you might just be delaying the time of HWR a
bit more. I know it's not the GPU's fault in that case, and the job could
have finished in time if the HDMI encoder hadn't stalled the interrupt
handling pipeline, but I'm not sure we should care about that specific
situation. And more importantly, if it took more than 500ms to get a
frame rendered (or, in that case, to get the event that a frame is
rendered), you've already lost, so I'm not sure correctness matters:
rendering didn't make it in time, and the watchdog kicked in to try and
unblock the situation. Feels like we're just papering over an HDMI
encoder driver bug here, really.