linux-kernel - Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YjNv+csmFyjNYc5b@phenom.ffwll.local>
Date:   Thu, 17 Mar 2022 18:29:29 +0100
From:   Daniel Vetter <daniel@...ll.ch>
To:     Christian König <christian.koenig@....com>
Cc:     Rob Clark <robdclark@...il.com>,
        Rob Clark <robdclark@...omium.org>,
        Jonathan Marek <jonathan@...ek.ca>,
        David Airlie <airlied@...ux.ie>,
        freedreno <freedreno@...ts.freedesktop.org>,
        Vladimir Lypak <vladimir.lypak@...il.com>,
        Abhinav Kumar <quic_abhinavk@...cinc.com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Akhil P Oommen <quic_akhilpo@...cinc.com>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        Sean Paul <sean@...rly.run>,
        open list <linux-kernel@...r.kernel.org>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>
Subject: Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system
 suspend

On Thu, Mar 17, 2022 at 05:44:57PM +0100, Christian König wrote:
> Am 17.03.22 um 17:18 schrieb Rob Clark:
> > On Thu, Mar 17, 2022 at 9:04 AM Christian König
> > <christian.koenig@....com> wrote:
> > > Am 17.03.22 um 16:10 schrieb Rob Clark:
> > > > [SNIP]
> > > > userspace frozen != kthread frozen .. that is what this patch is
> > > > trying to address, so we aren't racing between shutting down the hw
> > > > and the scheduler shoveling more jobs at us.
> > > Well exactly that's the problem. The scheduler is supposed to shoveling
> > > more jobs at us until it is empty.
> > > 
> > > Thinking more about it we will then keep some dma_fence instance
> > > unsignaled and that is and extremely bad idea since it can lead to
> > > deadlocks during suspend.
> > Hmm, perhaps that is true if you need to migrate things out of vram?
> > It is at least not a problem when vram is not involved.
> 
> No, it's much wider than that.
> 
> See what can happen is that the memory management shrinkers want to wait for
> a dma_fence during suspend.
> 
> And if you stop the scheduler they will just wait forever.
> 
> What you need to do instead is to drain the scheduler, e.g. call
> drm_sched_entity_flush() with a proper timeout for each entity you have
> created.

Yeah I think properly flushing the scheduler and stopping it and cutting
all drivers over to that sounds like the right approach. Generally suspend
shouldn't be such a critical path that this will hurt us, all the other io
queues get flushed too afaik.

Resume is the thing that needs to go real fast.

So a patch set to move all drivers that open code the kthread_park to the
right scheduler function sounds like the right idea here to me.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > > So this patch here is an absolute clear NAK from my side. If amdgpu is
> > > doing something similar that is a severe bug and needs to be addressed
> > > somehow.
> > I think amdgpu's use of kthread_park is not related to suspend, but
> > didn't look too closely.
> > 
> > And perhaps the solution for this problem is more complex in the case
> > of amdgpu, I'm not super familiar with the constraints there.  But I
> > think it is a fine solution for integrated GPUs.
> > 
> > BR,
> > -R
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > BR,
> > > > -R
> > > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch