[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0gq3mt=CcYQLkNLzqZub26QxSM8cajAgUzNWsO-z3PW9g@mail.gmail.com>
Date: Tue, 17 Jul 2018 09:39:34 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Lukas Wunner <lukas@...ner.de>, Lyude Paul <lyude@...hat.com>
Cc: nouveau@...ts.freedesktop.org, David Airlie <airlied@...ux.ie>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
dri-devel <dri-devel@...ts.freedesktop.org>,
Ben Skeggs <bskeggs@...hat.com>,
Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [Nouveau] [PATCH 1/5] drm/nouveau: Prevent RPM callback recursion
in suspend/resume paths
On Tue, Jul 17, 2018 at 9:16 AM, Lukas Wunner <lukas@...ner.de> wrote:
> [cc += linux-pm]
>
> Hi Lyude,
>
> First of all, thanks a lot for looking into this.
>
> On Mon, Jul 16, 2018 at 07:59:25PM -0400, Lyude Paul wrote:
>> In order to fix all of the spots that need to have runtime PM get/puts()
>> added, we need to ensure that it's possible for us to call
>> pm_runtime_get/put() in any context, regardless of how deep, since
>> almost all of the spots that are currently missing refs can potentially
>> get called in the runtime suspend/resume path. Otherwise, we'll try to
>> resume the GPU as we're trying to resume the GPU (and vice-versa) and
>> cause the kernel to deadlock.
>>
>> With this, it should be safe to call the pm runtime functions in any
>> context in nouveau with one condition: any point in the driver that
>> calls pm_runtime_get*() cannot hold any locks owned by nouveau that
>> would be acquired anywhere inside nouveau_pmops_runtime_resume().
>> This includes modesetting locks, i2c bus locks, etc.
> [snip]
>> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> @@ -835,6 +835,8 @@ nouveau_pmops_runtime_suspend(struct device *dev)
>> return -EBUSY;
>> }
>>
>> + dev->power.disable_depth++;
This is effectively equivalent to __pm_runtime_disable(dev, false)
except for the locking (which is necessary).
>> +
>
> I'm not sure if that variable is actually private to the PM core.
> Grepping through the tree I only find a single occurrence where it's
> accessed outside the PM core and that's in amdgpu. So this looks
> a little fishy TBH. It may make sense to cc such patches to linux-pm
> to get Rafael & other folks involved with the PM core to comment.
You are right, power.disable_depth is internal to the PM core.
Accessing it (and updating it in particular) directly from drivers is
not a good idea.
> Also, the disable_depth variable only exists if the kernel was
> compiled with CONFIG_PM enabled, but I can't find a "depends on PM"
> or something like that in nouveau's Kconfig. Actually, if PM is
> not selected, all the nouveau_pmops_*() functions should be #ifdef'ed
> away, but oddly there's no #ifdef CONFIG_PM anywhere in nouveau_drm.c.
>
> Anywayn, if I understand the commit message correctly, you're hitting a
> pm_runtime_get_sync() in a code path that itself is called during a
> pm_runtime_get_sync(). Could you include stack traces in the commit
> message? My gut feeling is that this patch masks a deeper issue,
> e.g. if the runtime_resume code path does in fact directly poll outputs,
> that would seem wrong. Runtime resume should merely make the card
> accessible, i.e. reinstate power if necessary, put into PCI_D0,
> restore registers, etc. Output polling should be scheduled
> asynchronously.
Right.
Thanks,
Rafael
Powered by blists - more mailing lists