linux-kernel - Re: [PATCH v2] drm/etnaviv: fix ref count leak via pm_runtime_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAEkB2EQ9DuLt12MXkw3DYpBDaDAxoWTnMC9+=E7v=6dGGRK=nQ@mail.gmail.com>
Date:   Wed, 17 Jun 2020 12:57:09 -0500
From:   Navid Emamdoost <navid.emamdoost@...il.com>
To:     Lucas Stach <l.stach@...gutronix.de>
Cc:     Russell King <linux+etnaviv@...linux.org.uk>,
        Christian Gmeiner <christian.gmeiner@...il.com>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>, etnaviv@...ts.freedesktop.org,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        LKML <linux-kernel@...r.kernel.org>, Kangjie Lu <kjlu@....edu>,
        mccamant@...umn.edu, Andy Shevchenko <andy.shevchenko@...il.com>,
        Navid Emamdoost <emamd001@....edu>,
        Qiushi Wu <wu000273@....edu>
Subject: Re: [PATCH v2] drm/etnaviv: fix ref count leak via pm_runtime_get_sync

Hi Lucas,


On Wed, Jun 17, 2020 at 4:53 AM Lucas Stach <l.stach@...gutronix.de> wrote:
>
> Hi Navid,
>
> Am Montag, den 15.06.2020, 01:12 -0500 schrieb Navid Emamdoost:
> > in etnaviv_gpu_submit, etnaviv_gpu_recover_hang, etnaviv_gpu_debugfs,
> > and etnaviv_gpu_init the call to pm_runtime_get_sync increments the
> > counter even in case of failure, leading to incorrect ref count.
> > In case of failure, decrement the ref count before returning.
>
> While that change is correct with the current API, may I ask the
> question why the way this API works is considered reasonable? A API
> call that fails, but still changes internal state and expects the
> caller to clean up the mess it not really what I would consider fool-
> proof API design. Is there a specific reason why it is done this way
> and not handled internally?

I share the same concern with you on the way this API is working now.
To the best of my knowledge, there are ongoing discussions on this
issue:

https://lkml.org/lkml/2020/6/14/76
https://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@zju.edu.cn/

>
> Regards,
> Lucas
>
> > Signed-off-by: Navid Emamdoost <navid.emamdoost@...il.com>
> > ---
> > Changes in v2:
> >       - replace pm_runtime_put with  pm_runtime_put_noidle
> > ---
> >  drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > index a31eeff2b297..7c9f3f9ba123 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > @@ -722,7 +722,7 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
> >       ret = pm_runtime_get_sync(gpu->dev);
> >       if (ret < 0) {
> >               dev_err(gpu->dev, "Failed to enable GPU power domain\n");
> > -             return ret;
> > +             goto pm_put;
> >       }
> >
> >       etnaviv_hw_identify(gpu);
> > @@ -819,6 +819,7 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
> >
> >  fail:
> >       pm_runtime_mark_last_busy(gpu->dev);
> > +pm_put:
> >       pm_runtime_put_autosuspend(gpu->dev);
> >
> >       return ret;
> > @@ -859,7 +860,7 @@ int etnaviv_gpu_debugfs(struct etnaviv_gpu *gpu, struct seq_file *m)
> >
> >       ret = pm_runtime_get_sync(gpu->dev);
> >       if (ret < 0)
> > -             return ret;
> > +             goto pm_put;
> >
> >       dma_lo = gpu_read(gpu, VIVS_FE_DMA_LOW);
> >       dma_hi = gpu_read(gpu, VIVS_FE_DMA_HIGH);
> > @@ -1003,6 +1004,7 @@ int etnaviv_gpu_debugfs(struct etnaviv_gpu *gpu, struct seq_file *m)
> >       ret = 0;
> >
> >       pm_runtime_mark_last_busy(gpu->dev);
> > +pm_put:
> >       pm_runtime_put_autosuspend(gpu->dev);
> >
> >       return ret;
> > @@ -1016,7 +1018,7 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
> >       dev_err(gpu->dev, "recover hung GPU!\n");
> >
> >       if (pm_runtime_get_sync(gpu->dev) < 0)
> > -             return;
> > +             goto pm_put;
> >
> >       mutex_lock(&gpu->lock);
> >
> > @@ -1035,6 +1037,7 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
> >
> >       mutex_unlock(&gpu->lock);
> >       pm_runtime_mark_last_busy(gpu->dev);
> > +pm_put:
> >       pm_runtime_put_autosuspend(gpu->dev);
> >  }
> >
> > @@ -1308,8 +1311,10 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
> >
> >       if (!submit->runtime_resumed) {
> >               ret = pm_runtime_get_sync(gpu->dev);
> > -             if (ret < 0)
> > +             if (ret < 0) {
> > +                     pm_runtime_put_noidle(gpu->dev);
> >                       return NULL;
> > +             }
> >               submit->runtime_resumed = true;
> >       }
> >
> > @@ -1326,6 +1331,7 @@ struct dma_fence *etnaviv_gpu_submit(struct etnaviv_gem_submit *submit)
> >       ret = event_alloc(gpu, nr_events, event);
> >       if (ret) {
> >               DRM_ERROR("no free events\n");
> > +             pm_runtime_put_noidle(gpu->dev);
> >               return NULL;
> >       }
> >
>


--
Navid.