linux-kernel - Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200522132318.GM30374@kadam>
Date:   Fri, 22 May 2020 16:23:18 +0300
From:   Dan Carpenter <dan.carpenter@...cle.com>
To:     Thierry Reding <thierry.reding@...il.com>
Cc:     devel@...verdev.osuosl.org, Len Brown <len.brown@...el.com>,
        Pavel Machek <pavel@....cz>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Jonathan Hunter <jonathanh@...dia.com>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        dinghao.liu@....edu.cn, Kangjie Lu <kjlu@....edu>,
        Dmitry Osipenko <digetx@...il.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        linux-media@...r.kernel.org
Subject: Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance
 on error

On Fri, May 22, 2020 at 03:10:31PM +0200, Thierry Reding wrote:
> On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> > On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@...cle.com> wrote:
> > > >
> > > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@....edu.cn wrote:
> > > > > Hi, Dan,
> > > > >
> > > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > > > counter on error. According to my static analysis results, the number of these
> > > > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > > >
> > > >
> > > > That's why I was saying that we may need to introduce a new replacement
> > > > function for pm_runtime_get_sync() that works as expected.
> > > >
> > > > There is no reason why we have to live with the old behavior.
> > > 
> > > What exactly do you mean by "the old behavior"?
> > 
> > I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
> > function which called pm_runtime_get_sync_resume() which does something
> > like this:
> > 
> > static inline int pm_runtime_get_sync_resume(struct device *dev)
> > {
> > 	int ret;
> > 
> > 	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> > 	if (ret < 0) {
> > 		pm_runtime_put(dev);
> > 		return ret;
> > 	}
> > 	return 0;
> > }
> > 
> > I'm not sure if pm_runtime_put() is the correct thing to do?  The other
> > thing is that this always returns zero on success.  I don't know that
> > drivers ever care to differentiate between one and zero returns.
> > 
> > Then if any of the caller expect that behavior we update them to use the
> > new function.
> 
> Does that really have many benefits, though? I understand that this
> would perhaps be easier to use because it is more in line with how other
> functions operate. On the other hand, in some cases you may want to call
> a different version of pm_runtime_put() on failure, as discussed in
> other threads.

I wasn't CC'd on the other threads so I don't know.  :/  I have always
assumed it was something like this but I don't know the details and
there is no documentation.

http://sweng.the-davies.net/Home/rustys-api-design-manifesto
You're essentially arguing that it's a #1 on Rusty's scale but ideally
we would want to be at #7.

> 
> Even ignoring that issue, any existing callsites that are leaking the
> reference would have to be updated to call the new function, which would
> be pretty much the same amount of work as updating the callsites to fix
> the leak, right?

With the current API we're constantly adding bugs.  I imagine that once
we add a straight forward default and some documentation then we will
solve this.

> 
> So if instead we just fix up the leaks, we might have a case of an API
> that doesn't work as some of us (myself included) expected it, but at
> least it would be consistent. If we add another variant things become
> fragmented and therefore even more complicated to use and review.

That's the approach that we've been trying and it's clearly not working.

regards,
dan carpenter