[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMRc=Me--X58j+tLn50osPRXSsJeqms8m=aYDcwMwktfZvFQow@mail.gmail.com>
Date: Sun, 4 Dec 2022 21:46:50 +0100
From: Bartosz Golaszewski <brgl@...ev.pl>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Linus Walleij <linus.walleij@...aro.org>,
linux-gpio@...r.kernel.org, linux-kernel@...r.kernel.org,
Bartosz Golaszewski <bartosz.golaszewski@...aro.org>
Subject: Re: [GIT PULL] gpio: fixes for v6.1-rc8 - take 2
On Sun, Dec 4, 2022 at 9:17 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Sat, Dec 3, 2022 at 7:05 AM Bartosz Golaszewski <brgl@...ev.pl> wrote:
> >
> > Here's a fixed PR from the GPIO subsystem for the next rc.
>
> No, this cannot be right.
>
> That last commit seems *very* dubious, and in particular all those
>
> if (!down_read_trylock(&gdev->sem))
> return EPOLLHUP | EPOLLERR;
>
> are a sign that something is very very wrong there.
>
> Either the lock is necessary or it isn't, and "trylock" isn't the way
> to deal with it, with random failures if you cannot take the lock.
>
> If you are using "trylock" because the data structure might go away
> from under you, you have already lost, and the code is buggy.
>
> And if the data structure cannot go away from under you, you should
> do an unconditional lock, and then check "gdev->chip" for being NULL
> once you have gotten the lock (the same way you did in open()).
>
No, the data can't be removed with these locks in place. It's just to
avoid going into the callbacks if gpiochip_remove() is already in
progress (the only reason why trylock would fail).
> But a "trylock and return error if it failed" just means that now you
> are randomly returning errors to user space, which is entirely
> undebuggable and makes no sense.
>
Technically these are the same errors we'll return later if gdev->chip
is NULL but I get your point.
> Or, alternatively, the trylock succeeds - because it hits fully
> *after* gpiochip_remove() has finished, and now ->chip is NULL anyway,
> which is what you claim to protect against.
>
> End result: "trylock" can never be right in this kind of context.
>
> That "call_locked() helper might make sense more along the lines of
>
> ret = -ENODEV;
>
> down_read(&gdev->sem))
> // Does the device still exist?
> if (gdev->chip)
> ret = func(file, cmd, arg);
> up_read(&gdev->sem);
>
> return ret;
>
This is a good suggestion, thanks. And with it, the two patches can
get squashed into one for easier backporting.
> or similar. Not with that odd "try to lock, and if that fails, assume error".
>
> And again - if the trylock is there because 'gdev' itself might go
> away at any time and you can't afford to wait on the lock, then it's
> broken regardless (and the above suggestion won't help either)
>
> Anyway: the end result of this all is that I think this is a
> fundamental bug in the gpio layer, and rc7 (soon to be rc8) is too
> late to try these kinds of unfinished games.
>
> Fix it properly for 6.2, and make it back-portable, because I'm not
> pulling something like this right now.
>
> Linus
Will do. I will still resend the PR with only the resource leak fixes.
Bartosz
Powered by blists - more mailing lists