lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMRc=MeNiTGWh2f4iivvR75F4TpmuNa-xExc4G7HssWS5Tch1w@mail.gmail.com>
Date: Tue, 13 Feb 2024 13:08:37 +0100
From: Bartosz Golaszewski <brgl@...ev.pl>
To: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Linus Walleij <linus.walleij@...aro.org>, Kent Gibson <warthog618@...il.com>, 
	Alex Elder <elder@...aro.org>, Geert Uytterhoeven <geert+renesas@...der.be>, 
	"Paul E . McKenney" <paulmck@...nel.org>, Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, 
	Wolfram Sang <wsa@...-dreams.de>, linux-gpio@...r.kernel.org, 
	linux-kernel@...r.kernel.org, 
	Bartosz Golaszewski <bartosz.golaszewski@...aro.org>
Subject: Re: [PATCH v3 00/24] gpio: rework locking and object life-time control

On Tue, Feb 13, 2024 at 1:05 PM Marek Szyprowski
<m.szyprowski@...sung.com> wrote:
>
> On 08.02.2024 10:58, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@...aro.org>
> >
> > This is a big rework of locking in GPIOLIB. The current serialization is
> > pretty much useless. There is one big spinlock (gpio_lock) that "protects"
> > both the GPIO device list, GPIO descriptor access and who knows what else.
> >
> > I'm putting "protects" in quotes as in several places the lock is
> > taken, released whenever a sleeping function is called and re-taken
> > without regards for the "protected" state that may have changed.
> >
> > First a little background on what we're dealing with in GPIOLIB. We have
> > consumer API functions that can be called from any context explicitly
> > (get/set value, set direction) as well as many others which will get
> > called in atomic context implicitly (e.g. set config called in certain
> > situations from gpiod_direction_output()).
> >
> > On the other side: we have GPIO provider drivers whose callbacks may or
> > may not sleep depending on the underlying protocol.
> >
> > This makes any attempts at serialization quite complex. We typically
> > cannot use sleeping locks - we may be called from atomic - but we also
> > often cannot use spinlocks - provider callbacks may sleep. Moreover: we
> > have close ties with the interrupt and pinctrl subsystems, often either
> > calling into them or getting called from them. They use their own locking
> > schemes which are at odds with ours (pinctrl uses mutexes, the interrupt
> > subsystem can call GPIO helpers with spinlock taken).
> >
> > There is also another significant issue: the GPIO device object contains
> > a pointer to gpio_chip which is the implementation of the GPIO provider.
> > This object can be removed at any point - as GPIOLIB officially supports
> > hotplugging with all the dynamic expanders that we provide drivers for -
> > and leave the GPIO API callbacks with a suddenly NULL pointer. This is
> > a problem that allowed user-space processes to easily crash the kernel
> > until we patched it with a read-write semaphore in the user-space facing
> > code (but the problem still exists for in-kernel users). This was
> > recognized before as evidenced by the implementation of validate_desc()
> > but without proper serialization, simple checking for a NULL pointer is
> > pointless and we do need a generic solution for that issue as well.
> >
> > If we want to get it right - the more lockless we go, the better. This is
> > why SRCU seems to be the right candidate for the mechanism to use. In fact
> > it's the only mechanism we can use our read-only critical sections to be
> > called from atomic and protecc contexts as well as call driver callbacks
> > that may sleep (for the latter case).
> >
> > We're going to use it in three places: to protect the global list of GPIO
> > devices, to ensure consistency when dereferencing the chip pointer in GPIO
> > device struct and finally to ensure that users can access GPIO descriptors
> > and always see a consistent state.
> >
> > We do NOT serialize all API callbacks. This means that provider callbacks
> > may be called simultaneously and GPIO drivers need to provide their own
> > locking if needed. This is on purpose. First: we only support exclusive
> > GPIO usage* so there's no risk of two drivers getting in each other's way
> > over the same GPIO. Second: with this series, we ensure enough consistency
> > to limit the chance of drivers or user-space users crashing the kernel.
> > With additional improvements in handling the flags field in GPIO
> > descriptors there's very little to gain, while bitbanging drivers may care
> > about the increased performance of going lockless.
> >
> > This series brings in one somewhat significant functional change for
> > in-kernel users, namely: GPIO API calls, for which the underlying GPIO
> > chip is gone, will no longer return 0 and emit a log message but instead
> > will return -ENODEV.
> >
> > I know this is a lot of code to go through but the more eyes we get on it
> > the better.
>
> I've noticed that this patchset landed in today's linux-next. It causes
> a lots of warning during boot on my test boards when LOCKDEP is enabled
> in kernel configs. Do you want me to report all of them? Some can be
> easily reproduced even with QEMU's virt ARM and ARM64 machines.
>

Marek,

Thanks for the report. I've already sent out patches that fix these
problems. Sorry for this.

Bartosz

[snip]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ