[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bf16af3b-d00e-4084-aa29-6e4c1955ed88@gmail.com>
Date: Thu, 24 Oct 2024 13:34:11 +0200
From: Klara Modin <klarasmodin@...il.com>
To: Bartosz Golaszewski <brgl@...ev.pl>, Mark Brown <broonie@...nel.org>
Cc: Linus Walleij <linus.walleij@...aro.org>,
Kent Gibson <warthog618@...il.com>, linux-gpio@...r.kernel.org,
linux-kernel@...r.kernel.org,
Bartosz Golaszewski <bartosz.golaszewski@...aro.org>
Subject: Re: [PATCH v5 8/8] gpiolib: notify user-space about in-kernel line
state changes
Hi,
On 2024-10-24 10:15, Bartosz Golaszewski wrote:
> On Thu, 24 Oct 2024 08:49:30 +0200, Bartosz Golaszewski <brgl@...ev.pl> said:
>> On Wed, Oct 23, 2024 at 11:05 PM Mark Brown <broonie@...nel.org> wrote:
>>>
>>> On Fri, Oct 18, 2024 at 11:10:16AM +0200, Bartosz Golaszewski wrote:
>>>> From: Bartosz Golaszewski <bartosz.golaszewski@...aro.org>
>>>>
>>>> We currently only notify user-space about line config changes that are
>>>> made from user-space. Any kernel config changes are not signalled.
>>>>
>>>> Let's improve the situation by emitting the events closer to the source.
>>>> To that end let's call the relevant notifier chain from the functions
>>>> setting direction, gpiod_set_config(), gpiod_set_consumer_name() and
>>>> gpiod_toggle_active_low(). This covers all the options that we can
>>>> inform the user-space about. We ignore events which don't have
>>>> corresponding flags exported to user-space on purpose - otherwise the
>>>> user would see a config-changed event but the associated line-info would
>>>> remain unchanged.
>>>
>>> Today's -next is not booting on several of my platforms, including
>>> beaglebone-black, i.MX8MP-EVK and pine64plus. Bisects are pointing at
>>> this commit, and i.MX8MP-EVK is actually giving some console output:
>>>
>>> [ 2.502208] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>>> [ 2.511036] Mem abort info:
>>>
>>> ...
>>>
>>> [ 2.679934] Call trace:
>>> [ 2.682379] gpiod_direction_output+0x34/0x5c
>>> [ 2.686745] i2c_register_adapter+0x59c/0x670
>>> [ 2.691111] __i2c_add_numbered_adapter+0x58/0xa8
>>> [ 2.695822] i2c_add_adapter+0xa0/0xd0
>>> [ 2.699578] i2c_add_numbered_adapter+0x2c/0x38
>>> [ 2.704117] i2c_imx_probe+0x2d0/0x640
>>>
>>> which looks plausible given the change.
>>>
>>> Full boot log for i.MX8MP-EVK:
>>>
>>> https://lava.sirena.org.uk/scheduler/job/887083
>>>
>>> Bisect log for that, the others look similar (the long run of good/bad
>>> tags at the start for random commits is my automation pulling test
>>> results it already knows about in the affected range to try to speed up
>>> the bisect):
>>>
>>
>> Hi Mark!
>>
>> Any chance you could post the output of
>>
>> scripts/faddr2line drivers/gpio/gpiolib.o gpiod_direction_output+0x34/0x5c
>>
>> for that build?
>>
>> Bart
>>
>
> While I can't really reproduce it, I've looked at what could be wrong and
> figured that the NULL-pointer in question can possibly be the
> line_state_notifier.
>
> I realized that for some historical reasons we add the GPIO device to the
> global list before it's fully set up - including initializing the notifier.
> In some corner cases (devlinks borked?) this could lead to consumers requesting
> GPIOs before their provider is ready.
>
> Mark: could you try the following diff and let me know if it works?
>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index ae758ba6dc3d..4258acac0162 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -987,45 +987,6 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
>
> gdev->ngpio = gc->ngpio;
> gdev->can_sleep = gc->can_sleep;
> -
> - scoped_guard(mutex, &gpio_devices_lock) {
> - /*
> - * TODO: this allocates a Linux GPIO number base in the global
> - * GPIO numberspace for this chip. In the long run we want to
> - * get *rid* of this numberspace and use only descriptors, but
> - * it may be a pipe dream. It will not happen before we get rid
> - * of the sysfs interface anyways.
> - */
> - base = gc->base;
> - if (base < 0) {
> - base = gpiochip_find_base_unlocked(gc->ngpio);
> - if (base < 0) {
> - ret = base;
> - base = 0;
> - goto err_free_label;
> - }
> -
> - /*
> - * TODO: it should not be necessary to reflect the
> - * assigned base outside of the GPIO subsystem. Go over
> - * drivers and see if anyone makes use of this, else
> - * drop this and assign a poison instead.
> - */
> - gc->base = base;
> - } else {
> - dev_warn(&gdev->dev,
> - "Static allocation of GPIO base is deprecated, use dynamic allocation.\n");
> - }
> -
> - gdev->base = base;
> -
> - ret = gpiodev_add_to_list_unlocked(gdev);
> - if (ret) {
> - chip_err(gc, "GPIO integer space overlap, cannot add chip\n");
> - goto err_free_label;
> - }
> - }
> -
> ATOMIC_INIT_NOTIFIER_HEAD(&gdev->line_state_notifier);
> BLOCKING_INIT_NOTIFIER_HEAD(&gdev->device_notifier);
>
> @@ -1103,6 +1064,45 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
> if (ret)
> goto err_remove_irqchip;
> }
> +
> + scoped_guard(mutex, &gpio_devices_lock) {
> + /*
> + * TODO: this allocates a Linux GPIO number base in the global
> + * GPIO numberspace for this chip. In the long run we want to
> + * get *rid* of this numberspace and use only descriptors, but
> + * it may be a pipe dream. It will not happen before we get rid
> + * of the sysfs interface anyways.
> + */
> + base = gc->base;
> + if (base < 0) {
> + base = gpiochip_find_base_unlocked(gc->ngpio);
> + if (base < 0) {
> + ret = base;
> + base = 0;
> + goto err_free_label;
> + }
> +
> + /*
> + * TODO: it should not be necessary to reflect the
> + * assigned base outside of the GPIO subsystem. Go over
> + * drivers and see if anyone makes use of this, else
> + * drop this and assign a poison instead.
> + */
> + gc->base = base;
> + } else {
> + dev_warn(&gdev->dev,
> + "Static allocation of GPIO base is deprecated, use dynamic allocation.\n");
> + }
> +
> + gdev->base = base;
> +
> + ret = gpiodev_add_to_list_unlocked(gdev);
> + if (ret) {
> + chip_err(gc, "GPIO integer space overlap, cannot add chip\n");
> + goto err_free_label;
> + }
> + }
> +
> return 0;
>
> err_remove_irqchip:
>
> Thanks
> Bartosz
>
I think I hit the same, or a similar bug, on my Edgerouter 6P (Cavium
Octeon III):
CPU 3 Unable to handle kernel paging request at virtual address
0000000000000000, epc == ffffffff817a40c8, ra == ffffffff817a40a4
Oops[#1]:
CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.12.0-rc4-next-20241023-00032-g01c8e35f8d89 #84
...
Call Trace:
gpiod_direction_output (drivers/gpio/gpiolib.c:4164
drivers/gpio/gpiolib.c:2700 drivers/gpio/gpiolib.c:2694)
i2c_register_adapter (drivers/i2c/i2c-core-base.c:396
drivers/i2c/i2c-core-base.c:422 drivers/i2c/i2c-core-base.c:434
drivers/i2c/i2c-core-base.c:1574)
octeon_i2c_probe (drivers/i2c/busses/i2c-octeon-platdrv.c:247)
(the complete log is attached)
Unfortunately the oops remains after applying the diff and the call
trace looks to be the same.
Please let me know if there's anything else you need.
Regards,
Klara Modin
Download attachment "gpiod_oops_decoded.gz" of type "application/gzip" (6180 bytes)
View attachment "gpio_oops_bisect" of type "text/plain" (2616 bytes)
Download attachment "gpiod_oops_after_patch_decoded.gz" of type "application/gzip" (6203 bytes)
Powered by blists - more mailing lists