linux-kernel - Re: [PATCH] irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200416104025.5a22c228@why>
Date:   Thu, 16 Apr 2020 10:40:25 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Jerome Brunet <jbrunet@...libre.com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        Kevin Hilman <khilman@...libre.com>,
        Martin Blumenstingl <martin.blumenstingl@...glemail.com>
Subject: Re: [PATCH] irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe
 lock order

On Tue, 14 Apr 2020 17:52:25 +0200
Jerome Brunet <jbrunet@...libre.com> wrote:

Hi Jerome,

> On Tue 14 Apr 2020 at 15:20, Marc Zyngier <maz@...nel.org> wrote:
> 
> > On Tue,  7 Apr 2020 15:46:58 +0100
> > Marc Zyngier <maz@...nel.org> wrote:
> >
> > +Jerome, Martin,
> >  
> >> Running a lockedp-enabled kernel on a vim3l board (Amlogic SM1)
> >> leads to the following splat:
> >> 
> >> [   13.557138] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> >> [   13.587485] ip/456 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> >> [   13.625922] ffff000059908cf0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __setup_irq+0xf8/0x8d8
> >> [   13.632273] which would create a new lock dependency:
> >> [   13.637272]  (&irq_desc_lock_class){-.-.}-{2:2} -> (&ctl->lock){+.+.}-{2:2}
> >> [   13.644209]
> >> [   13.644209] but this new dependency connects a HARDIRQ-irq-safe lock:
> >> [   13.654122]  (&irq_desc_lock_class){-.-.}-{2:2}
> >> [   13.654125]
> >> [   13.654125] ... which became HARDIRQ-irq-safe at:
> >> [   13.664759]   lock_acquire+0xec/0x368
> >> [   13.666926]   _raw_spin_lock+0x60/0x88
> >> [   13.669979]   handle_fasteoi_irq+0x30/0x178
> >> [   13.674082]   generic_handle_irq+0x38/0x50
> >> [   13.678098]   __handle_domain_irq+0x6c/0xc8
> >> [   13.682209]   gic_handle_irq+0x5c/0xb0
> >> [   13.685872]   el1_irq+0xd0/0x180
> >> [   13.689010]   arch_cpu_idle+0x40/0x220
> >> [   13.692732]   default_idle_call+0x54/0x60
> >> [   13.696677]   do_idle+0x23c/0x2e8
> >> [   13.699903]   cpu_startup_entry+0x30/0x50
> >> [   13.703852]   rest_init+0x1e0/0x2b4
> >> [   13.707301]   arch_call_rest_init+0x18/0x24
> >> [   13.711449]   start_kernel+0x4ec/0x51c
> >> [   13.715167]
> >> [   13.715167] to a HARDIRQ-irq-unsafe lock:
> >> [   13.722426]  (&ctl->lock){+.+.}-{2:2}
> >> [   13.722430]
> >> [   13.722430] ... which became HARDIRQ-irq-unsafe at:
> >> [   13.732319] ...
> >> [   13.732324]   lock_acquire+0xec/0x368
> >> [   13.735985]   _raw_spin_lock+0x60/0x88
> >> [   13.739452]   meson_gpio_irq_domain_alloc+0xcc/0x290
> >> [   13.744392]   irq_domain_alloc_irqs_hierarchy+0x24/0x60
> >> [   13.749586]   __irq_domain_alloc_irqs+0x160/0x2f0
> >> [   13.754254]   irq_create_fwspec_mapping+0x118/0x320
> >> [   13.759073]   irq_create_of_mapping+0x78/0xa0
> >> [   13.763360]   of_irq_get+0x6c/0x80
> >> [   13.766701]   of_mdiobus_register_phy+0x10c/0x238 [of_mdio]
> >> [   13.772227]   of_mdiobus_register+0x158/0x380 [of_mdio]
> >> [   13.777388]   mdio_mux_init+0x180/0x2e8 [mdio_mux]
> >> [   13.782128]   g12a_mdio_mux_probe+0x290/0x398 [mdio_mux_meson_g12a]
> >> [   13.788349]   platform_drv_probe+0x5c/0xb0
> >> [   13.792379]   really_probe+0xe4/0x448
> >> [   13.795979]   driver_probe_device+0xe8/0x140
> >> [   13.800189]   __device_attach_driver+0x94/0x120
> >> [   13.804639]   bus_for_each_drv+0x84/0xd8
> >> [   13.808474]   __device_attach+0xe4/0x168
> >> [   13.812361]   device_initial_probe+0x1c/0x28
> >> [   13.816592]   bus_probe_device+0xa4/0xb0
> >> [   13.820430]   deferred_probe_work_func+0xa8/0x100
> >> [   13.825064]   process_one_work+0x264/0x688
> >> [   13.829088]   worker_thread+0x4c/0x458
> >> [   13.832768]   kthread+0x154/0x158
> >> [   13.836018]   ret_from_fork+0x10/0x18
> >> [   13.839612]
> >> [   13.839612] other info that might help us debug this:
> >> [   13.839612]
> >> [   13.850354]  Possible interrupt unsafe locking scenario:
> >> [   13.850354]
> >> [   13.855720]        CPU0                    CPU1
> >> [   13.858774]        ----                    ----
> >> [   13.863242]   lock(&ctl->lock);
> >> [   13.866330]                                local_irq_disable();
> >> [   13.872233]                                lock(&irq_desc_lock_class);
> >> [   13.878705]                                lock(&ctl->lock);
> >> [   13.884297]   <Interrupt>
> >> [   13.886857]     lock(&irq_desc_lock_class);
> >> [   13.891014]
> >> [   13.891014]  *** DEADLOCK ***
> >> 
> >> The issue can occur when CPU1 is doing something like irq_set_type()
> >> and CPU0 performing an interrupt allocation, for example. Taking
> >> an interrupt (like the one being reconfigured) would lead to a
> >> deadlock.  
> 
> Just to make sure I understand
> * the 1st trace is a CPU getting interrupted while setting the irq type
> * the 2nd trace is another CPU trying to allocate an irq for network PHY.

The traces are only what lockdep sees as a dangerous behaviour, not
necessarily what actually leads to a deadlock. The deadlock scenario is
the one outlined just before "*** DEADLOCK ***", and a way to get there
is my interpretation just above.

> >> 
> >> A solution to this is:
> >> 
> >> - Reorder the locking so that meson_gpio_irq_update_bits takes the lock
> >>   itself at all times, instead of relying on the caller to lock or not,
> >>   hence making the RMW sequence atomic,
> >> 
> >> - Rework the critical section in meson_gpio_irq_request_channel to only
> >>   cover the allocation itself, and let the gpio_irq_sel_pin callback
> >>   deal with its own locking if required,
> >> 
> >> - Take the private spin-lock with interrupts disabled at all times  
> 
> Looks like the only safe path if I understand correctly.
> The patch below looks good to me.
> 
> >> 
> >> Signed-off-by: Marc Zyngier <maz@...nel.org>  
> 
> Thanks for the fix Marc.
> 
> Reviewed-by: Jerome Brunet <jbrunet@...libre.com>

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...