linux-kernel - Re: [PATCH] rtc: cmos: Disable irq around direct invocation of cmos

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YEIx2MetiHDXdrcL@intel.com>
Date:   Fri, 5 Mar 2021 15:27:52 +0200
From:   Ville Syrjälä <ville.syrjala@...ux.intel.com>
To:     Chris Wilson <chris@...is-wilson.co.uk>
Cc:     linux-rtc@...r.kernel.org, linux-kernel@...r.kernel.org,
        Xiaofei Tan <tanxiaofei@...wei.com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        Alessandro Zummo <a.zummo@...ertech.it>
Subject: Re: [PATCH] rtc: cmos: Disable irq around direct invocation of
 cmos_interrupt()

On Fri, Mar 05, 2021 at 12:21:40PM +0000, Chris Wilson wrote:
> As previously noted in commit 66e4f4a9cc38 ("rtc: cmos: Use
> spin_lock_irqsave() in cmos_interrupt()"):
> 
> <4>[  254.192378] WARNING: inconsistent lock state
> <4>[  254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ #1 Not tainted
> <4>[  254.192396] --------------------------------
> <4>[  254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> <4>[  254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes:
> <4>[  254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100
> <4>[  254.192481] {IN-HARDIRQ-W} state was registered at:
> <4>[  254.192488]   lock_acquire+0xd1/0x3d0
> <4>[  254.192504]   _raw_spin_lock+0x2a/0x40
> <4>[  254.192519]   cmos_interrupt+0x18/0x100
> <4>[  254.192536]   rtc_handler+0x1f/0xc0
> <4>[  254.192553]   acpi_ev_fixed_event_detect+0x109/0x13c
> <4>[  254.192574]   acpi_ev_sci_xrupt_handler+0xb/0x28
> <4>[  254.192596]   acpi_irq+0x13/0x30
> <4>[  254.192620]   __handle_irq_event_percpu+0x43/0x2c0
> <4>[  254.192641]   handle_irq_event_percpu+0x2b/0x70
> <4>[  254.192661]   handle_irq_event+0x2f/0x50
> <4>[  254.192680]   handle_fasteoi_irq+0x9e/0x150
> <4>[  254.192693]   __common_interrupt+0x76/0x140
> <4>[  254.192715]   common_interrupt+0x96/0xc0
> <4>[  254.192732]   asm_common_interrupt+0x1e/0x40
> <4>[  254.192750]   _raw_spin_unlock_irqrestore+0x38/0x60
> <4>[  254.192767]   resume_irqs+0xba/0xf0
> <4>[  254.192786]   dpm_resume_noirq+0x245/0x3d0
> <4>[  254.192811]   suspend_devices_and_enter+0x230/0xaa0
> <4>[  254.192835]   pm_suspend.cold.8+0x301/0x34a
> <4>[  254.192859]   state_store+0x7b/0xe0
> <4>[  254.192879]   kernfs_fop_write_iter+0x11d/0x1c0
> <4>[  254.192899]   new_sync_write+0x11d/0x1b0
> <4>[  254.192916]   vfs_write+0x265/0x390
> <4>[  254.192933]   ksys_write+0x5a/0xd0
> <4>[  254.192949]   do_syscall_64+0x33/0x80
> <4>[  254.192965]   entry_SYSCALL_64_after_hwframe+0x44/0xae
> <4>[  254.192986] irq event stamp: 43775
> <4>[  254.192994] hardirqs last  enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
> <4>[  254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0
> <4>[  254.193049] softirqs last  enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
> <4>[  254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0
> <4>[  254.193101]
>                   other info that might help us debug this:
> <4>[  254.193107]  Possible unsafe locking scenario:
> 
> <4>[  254.193112]        CPU0
> <4>[  254.193117]        ----
> <4>[  254.193121]   lock(rtc_lock);
> <4>[  254.193137]   <Interrupt>
> <4>[  254.193142]     lock(rtc_lock);
> <4>[  254.193156]
>                    *** DEADLOCK ***
> 
> <4>[  254.193161] 6 locks held by rtcwake/5309:
> <4>[  254.193174]  #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0
> <4>[  254.193232]  #1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0
> <4>[  254.193282]  #2: ffff888100cef3c0 (kn->active#285
> <7>[  254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled
> <4>[  254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0
> <4>[  254.193333]  #3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a
> <4>[  254.193387]  #4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70
> <4>[  254.193433]  #5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0
> <4>[  254.193485]
>                   stack backtrace:
> <4>[  254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ #1
> <4>[  254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
> <4>[  254.193524] Call Trace:
> <4>[  254.193536]  dump_stack+0x7f/0xad
> <4>[  254.193567]  mark_lock.part.47+0x8ca/0xce0
> <4>[  254.193604]  __lock_acquire+0x39b/0x2590
> <4>[  254.193626]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> <4>[  254.193660]  lock_acquire+0xd1/0x3d0
> <4>[  254.193677]  ? cmos_interrupt+0x18/0x100
> <4>[  254.193716]  _raw_spin_lock+0x2a/0x40
> <4>[  254.193735]  ? cmos_interrupt+0x18/0x100
> <4>[  254.193758]  cmos_interrupt+0x18/0x100
> <4>[  254.193785]  cmos_resume+0x2ac/0x2d0
> <4>[  254.193813]  ? acpi_pm_set_device_wakeup+0x1f/0x110
> <4>[  254.193842]  ? pnp_bus_suspend+0x10/0x10
> <4>[  254.193864]  pnp_bus_resume+0x5e/0x90
> <4>[  254.193885]  dpm_run_callback+0x5f/0x240
> <4>[  254.193914]  device_resume+0xb2/0x1e0
> <4>[  254.193942]  ? pm_dev_err+0x25/0x25
> <4>[  254.193974]  dpm_resume+0xea/0x3f0
> <4>[  254.194005]  dpm_resume_end+0x8/0x10
> <4>[  254.194030]  suspend_devices_and_enter+0x29b/0xaa0
> <4>[  254.194066]  pm_suspend.cold.8+0x301/0x34a
> <4>[  254.194094]  state_store+0x7b/0xe0
> <4>[  254.194124]  kernfs_fop_write_iter+0x11d/0x1c0
> <4>[  254.194151]  new_sync_write+0x11d/0x1b0
> <4>[  254.194183]  vfs_write+0x265/0x390
> <4>[  254.194207]  ksys_write+0x5a/0xd0
> <4>[  254.194232]  do_syscall_64+0x33/0x80
> <4>[  254.194251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> <4>[  254.194274] RIP: 0033:0x7f07d79691e7
> <4>[  254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> <4>[  254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> <4>[  254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7
> <4>[  254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b
> <4>[  254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004
> <4>[  254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004
> 
> which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough
> to trigger the alarm during the suspend.
> 
> Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")

Sigh. I wish people would at least try to check the code/history
before doing these blind "cleanups" :(

> References: 66e4f4a9cc38 ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"):
> Signed-off-by: Chris Wilson <chris@...is-wilson.co.uk>
> Cc: Xiaofei Tan <tanxiaofei@...wei.com>
> Cc: Alexandre Belloni <alexandre.belloni@...tlin.com>
> Cc: Alessandro Zummo <a.zummo@...ertech.it>
> Cc: Ville Syrjälä <ville.syrjala@...ux.intel.com>
> ---
>  drivers/rtc/rtc-cmos.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 670fd8a2970e..6545afb2f20e 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -1053,7 +1053,9 @@ static void cmos_check_wkalrm(struct device *dev)
>  	 * ACK the rtc irq here
>  	 */
>  	if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) {
> +		local_irq_disable();
>  		cmos_interrupt(0, (void *)cmos->rtc);
> +		local_irq_enable();

Yeah, given what's already happened this seems more likely to
survive a bit longer.

Reviewed-by: Ville Syrjälä <ville.syrjala@...ux.intel.com>

>  		return;
>  	}
>  
> -- 
> 2.20.1

-- 
Ville Syrjälä
Intel