lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM0PR08MB308979EC3A8A53AE6E2D3408802CA@AM0PR08MB3089.eurprd08.prod.outlook.com>
Date:   Fri, 14 Jul 2023 14:36:27 +0000
From:   Tim van der Staaij | Zign <Tim.vanderstaaij@...ngroup.com>
To:     Shawn Guo <shawnguo@...nel.org>,
        Sascha Hauer <s.hauer@...gutronix.de>
CC:     Vinod Koul <vkoul@...nel.org>,
        Pengutronix Kernel Team <kernel@...gutronix.de>,
        Fabio Estevam <festevam@...il.com>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: PROBLEM: deadlock when imx-sdma logs "restart cyclic channel"

Hello,

I use an i.MX6ULL UART with DMA enabled. Sometimes when a *lot* of data is received on this UART, the kernel hangs, printing this to the serial console:

[   83.339872] imx-sdma 20ec000.sdma: restart cyclic channel 5
[   83.340296] 
[   83.340307] ======================================================
[   83.340312] WARNING: possible circular locking dependency detected
[   83.340317] 5.19.14-yocto-standard #1 Not tainted
[   83.340326] ------------------------------------------------------
[   83.340329] ssh-keygen/236 is trying to acquire lock:
[   83.340336] c1718ad8 (console_owner){-...}-{0:0}, at: console_emit_next_record.constprop.0+0x120/0x344
[   83.340388] 
[   83.340388] but task is already holding lock:
[   83.340392] c391481c (&vc->lock){-.-.}-{2:2}, at: sdma_int_handler+0xb4/0x34c [imx_sdma]
[   83.340442] 
[   83.340442] which lock already depends on the new lock.
[   83.340442] 
[   83.340447] 
[   83.340447] the existing dependency chain (in reverse order) is:
[   83.340450] 
[   83.340450] -> #2 (&vc->lock){-.-.}-{2:2}:
[   83.340469]        sdma_prep_dma_cyclic+0x194/0x1f4 [imx_sdma]
[   83.340495]        imx_uart_startup+0x494/0x624
[   83.340508]        uart_port_startup+0x128/0x290
[   83.340519]        uart_port_activate+0x58/0xc0
[   83.340530]        tty_port_open+0x80/0xd0
[   83.340543]        uart_open+0x18/0x20
[   83.340557]        tty_open+0x128/0x678
[   83.340570]        chrdev_open+0xc0/0x214
[   83.340582]        do_dentry_open+0x1a8/0x3fc
[   83.340596]        path_openat+0xb48/0xe68
[   83.340606]        do_filp_open+0x5c/0xd4
[   83.340618]        do_sys_openat2+0xb8/0x188
[   83.340631]        sys_openat+0x8c/0xd8
[   83.340642]        ret_fast_syscall+0x0/0x1c
[   83.340655]        0xbedf9078
[   83.340662] 
[   83.340662] -> #1 (&port_lock_key){-.-.}-{2:2}:
[   83.340681]        imx_uart_console_write+0x16c/0x1a8
[   83.340692]        console_emit_next_record.constprop.0+0x18c/0x344
[   83.340707]        console_unlock+0x10c/0x270
[   83.340719]        register_console+0x154/0x2e8
[   83.340732]        uart_add_one_port+0x4fc/0x564
[   83.340745]        imx_uart_probe+0x4c0/0x71c
[   83.340755]        platform_probe+0x58/0xb0
[   83.340770]        really_probe.part.0+0x9c/0x2b0
[   83.340783]        __driver_probe_device+0xa0/0x138
[   83.340793]        driver_probe_device+0x30/0x10c
[   83.340805]        __driver_attach+0x8c/0x178
[   83.340815]        bus_for_each_dev+0x80/0xcc
[   83.340832]        bus_add_driver+0x154/0x1e8
[   83.340843]        driver_register+0x88/0x11c
[   83.340854]        imx_uart_init+0x20/0x40
[   83.340866]        do_one_initcall+0x7c/0x43c
[   83.340880]        kernel_init_freeable+0x17c/0x228
[   83.340892]        kernel_init+0x14/0x140
[   83.340908]        ret_from_fork+0x14/0x28
[   83.340918]        0x0
[   83.340925] 
[   83.340925] -> #0 (console_owner){-...}-{0:0}:
[   83.340946]        lock_acquire.part.0+0xb0/0x250
[   83.340959]        console_emit_next_record.constprop.0+0x168/0x344
[   83.340975]        console_unlock+0x10c/0x270
[   83.340987]        vprintk_emit+0x114/0x358
[   83.340999]        dev_vprintk_emit+0x114/0x150
[   83.341016]        dev_printk_emit+0x2c/0x5c
[   83.341030]        __dev_printk+0x4c/0x6c
[   83.341042]        _dev_warn+0x3c/0x70
[   83.341057]        sdma_int_handler+0x308/0x34c [imx_sdma]
[   83.341085]        __handle_irq_event_percpu+0x9c/0x338
[   83.341096]        handle_irq_event+0x38/0xd0
[   83.341107]        handle_fasteoi_irq+0x98/0x204
[   83.341120]        handle_irq_desc+0x1c/0x2c
[   83.341130]        gic_handle_irq+0x6c/0x90
[   83.341146]        generic_handle_arch_irq+0x2c/0x64
[   83.341159]        call_with_stack+0x18/0x20
[   83.341173]        __irq_usr+0x84/0xa0
[   83.341184]        0xb6d35e48
[   83.341191] 
[   83.341191] other info that might help us debug this:
[   83.341191] 
[   83.341196] Chain exists of:
[   83.341196]   console_owner --> &port_lock_key --> &vc->lock
[   83.341196] 
[   83.341221]  Possible unsafe locking scenario:
[   83.341221] 
[   83.341225]        CPU0                    CPU1
[   83.341228]        ----                    ----
[   83.341231]   lock(&vc->lock);
[   83.341239]                                lock(&port_lock_key);
[   83.341248]                                lock(&vc->lock);
[   83.341257]   lock(console_owner);
[   83.341266] 
[   83.341266]  *** DEADLOCK ***
[   83.341266] 
[   83.341270] 2 locks held by ssh-keygen/236:
[   83.341277]  #0: c391481c (&vc->lock){-.-.}-{2:2}, at: sdma_int_handler+0xb4/0x34c [imx_sdma]
[   83.341323]  #1: c17189d8 (console_lock){+.+.}-{0:0}, at: dev_vprintk_emit+0x114/0x150
[   83.341359] 
[   83.341359] stack backtrace:
[   83.341365] CPU: 0 PID: 236 Comm: ssh-keygen Not tainted 5.19.14-yocto-standard #1
[   83.341376] Hardware name: Freescale i.MX6 Ultralite (Device Tree)
[   83.341387]  unwind_backtrace from show_stack+0x10/0x14
[   83.341407]  show_stack from dump_stack_lvl+0x60/0x90
[   83.341427]  dump_stack_lvl from check_noncircular+0x118/0x188
[   83.341446]  check_noncircular from __lock_acquire+0x1484/0x2b8c
[   83.341463]  __lock_acquire from lock_acquire.part.0+0xb0/0x250
[   83.341481]  lock_acquire.part.0 from console_emit_next_record.constprop.0+0x168/0x344
[   83.341503]  console_emit_next_record.constprop.0 from console_unlock+0x10c/0x270
[   83.341524]  console_unlock from vprintk_emit+0x114/0x358
[   83.341544]  vprintk_emit from dev_vprintk_emit+0x114/0x150
[   83.341567]  dev_vprintk_emit from dev_printk_emit+0x2c/0x5c
[   83.341590]  dev_printk_emit from __dev_printk+0x4c/0x6c
[   83.341610]  __dev_printk from _dev_warn+0x3c/0x70
[   83.341633]  _dev_warn from sdma_int_handler+0x308/0x34c [imx_sdma]
[   83.341672]  sdma_int_handler [imx_sdma] from __handle_irq_event_percpu+0x9c/0x338
[   83.341705]  __handle_irq_event_percpu from handle_irq_event+0x38/0xd0
[   83.341720]  handle_irq_event from handle_fasteoi_irq+0x98/0x204
[   83.341740]  handle_fasteoi_irq from handle_irq_desc+0x1c/0x2c
[   83.341755]  handle_irq_desc from gic_handle_irq+0x6c/0x90
[   83.341775]  gic_handle_irq from generic_handle_arch_irq+0x2c/0x64
[   83.341798]  generic_handle_arch_irq from call_with_stack+0x18/0x20
[   83.341821]  call_with_stack from __irq_usr+0x84/0xa0
[   83.341838] Exception stack(0xe0dd5fb0 to 0xe0dd5ff8)
[   83.341850] 5fa0:                                     00000000 00000007 015ee9dc 00000000
[   83.341864] 5fc0: 00000080 00000000 00000000 00000010 00000040 00000002 000000c0 37ea2aa2
[   83.341874] 5fe0: ffffffff bed21458 00000000 b6d35e48 60030030 ffffffff

I noticed that sdma_int_handler calls sdma_update_channel_loop while &sdmac->vc is locked. However, sdma_update_channel_loop can log "restart cyclic channel" in some situations, which will eventually attempt to acquire the lock of the console owner, which I suppose is the same lock.

Steps to reproduce:

Unfortunately I didn't find a way to consistently reproduce this issue. I can only say that I observed it multiple times in different boots, but only when large amounts of data are received.

# cat /proc/version
Linux version 5.19.14-yocto-standard (oe-user@...host) (arm-poky-linux-gnueabi-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #1 SMP Thu Oct 6 18:16:52 UTC 2022

Note: I didn't test on a newer kernel because the diff with this version shows no relevant changes.

# cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 5 (v7l)
BogoMIPS	: 64.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt fpd32 lpae 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xc07
CPU revision	: 5
Hardware	: Freescale i.MX6 Ultralite (Device Tree)

Relevant device tree additions:

/* Serial console UART */
&uart1 {
	pinctrl-names = "default";
	pinctrl-0 = <&pinctrl_uart1>;
	status = "okay";
};

/* RS485 UART with DMA */
&uart5 {
	pinctrl-names = "default";
	pinctrl-0 = <&pinctrl_uart5>;
	dmas = <&sdma 33 4 1>, <&sdma 34 4 2>;
	dma-names = "rx", "tx";
	fsl,uart-has-rtscts;
	linux,rs485-enabled-at-boot-time;
	status = "okay";
};

Kind regards,

Tim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ