netdev - Fw: [Bug 220544] New: AF_SMC deadlock: held by __sock_release, smc_release, and __flush

Message-ID: <20250907090107.44a3f68e@hermes.local>
Date: Sun, 7 Sep 2025 09:01:07 -0700
From: Stephen Hemminger <stephen@...workplumber.org>
To: alibuda@...ux.alibaba.com, dust.li@...ux.alibaba.com,
 sidraya@...ux.ibm.com, wenjia@...ux.ibm.com, mjambigi@...ux.ibm.com,
 tonylu@...ux.alibaba.com, guwen@...ux.alibaba.com
Cc: linux-rdma@...r.kernel.org, linux-s390@...r.kernel.org,
 netdev@...r.kernel.org
Subject: Fw: [Bug 220544] New: AF_SMC deadlock: held by __sock_release,
 smc_release, and __flush_work



Begin forwarded message:

Date: Sun, 07 Sep 2025 03:42:22 +0000
From: bugzilla-daemon@...nel.org
To: stephen@...workplumber.org
Subject: [Bug 220544] New: AF_SMC deadlock: held by __sock_release, smc_release, and __flush_work


https://bugzilla.kernel.org/show_bug.cgi?id=220544

            Bug ID: 220544
           Summary: AF_SMC deadlock: held by __sock_release, smc_release,
                    and __flush_work
           Product: Networking
           Version: 2.5
    Kernel Version: 6.12.x
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Other
          Assignee: stephen@...workplumber.org
          Reporter: hi@...rdim.xyz
        Regression: No

Created attachment 308627
  --> https://bugzilla.kernel.org/attachment.cgi?id=308627&action=edit  
crash full log and program source code

[ 2499.781797] 
[ 2499.782400] ======================================================
[ 2499.784129] WARNING: possible circular locking dependency detected
[ 2499.785824] 6.12.42 #1 Not tainted
[ 2499.786843] ------------------------------------------------------
[ 2499.788589] 1296/22742 is trying to acquire lock:
[ 2499.789941] ffff88801776ec18 ((work_completion)(&new_smc->smc_listen_work)){+.+.}-{0:0}, at: __flush_work+0x514/0xd50
[ 2499.793080] 
[ 2499.793080] but task is already holding lock:
[ 2499.794731] ffff888017768e98 (sk_lock-AF_SMC/1){+.+.}-{0:0}, at: smc_release+0x376/0x600
[ 2499.797004] 
[ 2499.797004] which lock already depends on the new lock.
[ 2499.797004] 
[ 2499.799295] 
[ 2499.799295] the existing dependency chain (in reverse order) is:
[ 2499.801365] 
[ 2499.801365] -> #1 (sk_lock-AF_SMC/1){+.+.}-{0:0}:
[ 2499.803149]        lock_sock_nested+0x3a/0x100
[ 2499.804427]        smc_listen_out+0x1ea/0x4c0
[ 2499.805686]        smc_listen_work+0x4d1/0x5520
[ 2499.806987]        process_one_work+0x94a/0x1740
[ 2499.808415]        worker_thread+0x5c4/0xe10
[ 2499.809650]        kthread+0x2ad/0x360
[ 2499.810763]        ret_from_fork+0x4e/0x80
[ 2499.811966]        ret_from_fork_asm+0x1a/0x30
[ 2499.813324] 
[ 2499.813324] -> #0 ((work_completion)(&new_smc->smc_listen_work)){+.+.}-{0:0}:
[ 2499.815691]        __lock_acquire+0x2413/0x4310
[ 2499.816983]        lock_acquire.part.0+0xff/0x350
[ 2499.818259]        __flush_work+0x528/0xd50
[ 2499.819376]        __cancel_work_sync+0x105/0x130
[ 2499.820689]        smc_clcsock_release+0x61/0xf0
[ 2499.821958]        __smc_release+0x5c9/0x8a0
[ 2499.823163]        smc_close_non_accepted+0xd7/0x210
[ 2499.824602]        smc_close_active+0x535/0x10e0
[ 2499.825867]        __smc_release+0x643/0x8a0
[ 2499.827067]        smc_release+0x1f0/0x600
[ 2499.828197]        __sock_release+0xac/0x260
[ 2499.829427]        sock_close+0x1c/0x30
[ 2499.830506]        __fput+0x3f6/0xb40
[ 2499.831552]        __fput_sync+0x4a/0x60
[ 2499.832651]        __x64_sys_close+0x86/0x100
[ 2499.833855]        do_syscall_64+0xbb/0x1d0
[ 2499.835043]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 2499.836574] 
[ 2499.836574] other info that might help us debug this:
[ 2499.836574] 
[ 2499.838725]  Possible unsafe locking scenario:
[ 2499.838725] 
[ 2499.840491]        CPU0                    CPU1
[ 2499.841748]        ----                    ----
[ 2499.843036]   lock(sk_lock-AF_SMC/1);
[ 2499.844110]                                lock((work_completion)(&new_smc->smc_listen_work));
[ 2499.846436]                                lock(sk_lock-AF_SMC/1);
[ 2499.848134]   lock((work_completion)(&new_smc->smc_listen_work));
[ 2499.849780] 
[ 2499.849780]  *** DEADLOCK ***
[ 2499.849780] 
[ 2499.851388] 3 locks held by 1296/22742:
[ 2499.852456]  #0: ffff88801ed58d88 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at: __sock_release+0x81/0x260
[ 2499.855185]  #1: ffff888017768e98 (sk_lock-AF_SMC/1){+.+.}-{0:0}, at: smc_release+0x376/0x600
[ 2499.857486]  #2: ffffffff86e9dc00 (rcu_read_lock){....}-{1:2}, at: __flush_work+0xff/0xd50
[ 2499.859710] 
[ 2499.859710] stack backtrace:
[ 2499.860913] CPU: 0 UID: 0 PID: 22742 Comm: 1296 Not tainted 6.12.42 #1
[ 2499.860935] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 2499.860944] Call Trace:
[ 2499.860951]  <TASK>
[ 2499.860959]  dump_stack_lvl+0xba/0x110
[ 2499.860983]  print_circular_bug.cold+0x1e8/0x27f
[ 2499.861041]  check_noncircular+0x30e/0x3c0
[ 2499.861064]  ? __pfx_check_noncircular+0x10/0x10
[ 2499.861084]  ? register_lock_class+0xb2/0x12e0
[ 2499.861112]  ? lockdep_lock+0xb5/0x1b0
[ 2499.861131]  ? __pfx_lockdep_lock+0x10/0x10
[ 2499.861151]  __lock_acquire+0x2413/0x4310
[ 2499.861177]  ? __pfx___lock_acquire+0x10/0x10
[ 2499.861199]  ? __pfx_mark_lock+0x10/0x10
[ 2499.861221]  ? __flush_work+0x514/0xd50
[ 2499.861240]  lock_acquire.part.0+0xff/0x350
[ 2499.861261]  ? __flush_work+0x514/0xd50
[ 2499.861280]  ? lock_release+0x209/0x7d0
[ 2499.861302]  ? __pfx_lock_acquire.part.0+0x10/0x10
[ 2499.861323]  ? __flush_work+0x514/0xd50
[ 2499.861342]  ? trace_lock_acquire+0x132/0x1c0
[ 2499.861360]  ? __flush_work+0x514/0xd50
[ 2499.861378]  ? lock_acquire+0x31/0xc0
[ 2499.861398]  ? __flush_work+0x514/0xd50
[ 2499.861418]  __flush_work+0x528/0xd50
[ 2499.861436]  ? __flush_work+0x514/0xd50
[ 2499.861456]  ? __pfx___flush_work+0x10/0x10
[ 2499.861475]  ? __pfx_sock_def_readable+0x10/0x10
[ 2499.861497]  ? trace_irq_disable.constprop.0+0xcd/0x110
[ 2499.861519]  ? __pfx_wq_barrier_func+0x10/0x10
[ 2499.861548]  ? __pfx___might_resched+0x10/0x10
[ 2499.861567]  ? __pfx_sock_def_readable+0x10/0x10
[ 2499.861587]  __cancel_work_sync+0x105/0x130
[ 2499.861609]  smc_clcsock_release+0x61/0xf0
[ 2499.861630]  ? __local_bh_enable_ip+0x9b/0x140
[ 2499.861646]  __smc_release+0x5c9/0x8a0
[ 2499.861665]  ? lockdep_hardirqs_on_prepare+0x201/0x400
[ 2499.861688]  ? __pfx_sock_def_readable+0x10/0x10
[ 2499.861708]  smc_close_non_accepted+0xd7/0x210
[ 2499.861730]  smc_close_active+0x535/0x10e0
[ 2499.861753]  __smc_release+0x643/0x8a0
[ 2499.861772]  ? lockdep_hardirqs_on_prepare+0x25c/0x400
[ 2499.861795]  smc_release+0x1f0/0x600
[ 2499.861814]  __sock_release+0xac/0x260
[ 2499.861840]  ? __pfx_sock_close+0x10/0x10
[ 2499.861864]  sock_close+0x1c/0x30
[ 2499.861886]  __fput+0x3f6/0xb40
[ 2499.861912]  __fput_sync+0x4a/0x60
[ 2499.861935]  __x64_sys_close+0x86/0x100
[ 2499.861950]  do_syscall_64+0xbb/0x1d0
[ 2499.861972]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 2499.861992] RIP: 0033:0x7f40854559a0
[ 2499.862033] Code: 0d 00 00 00 eb b2 e8 0f f8 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 41 1c 0e 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
[ 2499.862049] RSP: 002b:00007ffecd7bbad8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[ 2499.862065] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f40854559a0
[ 2499.862077] RDX: 0000000000000000 RSI: 000055df79c1fe38 RDI: 0000000000000005
[ 2499.862087] RBP: 0000000000000006 R08: 000000000000f800 R09: 0000000000000073
[ 2499.862098] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffecd7bbb80
[ 2499.862110] R13: 00007ffecd7bbdb8 R14: 000055df79c21dd8 R15: 0000000000000000
[ 2499.862128]  </TASK>
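
The two dependency chains in the report above form a cycle between the socket lock and the work item's completion map. As a rough illustration of the kind of check lockdep performs (a minimal sketch, not the kernel's implementation), the chains can be written as "acquired while held" edges and searched for a cycle. The helper `find_cycle` and the edge list are hypothetical names made up for this sketch; the lock class names are copied from the log, and the model is class-level only, as lockdep's is.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one lock-order cycle as a list of lock names, or None."""
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    visiting, done = [], set()

    def dfs(node):
        if node in visiting:
            # Back edge: slice out the loop and close it on the repeated node.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


SK_LOCK = "sk_lock-AF_SMC/1"
WORK = "(work_completion)(&new_smc->smc_listen_work)"

# Edge (A, B) means "B was acquired while A was held".
edges = [
    # Chain #1: process_one_work -> smc_listen_work -> smc_listen_out -> lock_sock_nested
    (WORK, SK_LOCK),
    # Chain #0: smc_release -> ... -> smc_clcsock_release -> __cancel_work_sync -> __flush_work
    (SK_LOCK, WORK),
]

cycle = find_cycle(edges)
print(" -> ".join(cycle))
```

Either edge on its own is harmless; it is the combination, with the close path waiting for the work item while the work item waits for the socket lock, that lockdep flags.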


Crashes happened on 6.12.34 and 6.12.42.
Machine info:
QEMU x86_64
Linux version 6.12.42 (gcc (GCC) 15.1.1 20250729, GNU ld (GNU Binutils) 2.45.0) #1 SMP PREEMPT_DYNAMIC Tue Aug 19 21:04:29 EDT 2025
Command line: console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0 nokaslr
InfiniBand enabled through rxe

Programs and logs that trigger the bug are attached.

Usage: `cat crash.input | program`
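
For readers without the attachment, the pattern lockdep warns about can be imitated in user space. The following is a minimal sketch under stated assumptions, not the attached reproducer and not SMC code: a `threading.Lock` stands in for the socket lock, a thread stands in for `smc_listen_work`, and `join()` stands in for the synchronous cancel/flush. All names are hypothetical, and timeouts are used so the demo terminates instead of hanging.

```python
import threading

sk_lock = threading.Lock()          # stands in for the socket lock
worker_started = threading.Event()
worker_got_lock = []


def listen_work():
    """Stand-in for the work item: it needs the socket lock to finish."""
    worker_started.set()
    # Give up after 1 s so the demo ends; a kernel lock_sock() would not time out.
    got = sk_lock.acquire(timeout=1.0)
    worker_got_lock.append(got)
    if got:
        sk_lock.release()


def release_path():
    """Stand-in for the close path: wait for the work while holding the lock."""
    with sk_lock:
        worker = threading.Thread(target=listen_work)
        worker.start()
        worker_started.wait()
        # Stand-in for the synchronous flush: bounded here, unbounded in the kernel.
        worker.join(timeout=0.2)
        stuck = worker.is_alive()
    # Lock dropped: the worker can now make progress and finish.
    worker.join()
    return stuck


stuck = release_path()
print("flush blocked while holding lock:", stuck)
```

Without the timeouts, neither side would ever make progress, which is the situation described by the "Possible unsafe locking scenario" in the log.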

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.