Message-ID: <CACT4Y+ZQh5AQo7UuAKUNdJWZMBjKi9VA8aNFnT21sq=3yyeGjg@mail.gmail.com>
Date: Fri, 6 Feb 2026 07:30:57 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: syzbot <syzbot+198c20fde37cb9f6b0ac@...kaller.appspotmail.com>
Cc: alibuda@...ux.alibaba.com, davem@...emloft.net, dust.li@...ux.alibaba.com,
edumazet@...gle.com, guwen@...ux.alibaba.com, horms@...nel.org,
kuba@...nel.org, linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
linux-s390@...r.kernel.org, mjambigi@...ux.ibm.com, netdev@...r.kernel.org,
pabeni@...hat.com, sidraya@...ux.ibm.com, syzkaller-bugs@...glegroups.com,
tonylu@...ux.alibaba.com, wenjia@...ux.ibm.com
Subject: Re: [syzbot] [smc?] KCSAN: data-race in smc_switch_to_fallback /
sock_poll (2)
On Fri, 6 Feb 2026 at 07:16, syzbot
<syzbot+198c20fde37cb9f6b0ac@...kaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 5fd0a1df5d05 Merge tag 'v6.19rc8-smb3-client-fixes' of git..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1070aa5a580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=8e27f4588a0f2183
> dashboard link: https://syzkaller.appspot.com/bug?extid=198c20fde37cb9f6b0ac
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/a09cd69509c3/disk-5fd0a1df.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/f218ec1eb157/vmlinux-5fd0a1df.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/8549229eee91/bzImage-5fd0a1df.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+198c20fde37cb9f6b0ac@...kaller.appspotmail.com
>
> ==================================================================
> BUG: KCSAN: data-race in smc_switch_to_fallback / sock_poll
>
> write to 0xffff888127398c18 of 8 bytes by task 14369 on cpu 1:
> smc_switch_to_fallback+0x4ea/0x7e0 net/smc/af_smc.c:933
> smc_sendmsg+0xce/0x2f0 net/smc/af_smc.c:2797
> sock_sendmsg_nosec net/socket.c:727 [inline]
> __sock_sendmsg net/socket.c:742 [inline]
> ____sys_sendmsg+0x5af/0x600 net/socket.c:2592
> ___sys_sendmsg+0x195/0x1e0 net/socket.c:2646
> __sys_sendmsg net/socket.c:2678 [inline]
> __do_sys_sendmsg net/socket.c:2683 [inline]
> __se_sys_sendmsg net/socket.c:2681 [inline]
> __x64_sys_sendmsg+0xd4/0x160 net/socket.c:2681
> x64_sys_call+0x17ba/0x3000 arch/x86/include/generated/asm/syscalls_64.h:47
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> read to 0xffff888127398c18 of 8 bytes by task 14367 on cpu 0:
> sock_poll+0x27/0x240 net/socket.c:1427
> vfs_poll include/linux/poll.h:82 [inline]
> __io_arm_poll_handler+0x1ee/0xb80 io_uring/poll.c:581
> io_poll_add+0x69/0xf0 io_uring/poll.c:899
> __io_issue_sqe+0xfd/0x2d0 io_uring/io_uring.c:1793
> io_issue_sqe+0x20b/0xc20 io_uring/io_uring.c:1816
> io_queue_sqe io_uring/io_uring.c:2043 [inline]
> io_submit_sqe io_uring/io_uring.c:2321 [inline]
> io_submit_sqes+0x78a/0x11b0 io_uring/io_uring.c:2435
> __do_sys_io_uring_enter io_uring/io_uring.c:3285 [inline]
> __se_sys_io_uring_enter+0x1bf/0x1c70 io_uring/io_uring.c:3224
> __x64_sys_io_uring_enter+0x78/0x90 io_uring/io_uring.c:3224
> x64_sys_call+0x27e4/0x3000 arch/x86/include/generated/asm/syscalls_64.h:427
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> value changed: 0xffff88811a6056c0 -> 0xffff88811a606080
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 UID: 0 PID: 14367 Comm: syz.8.3658 Not tainted syzkaller #0 PREEMPT(voluntary)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> ==================================================================
Here is what the LLM said about the harmfulness of this data race.
It does not look totally bogus to me. At the very least, the read of
file->private_data in sock_poll() needs to be done with READ_ONCE to
avoid the harmful scenario. I don't know whether changing a fundamental
socket function because of this is the best solution, though.
========
The data race occurs on the `file->private_data` field of a socket
file descriptor. This field is being updated in
`smc_switch_to_fallback()` (to point to the underlying TCP/CLC socket
instead of the SMC socket) while concurrently being read in
`sock_poll()`.
### Analysis of the Race
1. **Nature of the Access**: `smc_switch_to_fallback()` is performing
a plain write to `file->private_data`, and `sock_poll()` is performing
a plain read. There is no mutual exclusion (like a lock) or memory
barrier protecting this transition.
2. **Type Confusion Risk**: In `sock_poll()`, the code first reads
`file->private_data` into a local variable `sock`, then reads
`sock->ops`, and finally calls `ops->poll(file, sock, wait)`. If the
compiler reloads `sock` from `file->private_data` between these steps
(which is permitted under the C memory model for non-volatile
accesses), it could fetch the `ops` from the SMC socket but then call
that `ops->poll` function (i.e., `smc_poll`) passing the TCP socket as
the `sock` argument.
3. **Consequences of Type Confusion**: `smc_poll()` converts
`sock->sk` to a `struct smc_sock *`. The `struct sock` behind a TCP
socket (`struct tcp_sock`) is not layout-compatible with `struct
smc_sock`. Accessing SMC-specific fields (like `smc->use_fallback` or
`smc->conn`) on a TCP socket object would read unrelated memory,
leading to undefined behavior, logic errors, or a kernel crash.
4. **Inconsistent State**: Even if the compiler does not reload the
pointer, the race between setting `smc->use_fallback = true` and
updating `file->private_data` means that `sock_poll()` might see an
inconsistent state where it enters `smc_poll()` but the fallback is
already partially complete, potentially accessing uninitialized or
transitioning connection state.
5. **Violation of Invariants**: In the Linux kernel,
`file->private_data` for a socket is generally expected to be constant
for the lifetime of the `file` object. SMC's "fallback" mechanism
violates this invariant. While the mechanism is intended to be a
performance optimization, changing the pointer without proper
synchronization (like `READ_ONCE`/`WRITE_ONCE` or a lock) makes it
unsafe.
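The reload scenario from point 2 can be sketched in user space. This is a single-threaded model, not kernel code: every `model_*` name, `poll_with_reload()` and `poll_with_snapshot()` are hypothetical, and the racing writer is simulated by calling it explicitly between the two loads. Each poll callback returns 1 only if it was handed the socket type it expects.

```c
/* Hypothetical user-space stand-ins for the kernel types; not kernel code. */
struct model_socket;

struct model_ops {
    /* Returns 1 if the callback received the socket type it expects. */
    int (*poll)(struct model_socket *sock);
};

struct model_socket {
    const struct model_ops *ops;
    int is_smc; /* 1 for the SMC socket, 0 for the TCP (CLC) socket */
};

struct model_file {
    struct model_socket *private_data;
};

static int model_smc_poll(struct model_socket *sock) { return sock->is_smc == 1; }
static int model_tcp_poll(struct model_socket *sock) { return sock->is_smc == 0; }

static const struct model_ops smc_ops = { .poll = model_smc_poll };
static const struct model_ops tcp_ops = { .poll = model_tcp_poll };

/* Stands in for the plain write done during the fallback. */
static void model_switch_to_fallback(struct model_file *file,
                                     struct model_socket *tcp)
{
    file->private_data = tcp;
}

/* Bad schedule: ops come from the first load, the argument from a second
 * load, and the writer runs in between. */
static int poll_with_reload(struct model_file *file, struct model_socket *tcp)
{
    const struct model_ops *ops = file->private_data->ops; /* first load */
    model_switch_to_fallback(file, tcp);                   /* racing writer */
    return ops->poll(file->private_data);                  /* second load */
}

/* Single load: ops and argument always come from the same socket. */
static int poll_with_snapshot(struct model_file *file, struct model_socket *tcp)
{
    struct model_socket *sock = file->private_data; /* the only load */
    model_switch_to_fallback(file, tcp);            /* racing writer */
    return sock->ops->poll(sock);
}
```

In this model `poll_with_reload()` ends up calling the SMC callback with the TCP socket, which is the mismatch described above, while `poll_with_snapshot()` stays consistent even though the pointer changes underneath it.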
### Conclusion
This data race is **harmful** because it can lead to type confusion
and memory corruption. It is not a simple statistics counter or a
benign flag race; it involves the fundamental identity of the socket
object being operated on.
The fix for this would typically involve using `READ_ONCE` and
`WRITE_ONCE` to prevent compiler reloads and ensure atomicity, or
better yet, avoiding the mid-flight change of `file->private_data`
altogether.
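A minimal user-space sketch of the `READ_ONCE`/`WRITE_ONCE` pattern follows. The two macros here are simplified approximations that assume the GCC/Clang `__typeof__` extension (the kernel's real definitions do more), and `sketch_socket`, `sketch_file`, `sketch_poll()` and `sketch_switch()` are hypothetical names, not the actual `sock_poll()` or `smc_switch_to_fallback()` code.

```c
/* Simplified approximations of the kernel macros (assumption: GCC/Clang
 * __typeof__). The volatile access forces exactly one load or store. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-ins for struct socket and struct file. */
struct sketch_socket {
    int id;
};

struct sketch_file {
    void *private_data;
};

/* Reader side: take one snapshot of the pointer and use only the snapshot. */
static int sketch_poll(struct sketch_file *file)
{
    struct sketch_socket *sock = READ_ONCE(file->private_data);

    return sock->id;
}

/* Writer side: publish the replacement socket with a single store. */
static void sketch_switch(struct sketch_file *file, struct sketch_socket *repl)
{
    WRITE_ONCE(file->private_data, repl);
}
```

This only rules out compiler reloads and torn accesses of the pointer itself. It does not order the pointer update against other fields, so the inconsistent-state concern from point 4 would still need separate treatment.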