[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250804134006.3609555-1-wangzhaolong@huaweicloud.com>
Date: Mon, 4 Aug 2025 21:40:02 +0800
From: Wang Zhaolong <wangzhaolong@...weicloud.com>
To: sfrench@...ba.org,
pshilov@...rosoft.com
Cc: linux-cifs@...r.kernel.org,
samba-technical@...ts.samba.org,
linux-kernel@...r.kernel.org,
chengzhihao1@...wei.com,
wangzhaolong@...weicloud.com,
yi.zhang@...wei.com,
yangerkun@...wei.com
Subject: [PATCH 0/4] Fix mid_q_entry memory leaks in SMB client
I've been investigating a pretty nasty memory leak in the SMB client. When
compound requests get interrupted by signals, we end up with mid_q_entry
structures and server buffers that never get freed[1].
User foreground process cifsd
cifs_readdir
open_cached_dir
cifs_send_recv
compound_send_recv
smb2_setup_request
smb2_mid_entry_alloc
smb2_get_mid_entry
smb2_mid_entry_alloc
mempool_alloc // alloc mid
kref_init(&temp->refcount); // refcount = 1
mid[0]->callback = cifs_compound_callback;
mid[1]->callback = cifs_compound_last_callback;
smb_send_rqst
rc = wait_for_response
wait_event_state TASK_KILLABLE
cifs_demultiplex_thread
allocate_buffers
server->bigbuf = cifs_buf_get()
standard_receive3
->find_mid()
smb2_find_mid
__smb2_find_mid
kref_get(&mid->refcount) // +1
cifs_handle_standard
handle_mid
/* bigbuf will also leak */
mid->resp_buf = server->bigbuf
server->bigbuf = NULL;
dequeue_mid
/* in for loop */
mids[0]->callback
cifs_compound_callback
/* Signal interrupts wait: rc = -ERESTARTSYS */
/* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *?
midQ[0]->callback = cifs_cancelled_callback;
cancelled_mid[i] = true;
/* The change comes too late */
mid->mid_state = MID_RESPONSE_READY
release_mid // -1
/* cancelled_mid[i] == true causes mid won't be released
in compound_send_recv cleanup */
/* cifs_cancelled_callback won't executed to release mid */
The core issue is a race condition where cifs_cancelled_callback never
gets a chance to run, so cleanup never happens. I've spent quite a bit
of time trying to understand how to fix this safely.
Honestly, my first instinct was to just patch the callback assignment
logic directly. But the more I dug into it, the more I realized that
the current locking scheme makes this really tricky to do safely. We
have one big lock protecting multiple different things, and trying to
fix the race condition directly felt like playing with fire.
I kept running into scenarios where a "simple" fix could introduce
deadlocks or new race conditions. After looking at this from different
angles, I came to the conclusion that I needed to refactor the locking
first to create a safe foundation for the actual fix.
Patches 1-3 are foundational refactoring. These three patches rename
locks for clarity, separate counter protection from queue operations,
and replace the confusing mid_flags bitmask with explicit boolean
fields. Basically, they untangle the current locking mess so I can
implement the real fix without breaking anything.
The 4th patch in the series is where the real fix happens. With
the previous refactoring in place, I could safely add a lock to each
mid_q_entry and implement atomic callback execution. This eliminates
the race condition that was causing the leaks.
In summary, my approach to the fix is to use smaller-grained locking to
avoid race conditions. However, during the implementation process,
this approach involves more changes than I initially hoped for. If
there's a simpler or more elegant way to fix this race condition that
I've missed, I'd love to hear about it. I've tried to be thorough in
my analysis, but I know there are folks with more experience in this
codebase who might see a better path.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404 [1]
Wang Zhaolong (4):
smb: client: rename server mid_lock to mid_queue_lock
smb: client: add mid_counter_lock to protect the mid counter counter
smb: client: smb: client: eliminate mid_flags field
smb: client: fix mid_q_entry memleak leak with per-mid locking
fs/smb/client/cifs_debug.c | 12 ++++--
fs/smb/client/cifsglob.h | 23 +++++++-----
fs/smb/client/connect.c | 57 +++++++++++++++++-----------
fs/smb/client/smb1ops.c | 23 ++++++++----
fs/smb/client/smb2ops.c | 69 ++++++++++++++++++----------------
fs/smb/client/smb2transport.c | 5 ++-
fs/smb/client/transport.c | 71 +++++++++++++++++++----------------
7 files changed, 150 insertions(+), 110 deletions(-)
--
2.39.2
Powered by blists - more mailing lists