[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240718184311.3950526-1-leitao@debian.org>
Date: Thu, 18 Jul 2024 11:43:04 -0700
From: Breno Leitao <leitao@...ian.org>
To: kuba@...nel.org,
davem@...emloft.net,
edumazet@...gle.com,
pabeni@...hat.com
Cc: thepacketgeek@...il.com,
riel@...riel.com,
horms@...nel.org,
netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
paulmck@...nel.org,
davej@...emonkey.org.uk
Subject: [RFC PATCH 0/2] netconsole: Fix netconsole unsafe locking
Problem:
=======
The current locking mechanism in netconsole is unsafe and suboptimal due to the
following issues:
1) Lock Release and Reacquisition Mid-Loop:
In netconsole_netdev_event(), the target_list_lock is released and reacquired within a loop, potentially causing collisions and cleaning up targets that are being enabled.
int netconsole_netdev_event()
{
...
spin_lock_irqsave(&target_list_lock, flags);
list_for_each_entry(nt, &target_list, list) {
spin_unlock_irqrestore(&target_list_lock, flags);
__netpoll_cleanup(&nt->np);
spin_lock_irqsave(&target_list_lock, flags);
}
spin_lock_irqsave(&target_list_lock, flags);
...
}
2) Non-Atomic Cleanup Operations:
In enabled_store(), the cleanup of structures is not atomic, risking cleanup of structures that are in the process of being enabled.
size_t enabled_store()
{
...
spin_lock_irqsave(&target_list_lock, flags);
nt->enabled = false;
spin_unlock_irqrestore(&target_list_lock, flags);
netpoll_cleanup(&nt->np);
...
}
These issues stem from the following limitations in netconsole's locking design:
1) write_{ext_}msg() functions:
a) Cannot sleep
b) Must iterate through targets and send messages to all enabled entries.
c) List iteration is protected by target_list_lock spinlock.
2) Network event handling in netconsole_netdev_event():
a) Needs to sleep
a) Requires iteration over the target list (holding target_list_lock spinlock).
b) Some events necessitate netpoll struct cleanup, which *needs* to sleep.
The target_list_lock needs to be used by non-sleepable functions while also protecting operations that may sleep, leading to the current unsafe design.
Proposed Solution:
==================
1) Dual Locking Mechanism:
- Retain current target_list_lock for non-sleepable use cases.
- Introduce target_cleanup_list_lock (mutex) for sleepable operations.
2) Deferred Cleanup:
- Implement atomic, deferred cleanup of structures using the new mutex (target_cleanup_list_lock).
- Avoid the `goto` in the middle of the list_for_each_entry
3) Separate Cleanup List:
- Create target_cleanup_list for deferred cleanup, protected by target_cleanup_list_lock.
- This allows cleanup() to sleep without affecting message transmission.
- When iterating over targets, move devices needing cleanup to target_cleanup_list.
- Handle cleanup under the target_cleanup_list_lock mutex.
4) Make a clear locking hiearchy
- The target_cleanup_list_lock takes precedence over target_list_lock.
- Major Workflow Locking Sequences:
a) Network Event Affecting Netpoll (netconsole_netdev_event):
rtnl -> target_cleanup_list_lock -> target_list_lock
b) Message Writing (write_msg()):
console_lock -> target_list_lock
c) Configfs Target Enable/Disable (enabled_store()):
dynamic_netconsole_mutex -> target_cleanup_list_lock -> target_list_lock
This hierarchy ensures consistent lock acquisition order across different
operations, preventing deadlocks and maintaining proper synchronization. The
target_cleanup_list_lock's higher priority allows for safe deferred cleanup
operations without interfering with regular message transmission protected by
target_list_lock. Each workflow follows a specific locking sequence, ensuring
that operations like network event handling, message writing, and target
management are properly synchronized and do not conflict with each other.
Tested wostly with https://github.com/leitao/netcons/blob/main/basic_test.sh
Breno Leitao (2):
netpoll: extract core of netpoll_cleanup
netconsole: Defer netpoll cleanup to avoid lock release during list
traversal
drivers/net/netconsole.c | 77 +++++++++++++++++++++++++++++++---------
include/linux/netpoll.h | 1 +
net/core/netpoll.c | 12 +++++--
3 files changed, 71 insertions(+), 19 deletions(-)
--
2.43.0
Powered by blists - more mailing lists