Message-Id: <af2895170223142a8dc824c7096d986da57aeb96.1748594841.git.libo.gcs85@bytedance.com>
Date: Fri, 30 May 2025 17:27:50 +0800
From: Bo Li <libo.gcs85@...edance.com>
To: tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	x86@...nel.org,
	luto@...nel.org,
	kees@...nel.org,
	akpm@...ux-foundation.org,
	david@...hat.com,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	peterz@...radead.org
Cc: dietmar.eggemann@....com,
	hpa@...or.com,
	acme@...nel.org,
	namhyung@...nel.org,
	mark.rutland@....com,
	alexander.shishkin@...ux.intel.com,
	jolsa@...nel.org,
	irogers@...gle.com,
	adrian.hunter@...el.com,
	kan.liang@...ux.intel.com,
	viro@...iv.linux.org.uk,
	brauner@...nel.org,
	jack@...e.cz,
	lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com,
	vbabka@...e.cz,
	rppt@...nel.org,
	surenb@...gle.com,
	mhocko@...e.com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com,
	jannh@...gle.com,
	pfalcato@...e.de,
	riel@...riel.com,
	harry.yoo@...cle.com,
	linux-kernel@...r.kernel.org,
	linux-perf-users@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org,
	duanxiongchun@...edance.com,
	yinhongbo@...edance.com,
	dengliang.1214@...edance.com,
	xieyongji@...edance.com,
	chaiwen.cc@...edance.com,
	songmuchun@...edance.com,
	yuanzhu@...edance.com,
	chengguozhu@...edance.com,
	sunjiadong.lff@...edance.com,
	Bo Li <libo.gcs85@...edance.com>
Subject: [RFC v2 22/35] RPAL: rebuild receiver state

When an RPAL call occurs, the sender modifies the receiver's state. If the
sender exits abnormally after modifying the state or encounters an
unhandled page fault and returns to a recovery point, the receiver's state
will remain as modified by the sender (e.g., in the CALL state). Since the
sender may have exited, the lazy switch will not occur, leaving the
receiver unrecoverable (unable to be woken up via try_to_wake_up()).
Therefore, the kernel must ensure the receiver's state remains valid in
these cases.

This patch addresses this by rebuilding the receiver's state on unhandled
page faults and on sender exit. The kernel reads the fsbase value recorded
by the sender and uses it to locate the corresponding receiver. It then
checks whether the receiver is still in the CALL state set by this sender
(matching the sender_id and service_id carried in the CALL state). If so,
it transitions the receiver from CALL to WAIT and notifies the receiver
via sender_state that the RPAL call has completed.

This ensures that even if the sender fails, the receiver's state is reset,
so the receiver can resume normal operation instead of blocking
permanently.

Signed-off-by: Bo Li <libo.gcs85@...edance.com>
---
 arch/x86/rpal/thread.c | 44 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)
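
The state transition described in the commit message can be sketched as a
small userspace model using C11 atomics. This is only an illustration under
stated assumptions, not the kernel code: the model_* names, the numeric
state values, and the way sender_id and service_id are packed into the CALL
state are all hypothetical, since the real encoding is defined by
rpal_build_call_state() elsewhere in the series.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values; the real constants live in the RPAL headers. */
enum {
	MODEL_RECEIVER_STATE_WAIT = 0,
	MODEL_RECEIVER_STATE_CALL = 2,
};

enum {
	MODEL_SENDER_STATE_CALL = 1,
	MODEL_SENDER_STATE_KERNEL_RET = 3,
};

/* Stand-in for the two fields of the receiver call context used here. */
struct model_rcc {
	atomic_int receiver_state;
	atomic_int sender_state;
};

/* Assumed packing of service_id and sender_id into the CALL state. */
static int model_build_call_state(int service_id, int sender_id)
{
	return (service_id << 16) | (sender_id << 4) |
	       MODEL_RECEIVER_STATE_CALL;
}

/*
 * Reset the receiver only if it is still in the CALL state written by
 * this particular sender. Returns true if the receiver was moved to WAIT.
 */
static bool model_rebuild_receiver(struct model_rcc *rcc, int call_state)
{
	int expected_sender = MODEL_SENDER_STATE_CALL;
	int expected_receiver = call_state;

	if (atomic_load(&rcc->receiver_state) != call_state)
		return false;

	/* Tell the receiver that the call was completed by the kernel. */
	atomic_compare_exchange_strong(&rcc->sender_state, &expected_sender,
				       MODEL_SENDER_STATE_KERNEL_RET);

	/* CALL -> WAIT, only if nobody changed the state in the meantime. */
	return atomic_compare_exchange_strong(&rcc->receiver_state,
					      &expected_receiver,
					      MODEL_RECEIVER_STATE_WAIT);
}
```

In this model, a receiver whose state was written by a different sender is
left untouched, because the packed CALL value does not match. The second
compare-and-exchange also leaves the state alone if it changed between the
initial check and the update.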

diff --git a/arch/x86/rpal/thread.c b/arch/x86/rpal/thread.c
index db3b13ff82be..02c1a9c22dd7 100644
--- a/arch/x86/rpal/thread.c
+++ b/arch/x86/rpal/thread.c
@@ -224,6 +224,45 @@ int rpal_unregister_receiver(void)
 	return ret;
 }
 
+/* A sender that exits unexpectedly may leave the receiver's state stale; rebuild it. */
+static void rpal_rebuild_receiver_context_on_exit(void)
+{
+	struct task_struct *receiver = NULL;
+	struct rpal_sender_data *rsd = current->rpal_sd;
+	struct rpal_sender_call_context *scc = rsd->scc;
+	struct rpal_receiver_data *rrd;
+	struct rpal_receiver_call_context *rcc;
+	unsigned long fsbase;
+	int state = rpal_build_call_state(rsd);
+
+	if (scc->ec.magic != RPAL_ERROR_MAGIC)
+		goto out;
+
+	fsbase = scc->ec.fsbase;
+	if (rpal_is_correct_address(rpal_current_service(), fsbase))
+		goto out;
+
+	receiver = rpal_find_next_task(fsbase);
+	if (!receiver)
+		goto out;
+
+	rrd = receiver->rpal_rd;
+	if (!rrd)
+		goto out;
+
+	rcc = rrd->rcc;
+
+	if (atomic_read(&rcc->receiver_state) == state) {
+		atomic_cmpxchg(&rcc->sender_state, RPAL_SENDER_STATE_CALL,
+			       RPAL_SENDER_STATE_KERNEL_RET);
+		atomic_cmpxchg(&rcc->receiver_state, state,
+			       RPAL_RECEIVER_STATE_WAIT);
+	}
+
+out:
+	return;
+}
+
 int rpal_rebuild_sender_context_on_fault(struct pt_regs *regs,
 					 unsigned long addr, int error_code)
 {
@@ -232,6 +271,7 @@ int rpal_rebuild_sender_context_on_fault(struct pt_regs *regs,
 		unsigned long erip, ersp;
 		int magic;
 
+		rpal_rebuild_receiver_context_on_exit();
 		erip = scc->ec.erip;
 		ersp = scc->ec.ersp;
 		magic = scc->ec.magic;
@@ -249,8 +289,10 @@ int rpal_rebuild_sender_context_on_fault(struct pt_regs *regs,
 
 void exit_rpal_thread(void)
 {
-	if (rpal_test_current_thread_flag(RPAL_SENDER_BIT))
+	if (rpal_test_current_thread_flag(RPAL_SENDER_BIT)) {
+		rpal_rebuild_receiver_context_on_exit();
 		rpal_unregister_sender();
+	}
 
 	if (rpal_test_current_thread_flag(RPAL_RECEIVER_BIT))
 		rpal_unregister_receiver();
-- 
2.20.1

