Message-Id: <af2895170223142a8dc824c7096d986da57aeb96.1748594841.git.libo.gcs85@bytedance.com>
Date: Fri, 30 May 2025 17:27:50 +0800
From: Bo Li <libo.gcs85@...edance.com>
To: tglx@...utronix.de,
mingo@...hat.com,
bp@...en8.de,
dave.hansen@...ux.intel.com,
x86@...nel.org,
luto@...nel.org,
kees@...nel.org,
akpm@...ux-foundation.org,
david@...hat.com,
juri.lelli@...hat.com,
vincent.guittot@...aro.org,
peterz@...radead.org
Cc: dietmar.eggemann@....com,
hpa@...or.com,
acme@...nel.org,
namhyung@...nel.org,
mark.rutland@....com,
alexander.shishkin@...ux.intel.com,
jolsa@...nel.org,
irogers@...gle.com,
adrian.hunter@...el.com,
kan.liang@...ux.intel.com,
viro@...iv.linux.org.uk,
brauner@...nel.org,
jack@...e.cz,
lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com,
vbabka@...e.cz,
rppt@...nel.org,
surenb@...gle.com,
mhocko@...e.com,
rostedt@...dmis.org,
bsegall@...gle.com,
mgorman@...e.de,
vschneid@...hat.com,
jannh@...gle.com,
pfalcato@...e.de,
riel@...riel.com,
harry.yoo@...cle.com,
linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org,
linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org,
duanxiongchun@...edance.com,
yinhongbo@...edance.com,
dengliang.1214@...edance.com,
xieyongji@...edance.com,
chaiwen.cc@...edance.com,
songmuchun@...edance.com,
yuanzhu@...edance.com,
chengguozhu@...edance.com,
sunjiadong.lff@...edance.com,
Bo Li <libo.gcs85@...edance.com>
Subject: [RFC v2 22/35] RPAL: rebuild receiver state
When an RPAL call occurs, the sender modifies the receiver's state. If the
sender exits abnormally after modifying the state or encounters an
unhandled page fault and returns to a recovery point, the receiver's state
will remain as modified by the sender (e.g., in the CALL state). Since the
sender may have exited, the lazy switch will not occur, leaving the
receiver unrecoverable (unable to be woken up via try_to_wake_up()).
Therefore, the kernel must ensure the receiver's state remains valid in
these cases.
This patch addresses this by rebuilding the receiver's state on unhandled
page faults and on sender exit. The kernel reads the fsbase value recorded
by the sender and uses it to locate the corresponding receiver. It then
checks whether the receiver is in the CALL state set by this sender (using
the sender_id and service_id carried in the CALL state). If so, it
transitions the receiver from the CALL state to the WAIT state and notifies
the receiver via sender_state that the RPAL call has completed.
This ensures that even if the sender fails, the receiver's state is reset,
so the receiver can resume normal operation instead of blocking
permanently.
Signed-off-by: Bo Li <libo.gcs85@...edance.com>
---
arch/x86/rpal/thread.c | 44 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/arch/x86/rpal/thread.c b/arch/x86/rpal/thread.c
index db3b13ff82be..02c1a9c22dd7 100644
--- a/arch/x86/rpal/thread.c
+++ b/arch/x86/rpal/thread.c
@@ -224,6 +224,45 @@ int rpal_unregister_receiver(void)
return ret;
}
+/* A sender that exits unexpectedly may leave the receiver's state modified; rebuild it. */
+static void rpal_rebuild_receiver_context_on_exit(void)
+{
+ struct task_struct *receiver = NULL;
+ struct rpal_sender_data *rsd = current->rpal_sd;
+ struct rpal_sender_call_context *scc = rsd->scc;
+ struct rpal_receiver_data *rrd;
+ struct rpal_receiver_call_context *rcc;
+ unsigned long fsbase;
+ int state = rpal_build_call_state(rsd);
+
+ if (scc->ec.magic != RPAL_ERROR_MAGIC)
+ goto out;
+
+ fsbase = scc->ec.fsbase;
+ if (rpal_is_correct_address(rpal_current_service(), fsbase))
+ goto out;
+
+ receiver = rpal_find_next_task(fsbase);
+ if (!receiver)
+ goto out;
+
+ rrd = receiver->rpal_rd;
+ if (!rrd)
+ goto out;
+
+ rcc = rrd->rcc;
+
+ if (atomic_read(&rcc->receiver_state) == state) {
+ atomic_cmpxchg(&rcc->sender_state, RPAL_SENDER_STATE_CALL,
+ RPAL_SENDER_STATE_KERNEL_RET);
+ atomic_cmpxchg(&rcc->receiver_state, state,
+ RPAL_RECEIVER_STATE_WAIT);
+ }
+
+out:
+ return;
+}
+
int rpal_rebuild_sender_context_on_fault(struct pt_regs *regs,
unsigned long addr, int error_code)
{
@@ -232,6 +271,7 @@ int rpal_rebuild_sender_context_on_fault(struct pt_regs *regs,
unsigned long erip, ersp;
int magic;
+ rpal_rebuild_receiver_context_on_exit();
erip = scc->ec.erip;
ersp = scc->ec.ersp;
magic = scc->ec.magic;
@@ -249,8 +289,10 @@ int rpal_rebuild_sender_context_on_fault(struct pt_regs *regs,
void exit_rpal_thread(void)
{
- if (rpal_test_current_thread_flag(RPAL_SENDER_BIT))
+ if (rpal_test_current_thread_flag(RPAL_SENDER_BIT)) {
+ rpal_rebuild_receiver_context_on_exit();
rpal_unregister_sender();
+ }
if (rpal_test_current_thread_flag(RPAL_RECEIVER_BIT))
rpal_unregister_receiver();
--
2.20.1