Message-ID: <87h77iw34u.fsf@email.froward.int.ebiederm.org>
Date: Mon, 28 Mar 2022 09:41:37 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Oleg Nesterov <oleg@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>,
Ben Segall <bsegall@...gle.com>,
Borislav Petkov <bp@...en8.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] signal/x86: Delay calling signals in atomic

"Eric W. Biederman" <ebiederm@...ssion.com> writes:

> Sebastian Andrzej Siewior <bigeasy@...utronix.de> writes:
>
>> From: Oleg Nesterov <oleg@...hat.com>
>> Date: Tue, 14 Jul 2015 14:26:34 +0200
>>
>> On x86_64 we must disable preemption before we enable interrupts
>> for stack faults, int3 and debugging, because the current task is using
>> a per CPU debug stack defined by the IST. If we schedule out, another task
>> can come in and use the same stack and cause the stack to be corrupted
>> and crash the kernel on return.
>>
>> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
>> one of these is the spin lock used in signal handling.
>>
>> Some of the debug code (int3) causes do_trap() to send a signal.
>> This function takes a spinlock_t lock that has been converted to a
>> sleeping lock. If this happens, the stack corruption described above
>> becomes possible.
>>
>> Instead of sending the signal right away, for PREEMPT_RT and x86,
>> the signal information is stored in the task's task_struct and
>> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
>> code will send the signal when preemption is enabled.
>
> Folks, I really would have appreciated being copied on a signal handling
> patch like this.
>
> It is too late to nack, but I think this buggy patch deserved one. Can
> we please fix PREEMPT_RT instead?
>
> As far as I can tell this violates all of the rules for
> implementing/maintaining the RT kernel. Instead of coming up with new
> abstractions that make sense and can be used by everyone, this introduces
> a hack only for PREEMPT_RT, and a pretty horrible one at that.
>
> This talks about int3, but the code looks for in_atomic(), which means
> that essentially every call of force_sig will take this path as they
> almost all come from exception handlers. It is the nature of signals
> that report on faults. An exception is raised and the kernel reports it
> to userspace with a fault signal (aka force_sig_xxx).
>
> Further, this code is buggy. TIF_NOTIFY_RESUME is not the correct
> flag to set to enter into exit_to_user_mode_loop. TIF_NOTIFY_RESUME is
> about work that happens after signal handling. This very much needs to be
> TIF_SIGPENDING, with recalc_sigpending and friends updated to know about
> "task->forced_info".
>
> Does someone own this problem? Can that person please fix this
> properly?
>
> I really don't think it is going to be maintainable for PREEMPT_RT to
> maintain a separate signal delivery path for faults from the rest of
> linux.
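
As a rough illustration of the TIF_SIGPENDING point above, here is a toy
user-space model, not kernel code. Every name in it (struct toy_sigstate,
toy_recalc and so on) is hypothetical, and the real recalc_sigpending()
checks more than this. It only shows why the recalculation would need to
know about the deferred slot: if it looks at the pending set alone, it
clears the flag while a forced signal is still waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task signal state; not a kernel type. */
struct toy_sigstate {
	unsigned long pending;  /* bitmask of queued, unblocked signals */
	int forced_signo;       /* models task->forced_info.si_signo */
	bool sigpending;        /* models TIF_SIGPENDING */
};

/* Recalculation that only knows about the normal pending set. */
static bool toy_has_pending_naive(const struct toy_sigstate *s)
{
	return s->pending != 0;
}

/* Recalculation that also treats a deferred forced signal as pending. */
static bool toy_has_pending_aware(const struct toy_sigstate *s)
{
	return s->pending != 0 || s->forced_signo != 0;
}

/* Toy version of "recompute the pending flag from current state". */
static void toy_recalc(struct toy_sigstate *s, bool aware)
{
	s->sigpending = aware ? toy_has_pending_aware(s)
			      : toy_has_pending_naive(s);
}

/* Returns true if the flag survives a recalculation in the given mode. */
static bool toy_flag_survives(bool aware)
{
	struct toy_sigstate s = { .pending = 0, .forced_signo = 11,
				  .sigpending = true };

	toy_recalc(&s, aware);
	return s.sigpending;
}
```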

I want to say the patch below looks like it was a perfectly fine debug
patch to check whether the suspected issue is the actual issue. It is
not a good final solution for the reasons I have already mentioned.

May I ask where the rest of the conversation was? I can only find the
single posting of this patch on linux-kernel without any conversation,
and the description indicates this change has seen several rounds of
development.

Eric

>> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
>> ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
>> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
>>
>> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
>> Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
>> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
>> ---
>> arch/x86/include/asm/signal.h | 13 +++++++++++++
>> include/linux/sched.h | 3 +++
>> kernel/entry/common.c | 9 +++++++++
>> kernel/signal.c | 28 ++++++++++++++++++++++++++++
>> 4 files changed, 53 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
>> index 2dfb5fea13aff..fc03f4f7ed84c 100644
>> --- a/arch/x86/include/asm/signal.h
>> +++ b/arch/x86/include/asm/signal.h
>> @@ -28,6 +28,19 @@ typedef struct {
>> #define SA_IA32_ABI 0x02000000u
>> #define SA_X32_ABI 0x01000000u
>>
>> +/*
>> + * Because some traps use the IST stack, we must keep preemption
>> + * disabled while calling do_trap(), but do_trap() may call
>> + * force_sig_info() which will grab the signal spin_locks for the
>> + * task, which in PREEMPT_RT are mutexes. By defining
>> + * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
>> + * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
>> + * trap.
>> + */
>> +#if defined(CONFIG_PREEMPT_RT)
>> +#define ARCH_RT_DELAYS_SIGNAL_SEND
>> +#endif
>> +
>> #ifndef CONFIG_COMPAT
>> #define compat_sigset_t compat_sigset_t
>> typedef sigset_t compat_sigset_t;
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 75ba8aa60248b..0514237cee3fc 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1087,6 +1087,9 @@ struct task_struct {
>> /* Restored if set_restore_sigmask() was used: */
>> sigset_t saved_sigmask;
>> struct sigpending pending;
>> +#ifdef CONFIG_PREEMPT_RT
>> + struct kernel_siginfo forced_info;
>> +#endif
>> unsigned long sas_ss_sp;
>> size_t sas_ss_size;
>> unsigned int sas_ss_flags;
>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>> index bad713684c2e3..216dbf46e05f5 100644
>> --- a/kernel/entry/common.c
>> +++ b/kernel/entry/common.c
>> @@ -162,6 +162,15 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>> if (ti_work & _TIF_NEED_RESCHED)
>> schedule();
>>
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> + if (unlikely(current->forced_info.si_signo)) {
>> + struct task_struct *t = current;
>> +
>> + force_sig_info(&t->forced_info);
>> + t->forced_info.si_signo = 0;
>> + }
>> +#endif
>> +
>> if (ti_work & _TIF_UPROBE)
>> uprobe_notify_resume(regs);
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 9b04631acde8f..cb2b28c17c0a5 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>> struct k_sigaction *action;
>> int sig = info->si_signo;
>>
>> + /*
>> + * On some archs, PREEMPT_RT has to delay sending a signal from a trap
>> + * since it can not enable preemption, and the signal code's spin_locks
>> + * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
>> + * send the signal on exit of the trap.
>> + */
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> + if (in_atomic()) {
>> + struct task_struct *t = current;
>> +
>> + if (WARN_ON_ONCE(t->forced_info.si_signo))
>> + return 0;
>> +
>> + if (is_si_special(info)) {
>> + WARN_ON_ONCE(info != SEND_SIG_PRIV);
>> + t->forced_info.si_signo = info->si_signo;
>> + t->forced_info.si_errno = 0;
>> + t->forced_info.si_code = SI_KERNEL;
>> + t->forced_info.si_pid = 0;
>> + t->forced_info.si_uid = 0;
>> + } else {
>> + t->forced_info = *info;
>> + }
>> +
>> + set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
>> + return 0;
>> + }
>> +#endif
>> spin_lock_irqsave(&t->sighand->siglock, flags);
>> action = &t->sighand->action[sig-1];
>> ignored = action->sa.sa_handler == SIG_IGN;
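
For readers following the diff, the deferral path in the quoted patch can
be sketched as a small user-space model. This is a simplification under
stated assumptions, not the kernel implementation: struct toy_task,
toy_force_sig() and toy_exit_to_user() are hypothetical names, a non-zero
preempt_count stands in for in_atomic(), and "delivery" just records the
signal number. It mirrors three things visible in the patch: a single
forced_info slot, a second deferred signal being dropped while the slot is
occupied, and the slot being drained on the way back to user mode.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DELIVERED 8

/* Hypothetical stand-ins for kernel structures; not the real types. */
struct toy_siginfo {
	int si_signo;
	int si_code;
};

struct toy_task {
	struct toy_siginfo forced_info;  /* single deferred-signal slot */
	bool notify_resume;              /* models TIF_NOTIFY_RESUME */
	int preempt_count;               /* non-zero models in_atomic() */
	int delivered[TOY_MAX_DELIVERED];
	int ndelivered;
};

/* Models the normal path that would take siglock and queue the signal. */
static void toy_deliver(struct toy_task *t, const struct toy_siginfo *info)
{
	if (t->ndelivered < TOY_MAX_DELIVERED)
		t->delivered[t->ndelivered++] = info->si_signo;
}

/* Returns true if the signal was delivered or deferred, false if dropped. */
static bool toy_force_sig(struct toy_task *t, const struct toy_siginfo *info)
{
	if (t->preempt_count != 0) {
		/* Slot already in use: the patch WARNs once and returns. */
		if (t->forced_info.si_signo != 0)
			return false;
		t->forced_info = *info;
		t->notify_resume = true;
		return true;
	}
	toy_deliver(t, info);
	return true;
}

/* Models the exit-to-user hook: preemption is enabled, drain the slot. */
static void toy_exit_to_user(struct toy_task *t)
{
	t->preempt_count = 0;
	if (t->forced_info.si_signo != 0) {
		struct toy_siginfo info = t->forced_info;

		toy_force_sig(t, &info);  /* now takes the direct path */
		t->forced_info.si_signo = 0;
	}
	t->notify_resume = false;
}

/* Runs one trap scenario and returns how many signals were delivered. */
static int toy_demo(void)
{
	struct toy_task t = { 0 };
	struct toy_siginfo first = { .si_signo = 5, .si_code = 1 };
	struct toy_siginfo second = { .si_signo = 11, .si_code = 1 };

	t.preempt_count = 1;            /* inside the trap handler */
	toy_force_sig(&t, &first);      /* deferred into the slot */
	toy_force_sig(&t, &second);     /* dropped: slot is occupied */
	toy_exit_to_user(&t);           /* first signal delivered here */
	return t.ndelivered;
}
```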