Date:   Mon, 28 Mar 2022 09:41:37 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        Oleg Nesterov <oleg@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Ben Segall <bsegall@...gle.com>,
        Borislav Petkov <bp@...en8.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] signal/x86: Delay calling signals in atomic

"Eric W. Biederman" <ebiederm@...ssion.com> writes:

> Sebastian Andrzej Siewior <bigeasy@...utronix.de> writes:
>
>> From: Oleg Nesterov <oleg@...hat.com>
>> Date: Tue, 14 Jul 2015 14:26:34 +0200
>>
>> On x86_64 we must disable preemption before we enable interrupts
>> for stack faults, int3 and debugging, because the current task is using
>> a per CPU debug stack defined by the IST. If we schedule out, another task
>> can come in and use the same stack and cause the stack to be corrupted
>> and crash the kernel on return.
>>
>> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
>> one of these is the spin lock used in signal handling.
>>
>> Some of the debug code (int3) causes do_trap() to send a signal.
>> Sending the signal takes a spinlock_t that has been converted to a
>> sleeping lock. If that lock sleeps, the stack corruption described
>> above is possible.
>>
>> Instead of sending the signal right away, for PREEMPT_RT and x86,
>> the signal information is stored in the task's task_struct and
>> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
>> code will send the signal when preemption is enabled.
>
> Folks I really would have appreciated being copied on a signal handling
> patch like this.
>
> It is too late to nack, but I think this buggy patch deserved one.  Can
> we please fix PREEMPT_RT instead?
>
> As far as I can tell this violates all of the rules for
> implementing/maintaining the RT kernel.  Instead of coming up with new
> abstractions that make sense and can be used by everyone, this introduces
> a hack only for PREEMPT_RT, and a pretty horrible one at that.
>
> This talks about int3, but the code looks for in_atomic().  Which means
> that essentially every call of force_sig will take this path as they
> almost all come from exception handlers.  It is the nature of signals
> that report on faults.  An exception is raised and the kernel reports it
> to userspace with a fault signal (aka force_sig_xxx).
>
> Further this code is buggy.  TIF_NOTIFY_RESUME is not the correct
> flag to set to enter into exit_to_user_mode_loop.  TIF_NOTIFY_RESUME is
> about what happens after signal handling.  This very much needs to be
> TIF_SIGPENDING with recalc_sigpending and friends updated to know about
> "task->forced_info".
>
> Does someone own this problem?  Can that person please fix this
> properly?
>
> I really don't think it is going to be maintainable for PREEMPT_RT to
> maintain a separate signal delivery path for faults from the rest of
> linux.

I want to say the patch below looks like it was a perfectly fine debug
patch for checking whether the suspected issue is the real issue.  It is
not a good final solution for the reasons I have already mentioned.
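
For illustration only, here is a minimal userspace sketch of the
alternative described in the quoted reply: the deferred fault signal
raises a "signal pending" flag, and the recalculation of that flag knows
about the deferred slot.  This is a model, not kernel code.  Every
model_* name, the single-slot queue and the in_atomic field are
assumptions made up for the sketch; only the field name forced_info and
the idea of deferring delivery until preemption is enabled come from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model only: none of the model_* names are kernel APIs. */
struct model_siginfo {
	int si_signo;	/* 0 means "slot is empty" */
	int si_code;
};

struct model_task {
	bool in_atomic;				/* stands in for in_atomic() */
	bool sigpending;			/* stands in for TIF_SIGPENDING */
	struct model_siginfo forced_info;	/* deferred fault signal */
	int queued_signo;			/* one-slot stand-in for the pending queue */
	int last_delivered;
	int delivered_count;
};

/* The pending flag accounts for the deferred slot as well as the queue. */
static void model_recalc_sigpending(struct model_task *t)
{
	t->sigpending = t->queued_signo != 0 || t->forced_info.si_signo != 0;
}

/* Returns 0 on success, -1 if a deferred signal is already stored. */
static int model_force_sig(struct model_task *t, const struct model_siginfo *info)
{
	assert(info->si_signo > 0);

	if (t->in_atomic) {
		/* Cannot take a sleeping lock here: park the signal instead. */
		if (t->forced_info.si_signo)
			return -1;
		t->forced_info = *info;
	} else {
		t->queued_signo = info->si_signo;
	}
	model_recalc_sigpending(t);
	return 0;
}

/* Models the exit-to-user work loop, where sleeping is allowed again. */
static void model_exit_to_user(struct model_task *t)
{
	t->in_atomic = false;
	while (t->sigpending) {
		if (t->forced_info.si_signo) {
			t->last_delivered = t->forced_info.si_signo;
			memset(&t->forced_info, 0, sizeof(t->forced_info));
		} else {
			t->last_delivered = t->queued_signo;
			t->queued_signo = 0;
		}
		t->delivered_count++;
		model_recalc_sigpending(t);
	}
}
```

In this model a signal forced while in_atomic is set is not delivered
immediately; it marks the task as having a signal pending, and the
ordinary signal-pending path picks it up once the task is preemptible.
No separate flag is needed for the deferred case.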

May I ask where the rest of the conversation was?  I can only find the
single posting of this patch on linux-kernel without any conversation,
and the description indicates this change has seen several rounds of
development.

Eric

>> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
>>   ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
>> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
>>
>> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
>> Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
>> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
>> ---
>>  arch/x86/include/asm/signal.h | 13 +++++++++++++
>>  include/linux/sched.h         |  3 +++
>>  kernel/entry/common.c         |  9 +++++++++
>>  kernel/signal.c               | 28 ++++++++++++++++++++++++++++
>>  4 files changed, 53 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
>> index 2dfb5fea13aff..fc03f4f7ed84c 100644
>> --- a/arch/x86/include/asm/signal.h
>> +++ b/arch/x86/include/asm/signal.h
>> @@ -28,6 +28,19 @@ typedef struct {
>>  #define SA_IA32_ABI	0x02000000u
>>  #define SA_X32_ABI	0x01000000u
>>  
>> +/*
>> + * Because some traps use the IST stack, we must keep preemption
>> + * disabled while calling do_trap(), but do_trap() may call
>> + * force_sig_info() which will grab the signal spin_locks for the
>> + * task, which in PREEMPT_RT are mutexes.  By defining
>> + * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
>> + * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
>> + * trap.
>> + */
>> +#if defined(CONFIG_PREEMPT_RT)
>> +#define ARCH_RT_DELAYS_SIGNAL_SEND
>> +#endif
>> +
>>  #ifndef CONFIG_COMPAT
>>  #define compat_sigset_t compat_sigset_t
>>  typedef sigset_t compat_sigset_t;
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 75ba8aa60248b..0514237cee3fc 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1087,6 +1087,9 @@ struct task_struct {
>>  	/* Restored if set_restore_sigmask() was used: */
>>  	sigset_t			saved_sigmask;
>>  	struct sigpending		pending;
>> +#ifdef CONFIG_PREEMPT_RT
>> +	struct				kernel_siginfo forced_info;
>> +#endif
>>  	unsigned long			sas_ss_sp;
>>  	size_t				sas_ss_size;
>>  	unsigned int			sas_ss_flags;
>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>> index bad713684c2e3..216dbf46e05f5 100644
>> --- a/kernel/entry/common.c
>> +++ b/kernel/entry/common.c
>> @@ -162,6 +162,15 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>>  		if (ti_work & _TIF_NEED_RESCHED)
>>  			schedule();
>>  
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> +		if (unlikely(current->forced_info.si_signo)) {
>> +			struct task_struct *t = current;
>> +
>> +			force_sig_info(&t->forced_info);
>> +			t->forced_info.si_signo = 0;
>> +		}
>> +#endif
>> +
>>  		if (ti_work & _TIF_UPROBE)
>>  			uprobe_notify_resume(regs);
>>  
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 9b04631acde8f..cb2b28c17c0a5 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>>  	struct k_sigaction *action;
>>  	int sig = info->si_signo;
>>  
>> +	/*
>> +	 * On some archs, PREEMPT_RT has to delay sending a signal from a trap
>> +	 * since it can not enable preemption, and the signal code's spin_locks
>> +	 * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will
>> +	 * send the signal on exit of the trap.
>> +	 */
>> +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
>> +	if (in_atomic()) {
>> +		struct task_struct *t = current;
>> +
>> +		if (WARN_ON_ONCE(t->forced_info.si_signo))
>> +			return 0;
>> +
>> +		if (is_si_special(info)) {
>> +			WARN_ON_ONCE(info != SEND_SIG_PRIV);
>> +			t->forced_info.si_signo = info->si_signo;
>> +			t->forced_info.si_errno = 0;
>> +			t->forced_info.si_code = SI_KERNEL;
>> +			t->forced_info.si_pid = 0;
>> +			t->forced_info.si_uid = 0;
>> +		} else {
>> +			t->forced_info = *info;
>> +		}
>> +
>> +		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
>> +		return 0;
>> +	}
>> +#endif
>>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>>  	action = &t->sighand->action[sig-1];
>>  	ignored = action->sa.sa_handler == SIG_IGN;
