Message-ID: <20240117204416.GB32526@redhat.com>
Date: Wed, 17 Jan 2024 21:44:17 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Dylan Hatch <dylanbhatch@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Kees Cook <keescook@...omium.org>,
	Frederic Weisbecker <frederic@...nel.org>,
	"Joel Fernandes (Google)" <joel@...lfernandes.org>,
	Ard Biesheuvel <ardb@...nel.org>,
	"Matthew Wilcox (Oracle)" <willy@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Vincent Whitchurch <vincent.whitchurch@...s.com>,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Luis Chamberlain <mcgrof@...nel.org>,
	Mike Christie <michael.christie@...cle.com>,
	David Hildenbrand <david@...hat.com>,
	Catalin Marinas <catalin.marinas@....com>,
	Stefan Roesch <shr@...kernel.io>, Joey Gouly <joey.gouly@....com>,
	Josh Triplett <josh@...htriplett.org>, Helge Deller <deller@....de>,
	Ondrej Mosnacek <omosnace@...hat.com>,
	Florent Revest <revest@...omium.org>,
	Miguel Ojeda <ojeda@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] getrusage: Use trylock when getting sighand lock.

Heh ;)

getrusage() should not use ->siglock at all.

On my TODO list. I'll try to make a patch this week.

Oleg.

On 01/17, Dylan Hatch wrote:
>
> Processes with many threads risk causing a hard lockup if too many
> threads call getrusage() at once. A thread calling it with RUSAGE_SELF
> spins on the sighand lock with IRQs disabled, and the critical section
> of getrusage() scales linearly with the number of threads in the
> process. All CPUs may end up spinning on the sighand lock for a long
> time because another thread holds the lock and is busy iterating over
> 250k+ threads.
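>
> As a rough illustration (not a tested reproducer; the thread count is
> only indicative), the failure mode can be provoked from userspace with
> something like the following, built with -lpthread:
>
>	#include <pthread.h>
>	#include <sys/resource.h>
>
>	static void *hammer(void *arg)
>	{
>		struct rusage ru;
>
>		/* each call takes ->siglock and walks every thread */
>		for (;;)
>			getrusage(RUSAGE_SELF, &ru);
>		return NULL;
>	}
>
>	int main(void)
>	{
>		pthread_t tid;
>
>		for (int i = 0; i < 250000; i++)
>			pthread_create(&tid, NULL, hammer, NULL);
>		pthread_join(tid, NULL);
>	}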
>
> In order to mitigate this, periodically re-enable interrupts while
> waiting for the sighand lock. This approach lacks the FIFO fairness of a
> normal spinlock mechanism, but this effect is somewhat contained to
> different threads within the same process.
>
> -- Alternatives Considered --
>
> In an earlier version of the above approach, we added a cond_resched()
> call at the point where interrupts are re-enabled, to prevent soft
> lockups. This solution turned out not to be workable on its own since
> getrusage() is called from a non-preemptible context in kernel/exit.c,
> but it could possibly be adapted by having an alternate version of
> getrusage() that can be called from a preemptible context.
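>
> Roughly, that earlier version amounted to the following change in the
> retry path (a sketch, not the actual posted code):
>
>	/* Handle pending IRQs */
>	if (!lock && sighand) {
>		local_irq_restore(*flags);
>		cond_resched();	/* may sleep: unsafe for the kernel/exit.c caller */
>		goto again;
>	}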
>
> Another alternative would be to have getrusage() itself release the
> lock and re-enable interrupts periodically while iterating over large
> numbers of threads. However, this would be difficult to implement
> correctly, and the correctness/consistency of the data reported by
> getrusage() would be questionable.
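>
> For illustration only, that variant might look something like the
> following inside getrusage()'s accounting loop (accumulate_thread() is
> a hypothetical stand-in for the existing per-thread accounting):
>
>	for_each_thread(p, t) {
>		accumulate_thread(r, t);
>		if (++n % 1000 == 0) {
>			unlock_task_sighand(p, &flags);
>			cond_resched();
>			if (!lock_task_sighand(p, &flags))
>				goto out;
>			/* the thread list may have changed under us, so the
>			 * accumulated totals can be inconsistent */
>		}
>	}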
>
> One final alternative might be to add a per-process mutex for callers
> of getrusage() to take before acquiring the sighand lock, or to use as
> a full replacement of the sighand lock. We haven't fully explored what
> the implications of this might be.
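>
> A minimal sketch of the mutex idea (the field name is hypothetical):
>
>	/* in struct signal_struct */
>	struct mutex rusage_mutex;
>
>	/* in getrusage(): serialize readers before they hit ->siglock */
>	mutex_lock(&p->signal->rusage_mutex);
>	if (lock_task_sighand(p, &flags)) {
>		/* ... existing accounting ... */
>		unlock_task_sighand(p, &flags);
>	}
>	mutex_unlock(&p->signal->rusage_mutex);
>
> Note a sleeping lock would run into the same non-preemptible caller in
> kernel/exit.c mentioned above.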
>
> Signed-off-by: Dylan Hatch <dylanbhatch@...gle.com>
> ---
>  include/linux/sched/signal.h | 13 +++++++++++
>  kernel/signal.c              | 43 ++++++++++++++++++++++++++++++++++++
>  kernel/sys.c                 |  8 ++++++-
>  3 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3499c1a8b9295..7852f16139965 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -747,6 +747,19 @@ static inline struct sighand_struct *lock_task_sighand(struct task_struct *task,
>  	return ret;
>  }
>
> +extern struct sighand_struct *__lock_task_sighand_safe(struct task_struct *task,
> +							unsigned long *flags);
> +
> +static inline struct sighand_struct *lock_task_sighand_safe(struct task_struct *task,
> +						       unsigned long *flags)
> +{
> +	struct sighand_struct *ret;
> +
> +	ret = __lock_task_sighand_safe(task, flags);
> +	(void)__cond_lock(&task->sighand->siglock, ret);
> +	return ret;
> +}
> +
>  static inline void unlock_task_sighand(struct task_struct *task,
>  						unsigned long *flags)
>  {
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 47a7602dfe8df..6d60c73b7ab91 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1397,6 +1397,49 @@ int zap_other_threads(struct task_struct *p)
>  	return count;
>  }
>
> +struct sighand_struct *__lock_task_sighand_safe(struct task_struct *tsk,
> +						unsigned long *flags)
> +{
> +	struct sighand_struct *sighand;
> +	int n;
> +	bool lock = false;
> +
> +again:
> +	rcu_read_lock();
> +	local_irq_save(*flags);
> +	for (n = 0; n < 500; n++) {
> +		sighand = rcu_dereference(tsk->sighand);
> +		if (unlikely(sighand == NULL))
> +			break;
> +
> +		/*
> +		 * The downside of this approach is we lose the fairness of
> +		 * FIFO waiting because the acquisition order between multiple
> +		 * waiting tasks is effectively random.
> +		 */
> +		lock = spin_trylock(&sighand->siglock);
> +		if (!lock) {
> +			cpu_relax();
> +			continue;
> +		}
> +
> +		/* See __lock_task_sighand() for why this re-check is needed. */
> +		if (likely(sighand == rcu_access_pointer(tsk->sighand)))
> +			break;
> +		spin_unlock(&sighand->siglock);
> +		lock = false;
> +	}
> +	rcu_read_unlock();
> +
> +	/* Handle pending IRQs; also restore them before a NULL return */
> +	if (!lock)
> +		local_irq_restore(*flags);
> +	if (!lock && sighand)
> +		goto again;
> +
> +	return sighand;
> +}
> +
>  struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
>  					   unsigned long *flags)
>  {
> diff --git a/kernel/sys.c b/kernel/sys.c
> index e219fcfa112d8..1b1254a3c506b 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1798,7 +1798,13 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
>  		goto out;
>  	}
>
> -	if (!lock_task_sighand(p, &flags))
> +	/*
> +	 * We use lock_task_sighand_safe here instead of lock_task_sighand
> +	 * because one process with many threads all calling getrusage may
> +	 * otherwise cause an NMI watchdog timeout by disabling IRQs for too
> +	 * long.
> +	 */
> +	if (!lock_task_sighand_safe(p, &flags))
>  		return;
>
>  	switch (who) {
> --
> 2.43.0.381.gb435a96ce8-goog
>

