linux-kernel - Re: [PATCH] kprobes: Fix to delay the kprobes jump optimization

Message-ID: <20210222100400.xzdq7iciw5f4zzlv@linutronix.de>
Date:   Mon, 22 Feb 2021 11:04:00 +0100
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Uladzislau Rezki <urezki@...il.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Daniel Axtens <dja@...ens.net>,
        Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Michal Hocko <mhocko@...e.com>,
        "Theodore Y . Ts'o" <tytso@....edu>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [PATCH] kprobes: Fix to delay the kprobes jump optimization

On 2021-02-19 10:18:11 [-0800], Paul E. McKenney wrote:
> If Masami's patch works for the PowerPC guys on v5.10-rc7, then it can
> be backported.  The patch making RCU Tasks initialize itself early won't
> have any effect and can be left or reverted, as we choose.  The self-test
> patch will need to be either adjusted or reverted.
> 
> However...
> 
> The root cause of this problem is that softirq only kind-of works
> during a window of time during boot.  It works only if the number and
> duration of softirq handlers during this time is small enough, for some
> ill-defined notion of "small enough".  If there are too many, whatever
> that means exactly, then we get a failed attempt to awaken ksoftirqd, which

Neither the number of registered softirq handlers nor the number of times
the individual softirqs were scheduled matters. The only problem is that
someone raises a softirq and then waits for its completion.
So scheduling a timer_list timer works; waiting for its completion does
not. Once ksoftirqd is up, the pending softirq will be processed.
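A minimal userspace model of that distinction, assuming a single pending bit, a flag standing in for "the ksoftirqd kthreads have been spawned", and that once work has been deferred only ksoftirqd will run it. Every name below is hypothetical, not a kernel API. Raising only records work and never blocks; a caller that then waits for the handler can make progress only if something actually runs it.

```c
#include <stdbool.h>

/* Hypothetical model state; these are not kernel symbols. */
static bool softirq_pending;   /* work has been raised but not handled */
static bool handler_done;      /* set once the handler has run */
static bool ksoftirqd_spawned; /* stands in for the ksoftirqd kthreads existing */

/* Raising only records the work; it never blocks, so it is always safe. */
static void model_raise_softirq(void)
{
	softirq_pending = true;
	handler_done = false;
}

/* In this model, deferred work is run only by ksoftirqd. */
static void model_ksoftirqd_run(void)
{
	if (ksoftirqd_spawned && softirq_pending) {
		softirq_pending = false;
		handler_done = true;
	}
}

/*
 * Waiting for completion: poll a bounded number of times (the real
 * early-boot hang is unbounded). Returns true if the handler ran.
 */
static bool model_wait_for_handler(int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		model_ksoftirqd_run();
		if (handler_done)
			return true;
	}
	return false;
}
```

In this model, raising before ksoftirqd exists just leaves the bit set, and the wait succeeds as soon as `ksoftirqd_spawned` becomes true.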

> (sometimes!) results in a silent hang.  Which, as you pointed out earlier,
> is a really obnoxious error message.  And any minor change could kick
> us into silent-hang state because of the heuristics used to hand off
> to ksoftirqd.  The straw that broke the camel's back and all that.
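A simplified model of the hand-off heuristic described above: the back-of-interrupt path keeps restarting while a budget allows, then defers any leftover work to ksoftirqd. The restart limit, the notion of one "batch" per pass, and all names are hypothetical; the real `__do_softirq()` in kernel/softirq.c applies both a time budget and a restart count and re-runs every pending softirq on each pass. The point is only that a small amount of work finishes inline, while slightly more work is handed off to a kthread that may not exist yet.

```c
#include <stdbool.h>

/* Hypothetical limit; the real code also applies a time budget. */
#define MODEL_RESTART_LIMIT 10

struct model_cpu {
	int pending;            /* batches of softirq work still outstanding */
	bool ksoftirqd_spawned; /* whether the per-CPU kthread exists yet */
	bool ksoftirqd_woken;   /* whether a wakeup actually reached it */
};

/*
 * Back-of-interrupt processing: handle one batch per pass and restart
 * while the budget allows; leftover work is handed to ksoftirqd.
 * Returns the number of batches handled inline.
 */
static int model_do_softirq(struct model_cpu *cpu)
{
	int handled = 0;
	int restarts = MODEL_RESTART_LIMIT;

	while (cpu->pending > 0) {
		cpu->pending--;
		handled++;
		if (--restarts == 0)
			break;
	}
	/* Hand off leftovers; before ksoftirqd exists this reaches nobody. */
	if (cpu->pending > 0 && cpu->ksoftirqd_spawned)
		cpu->ksoftirqd_woken = true;
	return handled;
}
```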

The problem is that a softirq is raised and then its completion is waited
for. Something like synchronize_rcu() would be such a case, I guess.

> One approach would be to add WARN_ON_ONCE() so that if softirq tries
> to awaken ksoftirqd before it is spawned, we get a nice obvious splat.
> Unfortunately, this gives false positives because there is code that
> needs a softirq handler to run eventually, but is OK with that handler
> being delayed until some random point in the early_initcall() sequence.
> 
> Besides which, if we are going to add a check, why not use that check
> to just make things work by forcing handler execution to remain within the
> softirq back-of-interrupt context instead of awakening a not-yet-spawned
> ksoftirqd?  We can further prevent entry into dyntick-idle state until
> the ksoftirqd kthreads have been spawned, which means that if softirq
> handlers must be deferred, they will be resumed within one jiffy by the
> next scheduler-clock interrupt.
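A sketch of the proposed rule in the same spirit, assuming a per-CPU flag for "ksoftirqd has been spawned" and treating each scheduler-clock interrupt as another chance to run softirqs on irq exit. All names and the restart limit are hypothetical, not the kernel's actual change. Before ksoftirqd exists, leftover work stays pending instead of being handed off, and the tick is never stopped, so the next tick drains it.

```c
#include <stdbool.h>

/* Hypothetical per-CPU model state; not kernel data structures. */
struct model_cpu {
	int pending;            /* softirq batches still outstanding */
	bool ksoftirqd_spawned; /* per-CPU ksoftirqd kthread exists */
	bool ksoftirqd_woken;   /* a wakeup actually reached ksoftirqd */
};

#define MODEL_RESTART_LIMIT 10

/* Back-of-interrupt processing with the proposed early-boot rule. */
static void model_do_softirq(struct model_cpu *cpu)
{
	for (int budget = MODEL_RESTART_LIMIT; cpu->pending > 0 && budget > 0; budget--)
		cpu->pending--;

	if (cpu->pending > 0 && cpu->ksoftirqd_spawned)
		cpu->ksoftirqd_woken = true; /* normal hand-off */
	/* Otherwise leave the work pending for the next tick's irq exit. */
}

/* Proposed rule: no dyntick-idle until ksoftirqd exists. */
static bool model_may_stop_tick(const struct model_cpu *cpu)
{
	return cpu->ksoftirqd_spawned;
}

/* One scheduler-clock interrupt: softirqs run again on irq exit. */
static void model_tick(struct model_cpu *cpu)
{
	model_do_softirq(cpu);
}
```

With 25 batches pending and no ksoftirqd, three ticks drain the work without any wakeup, which matches the "resumed within one jiffy" argument as long as the tick keeps running.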

This should work.

> Yes, this can allow softirq handlers to impose large latencies, but only
> during early boot, long before any latency-sensitive applications can
> possibly have been created.  So this does not seem like a real problem.
> 
> Am I missing something here?
> 
> 							Thanx, Paul

Sebastian