linux-kernel - Re: [PATCH v4 1/5] tracing: Introduce faultable tracepoints

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba543d44-9302-4115-ac4f-d4e9f8d98a90@paulmck-laptop>
Date:   Tue, 21 Nov 2023 07:51:52 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        linux-kernel@...r.kernel.org,
        Michael Jeanson <mjeanson@...icios.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Yonghong Song <yhs@...com>, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>, bpf@...r.kernel.org,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH v4 1/5] tracing: Introduce faultable tracepoints

On Tue, Nov 21, 2023 at 09:56:55AM -0500, Mathieu Desnoyers wrote:
> On 2023-11-21 09:46, Peter Zijlstra wrote:
> > On Tue, Nov 21, 2023 at 09:40:24AM -0500, Mathieu Desnoyers wrote:
> > > On 2023-11-21 09:36, Peter Zijlstra wrote:
> > > > On Tue, Nov 21, 2023 at 09:06:18AM -0500, Mathieu Desnoyers wrote:
> > > > > Task trace RCU fits a niche that has the following set of requirements/tradeoffs:
> > > > > 
> > > > > - Allow page faults within RCU read-side (like SRCU),
> > > > > - Has a low-overhead read lock-unlock (without the memory barrier overhead of SRCU),
> > > > > - The tradeoff: Has a rather slow synchronize_rcu(), but tracers should not care about
> > > > >     that. Hence, this is not meant to be a generic replacement for SRCU.
> > > > > 
> > > > > Based on my reading of https://lwn.net/Articles/253651/ , preemptible RCU is not a good
> > > > > fit for the following reasons:
> > > > > 
> > > > > - It disallows blocking within a RCU read-side on non-CONFIG_PREEMPT kernels,
> > > > 
> > > > Your counter points are confused, we simply don't build preemptible RCU
> > > > unless PREEMPT=y, but that could surely be fixed and exposed as a
> > > > separate flavour.
> > > > 
> > > > > - AFAIU the mmap_sem used within the page fault handler does not have priority inheritance.
> > > > 
> > > > What's that got to do with anything?

Preemptible RCU allows blocking/preemption only in those cases where
priority inheritance applies.  As Mathieu says below, this prevents
indefinite postponement of a global grace period.  Such postponement is
especially problematic in kernels built with PREEMPT_RCU=y.  For but one
example, consider a situation where someone maps a file served by NFS.
We can debate the wisdom of creating such a map, but having the kernel
OOM would be a completely unacceptable "error message".

> > > > Still utterly confused about what task-tracing rcu is and how it is
> > > > different from preemptible rcu.

Task Trace RCU allows general blocking, which it tolerates by stringent
restrictions on what exactly it is used for (tracing in cases where the
memory to be included in the tracing might page fault).  Preemptible RCU
does not permit general blocking.

Tasks Trace RCU is a very specialized tool for a very specific use case.

> > > In addition to taking the mmap_sem, the page fault handler need to block
> > > until its requested pages are faulted in, which may depend on disk I/O.
> > > Is it acceptable to wait for I/O while holding preemptible RCU read-side?
> > 
> > I don't know, preemptible rcu already needs to track task state anyway,
> > it needs to ensure all tasks have passed through a safe spot etc.. vs regular
> > RCU which only needs to ensure all CPUs have passed through start.
> > 
> > Why is this such a hard question?

It is not a hard question.  Nor is the answer, which is that preemptible
RCU is not a good fit for this task for all the reasons that Mathieu has
laid out.

> Personally what I am looking for is a clear documentation of preemptible rcu
> with respect to whether it is possible to block on I/O (take a page fault,
> call schedule() explicitly) from within a preemptible rcu critical section.
> I guess this is a hard question because there is no clear statement to that
> effect in the kernel documentation.
> 
> If it is allowed (which I doubt), then I wonder about the effect of those
> long readers on grace period delays. Things like expedited grace periods may
> suffer.
> 
> Based on Documentation/RCU/rcu.rst:
> 
>   Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
>   same effect, but require that the readers manipulate CPU-local
>   counters.  These counters allow limited types of blocking within
>   RCU read-side critical sections.  SRCU also uses CPU-local
>   counters, and permits general blocking within RCU read-side
>   critical sections.  These variants of RCU detect grace periods
>   by sampling these counters.
> 
> Then we just have to find a definition of "limited types of blocking"
> vs "general blocking".

The key point is that you are not allowed to place any source code in
a preemptible RCU reader that would not be legal in a non-preemptible
RCU reader.  The rationale again is that the cases in which preemptible
RCU readers call schedule are cases to which priority boosting applies.

It is quite possible that the documentation needs upgrading.  ;-)

							Thanx, Paul