linux-kernel - Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model

Message-ID: <20250716153144.GY1613200@noisy.programming.kicks-ass.net>
Date: Wed, 16 Jul 2025 17:31:44 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	linux-trace-kernel@...r.kernel.org, Nam Cao <namcao@...utronix.de>,
	Tomas Glozar <tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>,
	Clark Williams <williams@...hat.com>,
	John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model

On Wed, Jul 16, 2025 at 04:38:36PM +0200, Gabriele Monaco wrote:

> So as you said, we can still reconstruct what happened from the trace, but the
> model suddenly needs more states and more events.

So given a sequence like:

  trace_sched_enter_tp();
  { trace_irq_disable();
    **irq_entry();**
    **irq_exit();**
    trace_irq_enable(); } * Ni
  trace_irq_disable();
  { trace_sched_switch(); } * Nj
  trace_irq_enable();
  { trace_irq_disable();
    **irq_entry();**
    **irq_exit();**
    trace_irq_enable(); } * Nk
  trace_sched_exit_tp();

It becomes somewhat hard to figure out which exact IRQ disabled section
the switch did not happen in (Nj == 0).
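As a rough illustration of that ambiguity, here is a small userspace sketch (not kernel code) that walks such an event stream. The event IDs, `struct summary` and `summarize()` are hypothetical names made up for this example. It counts the IRQ-disabled sections between enter and exit and reports which one, if any, contained a switch.

```c
#include <stddef.h>

/* Hypothetical event IDs standing in for the tracepoints in the sequence above. */
enum ev {
	EV_SCHED_ENTER,
	EV_IRQ_DISABLE,
	EV_IRQ_ENABLE,
	EV_SCHED_SWITCH,
	EV_SCHED_EXIT,
};

struct summary {
	int irq_off_sections;	/* completed disable/enable pairs between enter and exit */
	int switch_section;	/* 0-based index of the section holding a switch, -1 if none */
};

/* Walk the event stream once and summarize the IRQ-disabled sections. */
static struct summary summarize(const enum ev *trace, size_t n)
{
	struct summary s = { 0, -1 };
	int in_sched = 0, irqs_off = 0;

	for (size_t i = 0; i < n; i++) {
		switch (trace[i]) {
		case EV_SCHED_ENTER:
			in_sched = 1;
			break;
		case EV_SCHED_EXIT:
			in_sched = 0;
			break;
		case EV_IRQ_DISABLE:
			if (in_sched)
				irqs_off = 1;
			break;
		case EV_IRQ_ENABLE:
			if (in_sched && irqs_off) {
				irqs_off = 0;
				s.irq_off_sections++;
			}
			break;
		case EV_SCHED_SWITCH:
			/* Sections completed so far == index of the current one. */
			if (in_sched && irqs_off)
				s.switch_section = s.irq_off_sections;
			break;
		}
	}
	return s;
}
```

With Ni = Nj = Nk = 1 this reports three sections and the switch in section 1. With Nj == 0 it still reports three sections but `switch_section` stays -1, and nothing in the stream marks which of the three was the scheduler's own.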

> If we could directly tell whether interrupts were disabled manually or from an
> actual interrupt, that wouldn't be necessary, for instance (as in the original
> model by Daniel).

Hmm.. we do indeed appear to trace the IRQ state before adding
HARDIRQ_OFFSET to preempt_count(). Yes, that complicates things a
little.

So... it *might* be possible to lift lockdep_hardirq_enter() to before
we start tracing. But then you're stuck with running with lockdep
enabled -- I'm thinking that's not ideal, given those other patches you
sent.

I'm going to go on holidays soon, but I've made a note to see if we can
lift setting HARDIRQ_OFFSET before we start tracing. IIRC the current
order is because setting HARDIRQ_OFFSET is using preempt_count_add()
which can be instrumented itself.

But we could use __preempt_count_add() instead, then we lose the
tracing from setting HARDIRQ_OFFSET, but I don't think that is a
problem. We already get the latency from the IRQ tracepoints after all.
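
To make the ordering point concrete, a minimal userspace model (a sketch only, not the actual IRQ entry code). Every `model_*` / `MODEL_*` name and both `irq_entry_*()` helpers are invented for this example, and the offset value is just illustrative. It shows what the irq_disable tracepoint would observe depending on whether the hardirq offset is added before or after tracing.

```c
#include <stdbool.h>

/* Illustrative stand-in for the hardirq bit in preempt_count. */
#define MODEL_HARDIRQ_OFFSET	0x10000u

static unsigned int model_preempt_count;
static bool seen_in_hardirq;	/* what the modelled tracepoint last observed */

static bool model_in_hardirq(void)
{
	return (model_preempt_count & MODEL_HARDIRQ_OFFSET) != 0;
}

/* Stand-in for the irq_disable tracepoint: records the context it sees. */
static void model_trace_irq_disable(void)
{
	seen_in_hardirq = model_in_hardirq();
}

/* Order described above as current: trace first, then add the offset. */
static bool irq_entry_trace_first(void)
{
	model_preempt_count = 0;
	model_trace_irq_disable();
	model_preempt_count += MODEL_HARDIRQ_OFFSET;
	return seen_in_hardirq;
}

/* Proposed order: add the offset (plain, untraced add) before tracing. */
static bool irq_entry_offset_first(void)
{
	model_preempt_count = 0;
	model_preempt_count += MODEL_HARDIRQ_OFFSET;
	model_trace_irq_disable();
	return seen_in_hardirq;
}
```

In this model `irq_entry_trace_first()` returns false, so the tracepoint cannot tell a real interrupt from a manual IRQ disable, while `irq_entry_offset_first()` returns true.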

> I get your point why we don't really need the additional tracepoint, but some
> arguments giving more context come almost for free.

Right. So please always try and justify adding tracepoints.