Message-ID: <f7028488e9d820848955de87ead3ec619fe5dbec.camel@redhat.com>
Date: Wed, 16 Jul 2025 10:38:11 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
	linux-trace-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>, Tomas Glozar <tglozar@...hat.com>,
	Juri Lelli <jlelli@...hat.com>, Clark Williams <williams@...hat.com>,
	John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions



On Wed, 2025-07-16 at 10:27 +0200, Nam Cao wrote:
> On Wed, Jul 16, 2025 at 10:20:39AM +0200, Gabriele Monaco wrote:
> > On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> > > If I understand correctly, if after 3 tries we still fail to
> > > change the state, we will invoke the reactor and trace_error?
> > > Doesn't that cause a false positive? Because it is not a violation
> > > of the model, it is just a race making us fail to change the
> > > state.
> > > 
> > 
> > Yes, that's correct.
> > My rationale was that, at that point, the monitor is likely no
> > longer in sync, so silently ignoring the situation is not really an
> > option.
> > In this case, the reaction includes an invalid current state
> > (because in fact we don't know what the current state is) and tools
> > may be able to understand that.
> 
> Can't you bring the monitor back to the init state, and start over
> again?
> 
> I think "da_mon->monitoring = 0;" does the trick?
> 

Yes you can, but I wouldn't do it silently.
I'd say the cleanest approach, short of invoking the reactor, is to
still return false so the system does all the cleanup, but trace the
event or, at the very least, print a warning.

But you're right, this is more relevant for whoever develops the
monitor than for the user, so it should probably be tracked separately.

Thanks,
Gabriele

> > I know you wouldn't be able to do that in LTL... By the way, LTL
> > uses multiple statuses, so this lockless approach may not really
> > work.
> 
> Let's worry about one thing at a time ;)
> 
> > I don't see this situation happening often: I only ever observed 2
> > events able to race; 4 happening at the same time is wild, but of
> > course it cannot be excluded in principle for any possible monitor.
> > Yet, I have the feeling a monitor where this can happen is not well
> > designed, and RV should point that out.
> > Do you have ideas of potential monitors where more than 3 events
> > can race?
> > 
> > Perhaps a full-blown reaction is a bit aggressive in this situation,
> > as the /fault/ may not necessarily be in the monitor.
> > We could think of a special tracepoint or just printing.
> > 
> > > Same below.
> > > 
> > > Also, I wouldn't use goto unless necessary. Perhaps it is better
> > > to put the code at "out_react:" and "out_success:" into the loop.
> > > But that's just my personal preference, up to you.
> > 
> > That could be done if we do a whole different thing when retries
> > run out, instead of defaulting to out_react.
> > I also liked avoiding excessive indentation with those gotos, but
> > yeah, it may not be strictly necessary.
> 
> Sure, as I said before, "just my personal preference, up to you."
> 
> Nam

