Message-ID: <5f1365f2cd84597fd3547544fcceab5c79682624.camel@redhat.com>
Date: Mon, 19 May 2025 12:28:12 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
	linux-trace-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>, Tomas Glozar <tglozar@...hat.com>,
	Juri Lelli <jlelli@...hat.com>
Subject: Re: [RFC PATCH v2 10/12] rv: Retry when da monitor detects race conditions



On Mon, 2025-05-19 at 11:06 +0200, Nam Cao wrote:
> On Wed, May 14, 2025 at 10:43:12AM +0200, Gabriele Monaco wrote:
> > -static inline void									\
> > -da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name state)	\
> > +static inline bool									\
> > +da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name prev_state,	\
> > +			    enum states_##name state)					\
> >  {											\
> > -	da_mon->curr_state = state;							\
> > +	return try_cmpxchg(&da_mon->curr_state, &prev_state, state);			\
> >  }											\
> 
> This is a very thin wrapper. Should we just call try_cmpxchg()
> directly?

Mmh, right, at this point the wrapper does nothing but make the code
more obscure, will do.

> 
> >  static inline bool									\
> >  da_event_##name(struct da_monitor *da_mon, enum events_##name event)		\
> >  {											\
> > -	type curr_state = da_monitor_curr_state_##name(da_mon);			\
> > -	type next_state = model_get_next_state_##name(curr_state, event);		\
> > +	bool changed;									\
> > +	type curr_state, next_state;							\
> >  											\
> > -	if (next_state != INVALID_STATE) {						\
> > -		da_monitor_set_state_##name(da_mon, next_state);			\
> > +	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {				\
> > +		curr_state = da_monitor_curr_state_##name(da_mon);			\
> 
> For the follow-up iterations (i > 0), it is not necessary to read
> curr_state again here, we already have its value from try_cmpxchg()
> below.

Yeah good point.
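Right, and since try_cmpxchg() hands back the value it observed when it
fails, the extra read really is redundant. Roughly the same behaviour as
C11's atomic_compare_exchange_strong() in the little userspace analogy
below (just an illustration of the pattern, not the monitor code):

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
	_Atomic int curr_state = 0;
	int expected = 5;	/* stale guess, the CAS will fail */

	/* on failure, the observed value is written back into expected */
	if (!atomic_compare_exchange_strong(&curr_state, &expected, 7))
		printf("CAS failed, expected now holds %d\n", expected);

	/* retry with the refreshed value, no extra load needed */
	if (atomic_compare_exchange_strong(&curr_state, &expected, 7))
		printf("CAS succeeded, curr_state is now %d\n",
		       atomic_load(&curr_state));

	return 0;
}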

> 
> Also, thinking about memory barriers hurts my brain, but I'm not
> entirely sure if reading curr_state without a memory barrier here is
> okay.
> 

I guess we are in the same boat here. I couldn't really figure out how
much of a barrier try_cmpxchg() imposes (if any), but I didn't see any
noticeable difference when adding an explicit SMP write barrier to pair
with the READ_ONCE() in da_monitor_curr_state, so I simply assumed we
can do without it.
But I definitely appreciate opinions on this.

> How about something like below?
> 
> curr_state = da_monitor_curr_state_##name(da_mon);
> for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {
> 	next_state = model_get_next_state_##name(curr_state, event);
> 	if (next_state == INVALID_STATE)
> 		break;
> 	if (try_cmpxchg(&da_mon->curr_state, &curr_state, next_state))
> 		break;
> }
> 

Yeah, that's neater.
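Just to spell out how I'd picture the macro body for v3 then, with the
wrapper gone and try_cmpxchg() refreshing curr_state on failure (only a
rough sketch, same names as in the current patch):

	type curr_state, next_state;							\
											\
	curr_state = da_monitor_curr_state_##name(da_mon);				\
	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {				\
		next_state = model_get_next_state_##name(curr_state, event);		\
		if (next_state == INVALID_STATE)					\
			break;								\
		/* on failure curr_state is updated to the value just observed */	\
		if (try_cmpxchg(&da_mon->curr_state, &curr_state, next_state))		\
			break;								\
	}										\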

> Furthermore, is it possible to replace for (...) with while (1)? I
> don't think we can have a livelock, because if we fail to do
> try_cmpxchg(), the "other guy" surely succeeds.
> 

Mmh, although it's definitely unlikely, I'm thinking of a case in which
the event starts on one CPU and at the same time we keep seeing events
in IRQ context and on another CPU, let's say continuously. Nothing
forbids that between any two consecutive try_cmpxchg() calls another
CPU/context changes the next state (making the local try_cmpxchg()
fail).
In practice I've never seen it reach the second iteration, as the
critical section is really tiny, but I'm not sure we can guarantee this
never happens.
Or am I missing something?

Thanks,
Gabriele

