lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <2220347.1605801222@warthog.procyon.org.uk>
Date:   Thu, 19 Nov 2020 15:53:42 +0000
From:   David Howells <dhowells@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>, paulmck@...nel.org
cc:     dhowells@...hat.com, jlayton@...hat.com, willy@...radead.org,
        linux-cachefs@...hat.com, linux-kernel@...r.kernel.org
Subject: Can you help diagnose a weird failed wake?

Hi Peter, Paul,

Jeff Layton is seeing a weird failed wake in the fscache rewrite.  Can you
have a look see if I've mucked up somewhere?  For reference, Jeff's branch is
here:

	https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git
	ceph-fscache-iter-experimental

The situation is that the fscache_cookie struct has a member called 'stage'
that indicates the state of a cookie representing a data file.  As a cache
object is set up, it goes through the LOOKING_UP stage (3) whilst the disk is
being queried to the ACTIVE stage (5) when the backing file has been opened -
at which point we can do I/O on it.  Each time the stage is changed, we call
wake_up_var() to ping any waiters on the variable.  Waiters use
wait_var_event() and co. to wait on it.

We've added some tracepoints.  There's a function which begins an operation on
the cache.  It waits for the cache to get into the appropriate stage as part
of that.  Around the bit doing the wait in we have:

	int __fscache_begin_operation(...)
	{
	...
	wait_and_validate:
		spin_unlock(&cookie->lock);
		cookie->trace = true;
		trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_wait_begin_op);
		timeo = wait_var_event_timeout(&cookie->stage, cookie_stage_changed(cookie, stage), 10*HZ);
		if (timeo <= 1) {
			pr_warn("%s: cookie stage change wait timed out: cookie->stage=%u stage=%u",
				__func__, READ_ONCE(cookie->stage), stage);
			fscache_print_cookie(cookie, 'O');
			trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_wait_fail);
		} else {
			trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_waited);
		}
		cookie->trace = false;
		goto again;
	...
	}

and cookie_stage_changed() looks like:

	static bool cookie_stage_changed(struct fscache_cookie *cookie, enum fscache_cookie_stage stage)
	{
		bool changed = (READ_ONCE(cookie->stage) != stage);

		if (!changed)
			trace_fscache_stage_wait(cookie, stage, fscache_cookie_stage_waiting);

		return changed;
	}

At the point the wakeup happens, we have:

	void fscache_set_cookie_stage(struct fscache_cookie *cookie,                                        
				      enum fscache_cookie_stage stage)                                      
	{                                                                                                  
		bool changed = false, trace;                                                                

		spin_lock(&cookie->lock);                                                                  
		switch (cookie->stage) {                                                                    
		case FSCACHE_COOKIE_STAGE_INDEX:                                                            
		case FSCACHE_COOKIE_STAGE_WITHDRAWING:                                                      
		case FSCACHE_COOKIE_STAGE_RELINQUISHING:                                                    
			break;                                                                              
		default:                                                                                    
			trace = cookie->trace;                                                              
			if (trace)                                                                          
				trace_fscache_stage_set(cookie, stage, fscache_cookie_stage_set_general);  
			cookie->stage = stage;                                                              
			changed = true;                                                                    
			break;                                                                              
		}                                                                                          
		spin_unlock(&cookie->lock);                                                                
		if (changed) {                                                                              
			wake_up_var(&cookie->stage);                                                        
			if (trace)                                                                          
				trace_fscache_stage_set(cookie, stage, fscache_cookie_stage_woken);        
		}                                                                                          
	}                                                                                                  

In dmesg, we see:

	[ 2977.269392] FS-Cache: __fscache_begin_operation: cookie stage change wait timed out: cookie->stage=5 stage=3
	[ 2977.269398] FS-Cache: O-cookie c=000dd7d9 [p=000dce05 fl=0 nc=0 na=2]
	[ 2977.276673] FS-Cache: O-cookie d=CEPH.in
	[ 2977.277796] FS-Cache: O-cookie o=201635
	[ 2977.278552] FS-Cache: O-key=[16] '352b930000010000feffffffffffffff'

In the trace log, we see:

	 <...>-425461  [015] ....  2967.107716: fscache_stage_wait: c=000dd7d9 WAIT beginop now=3 arg=3
	kfsc/7-896     [005] ....  2967.107717: fscache_stage_set:  c=000dd7d9 SET general  now=3 arg=5
	 <...>-425461  [015] ....  2967.107718: fscache_stage_wait: c=000dd7d9 WAITING      now=3 arg=3
	 <...>-425461  [015] ....  2967.107718: fscache_stage_wait: c=000dd7d9 WAITING      now=5 arg=3
	kfsc/7-896     [005] ....  2967.107719: fscache_stage_set:  c=000dd7d9 WOKEN        now=5 arg=5
	 <...>-425461  [015] .N..  2977.452925: fscache_stage_wait: c=000dd7d9 WAIT failed  now=5 arg=3

So we can see the wait beginning, with the stage at LOOKING_UP (3).  Note that
we're actually waiting for stage!=3.

Next, on cpu 5, we're about to set the stage ("SET" line), and then we've
completed the wakeup 2uS later ("WOKEN" line).

Whilst the stage-change is happening, the wait condition check happens twice
in quick succession (two "WAITING" lines).  The first is presumably right at
the beginning of wait_var_event() and the second is inside the wait loop.  In
the first check, the stage is still 3, but in the second call of the condition
function, it reads it as 3 (still LOOKING_UP), and so meets the condition to
do the tracepoint - which then sees it as 5 (ACTIVE), but this doesn't affect
the result of the condition function.

[*] Note that in the traceline, cookie->stage as read inside the tracepoint is
    the "now" number and the second is fed in by the caller.

wait_var_event()/wait_var_event_timeout() then goes to sleep - which it
shouldn't, because either the task state should've been cleared, or the
condition function should prevent it.

Any thoughts on how to debug this further?  Or if I've just done something
silly?

Also, do I need to use WRITE_ONCE() or smp_store_release() when setting
cookie->store, since I'm reading it outside of the spinlock, albeit through
READ_ONCE().

Thanks,
David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ