Message-ID: <20150911155901.GA27246@redhat.com>
Date:	Fri, 11 Sep 2015 17:59:01 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Davidlohr Bueso <dave@...olabs.net>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org
Subject: [PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks

On 09/10, Paul E. McKenney wrote:
>
> On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > On 09/09, Paul E. McKenney wrote:
> > >
> > > This is obsolete, but its replacement is the same patch.
> >
> > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > I guess?
> >
> > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > locktorture work?
> >
> > No, I think the patch is fine. Thanks for doing this! I was going to
> > send something like this change too. And in fact I am still thinking
> > about another test which plays with rcu_sync only, but probably we
> > need some cleanups first (and we need them anyway). I'll try to do
> > this a bit later.
>
> I would welcome an rcu_sync-specific torture patch!

I want it much more than you ;) I have already warned you that I'll send
more rcu_sync patches. The current code is actually a very early draft
which was written during the discussion with Peter long ago. I sent it
unchanged because a) it was already reviewed, and b) I tested it a bit
in the past.

We can greatly simplify this code and at the same time make it more
useful. Actually, I already have the patches. The first one removes
rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
almost self-evident and allows other improvements. See the resulting
(pseudo) code at the end.

But again, I will try hard to write the test before I send those patches.


Until then, let me send this trivial cleanup. The CONFIG_PROVE_RCU
code looks trivial but is, imo, really annoying. And it is not complete,
so let's at least document that. Plus, rcu_lockdep_assert() looks more
consistent.
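
(For illustration only, not the attached patch: the check in question is
presumably the lockdep assertion behind rcu_sync_is_idle(). With
rcu_lockdep_assert() and the ->held() hook from the pseudo code below it
could look roughly like this; the helper's name and placement are just an
assumption here, the real change is in 1/1.)

	#ifdef CONFIG_PROVE_RCU
	/* Sketch: complain if the caller does not hold the read-side
	 * lock matching this rcu_sync's gp_type. */
	void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
	{
		rcu_lockdep_assert(gp_ops[rsp->gp_type].held(),
				   "suspicious rcu_sync_is_idle() usage");
	}
	#endif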


> > > +void torture_percpu_rwsem_init(void)
> > > +{
> > > +	BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
> > > +}
> > > +
> >
> > Aha, we don't really need this... I mean we can use the static initialiser
> > which can also be used by uprobes and cgroups. I'll try to send the patch
> > tomorrow.
>
> Very good, please do!

Hmm. I am a liar. I won't send this patch today, at least.

The change I had in mind is very simple,

	#define DECLARE_PERCPU_RWSEM(sem)				\
		static DEFINE_PER_CPU(unsigned int, sem##_counters);	\
		struct percpu_rw_semaphore sem = {			\
			.fast_read_ctr = &sem##_counters,		\
			...						\
		}
			
and yes, uprobes and cgroups can use it.

But somehow I missed that we can't use it to define a _static_ sem,

	static DECLARE_PERCPU_RWSEM(sem);

obviously won't work: the "static" prefixes only the first declaration the
macro expands to (and clashes with the "static" already inside it), while
the semaphore itself stays global. And damn, I am ashamed to admit that I
spent several hours trying to invent something, but failed. Perhaps we can
add two helpers, DECLARE_PERCPU_RWSEM_GLOBAL() and
DECLARE_PERCPU_RWSEM_STATIC().
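
Something along these lines might work (a rough sketch, untested; the
__DECLARE_PERCPU_RWSEM name and the storage-class parameter are just one
idea, not the patch I will actually send):

	/* The per-cpu counter can always be static; only the visibility
	 * of the semaphore itself needs to vary, so thread the storage
	 * class through the macro. */
	#define __DECLARE_PERCPU_RWSEM(class, sem)			\
		static DEFINE_PER_CPU(unsigned int, sem##_counters);	\
		class struct percpu_rw_semaphore sem = {		\
			.fast_read_ctr = &sem##_counters,		\
			...						\
		}

	#define DECLARE_PERCPU_RWSEM_GLOBAL(sem)	__DECLARE_PERCPU_RWSEM(, sem)
	#define DECLARE_PERCPU_RWSEM_STATIC(sem)	__DECLARE_PERCPU_RWSEM(static, sem)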

Oleg.

-------------------------------------------------------------------------------
static const struct {
	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
	void (*wait)(void);	// TODO: remove this
#ifdef CONFIG_PROVE_RCU
	int  (*held)(void);
#endif
} gp_ops[] = {
	...
};

// COMMENT to explain these states
enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
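// Tentative meaning of the states, inferred from the code below (not the
// final comment):
//   GP_IDLE   - no writers; readers may use the fast path
//   GP_ENTER  - rcu_sync_enter() started a GP; writers wait for it
//   GP_PASSED - that GP completed; readers use the slow path, writers run
//   GP_EXIT   - the last writer left; a GP was started to go back to GP_IDLE
//   GP_REPLAY - another exit() happened while GP_EXIT was pending; one more
//               GP is needed before returning to GP_IDLE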

#define	rss_lock	gp_wait.lock

// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// XXX code must be removed when we split rcu_sync_enter() into start + wait
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

static void rcu_sync_func(struct rcu_head *rcu)
{
	struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
	unsigned long flags;

	BUG_ON(rsp->gp_state == GP_IDLE);
	BUG_ON(rsp->gp_state == GP_PASSED);

	spin_lock_irqsave(&rsp->rss_lock, flags);
	if (rsp->gp_count) {
		/*
		 * COMMENT.
		 */
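		/* Tentative reading (not the missing comment above): a writer
		 * is waiting (gp_count != 0) and a full GP has elapsed since
		 * enter() set GP_ENTER, so all readers now observe the slow
		 * path; let the writer(s) proceed. */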
		rsp->gp_state = GP_PASSED;
		wake_up_locked(&rsp->gp_wait);
	} else if (rsp->gp_state == GP_REPLAY) {
		/*
		 * A new rcu_sync_exit() has happened; requeue the callback
		 * to catch a later GP.
		 */
		rsp->gp_state = GP_EXIT;
		gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
	} else {
		/*
		 * We're at least a GP after rcu_sync_exit(); everybody will now
		 * have observed the write-side critical section. Let 'em rip!
		 */
		BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
		rsp->gp_state = GP_IDLE;
	}
	spin_unlock_irqrestore(&rsp->rss_lock, flags);
}

static void rcu_sync_call(struct rcu_sync *rsp)
{
	// TODO:
	// This is called by might_sleep() code outside of ->rss_lock,
	// we can avoid ->call() in some cases (say rcu_blocking_is_gp())
	gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
}

void rcu_sync_enter(struct rcu_sync *rsp)
{
	int gp_count, gp_state;

	spin_lock_irq(&rsp->rss_lock);
	gp_count = rsp->gp_count++;
	gp_state = rsp->gp_state;
	if (gp_state == GP_IDLE)
		rsp->gp_state = GP_ENTER;
	spin_unlock_irq(&rsp->rss_lock);

	BUG_ON(gp_count != 0 && gp_state == GP_IDLE);
	BUG_ON(gp_count == 0 && gp_state == GP_PASSED);
	BUG_ON(gp_count == 0 && gp_state == GP_ENTER); // XXX

	if (gp_state == GP_IDLE)
		rcu_sync_call(rsp);

	wait_event(rsp->gp_wait, rsp->gp_state != GP_ENTER);
	BUG_ON(rsp->gp_state < GP_PASSED);
}

void rcu_sync_exit(struct rcu_sync *rsp)
{
	bool need_call = false;

	BUG_ON(rsp->gp_state == GP_IDLE);
	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX

	spin_lock_irq(&rsp->rss_lock);
	if (!--rsp->gp_count) {
		if (rsp->gp_state == GP_PASSED) {
			need_call = true;
			rsp->gp_state = GP_EXIT;
		} else if (rsp->gp_state == GP_EXIT) {
			rsp->gp_state = GP_REPLAY;
		}
	}
	spin_unlock_irq(&rsp->rss_lock);

	// Comment to explain why we do not care if another enter()
	// and perhaps even exit() comes after spin_unlock().
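	// Tentative reasoning (not the missing comment above): the callback we
	// may queue below re-checks gp_count and gp_state under ->rss_lock, and
	// a later enter() only starts its own callback if it sees GP_IDLE, so
	// doing the ->call() outside of the lock is fine.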
	if (need_call)
		rcu_sync_call(rsp);
}

void rcu_sync_dtor(struct rcu_sync *rsp)
{
	int gp_state;

	BUG_ON(rsp->gp_count);
	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
	BUG_ON(rsp->gp_state == GP_PASSED);

	spin_lock_irq(&rsp->rss_lock);
	if (rsp->gp_state == GP_REPLAY)
		rsp->gp_state = GP_EXIT;
	gp_state = rsp->gp_state;
	spin_unlock_irq(&rsp->rss_lock);

	// TODO: add another wake_up_locked() into rcu_sync_func(),
	// use wait_event + spin_lock_wait, remove gp_ops->wait().

	if (gp_state != GP_IDLE) {
		gp_ops[rsp->gp_type].wait();
		BUG_ON(rsp->gp_state != GP_IDLE);
	}
}
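
(For context, how a percpu-rwsem-like user is expected to drive this state
machine; a minimal sketch with made-up names, not the actual percpu-rwsem
conversion.)

struct my_lock {
	struct rcu_sync		rss;		/* the structure above */
	struct rw_semaphore	rw_sem;		/* reader/writer slow path */
};

/* Writer side: push readers onto the slow path for the whole write section. */
void my_write_lock(struct my_lock *l)
{
	rcu_sync_enter(&l->rss);	/* returns once gp_state >= GP_PASSED */
	down_write(&l->rw_sem);
}

void my_write_unlock(struct my_lock *l)
{
	up_write(&l->rw_sem);
	rcu_sync_exit(&l->rss);		/* starts the GP(s) leading back to GP_IDLE */
}

/*
 * Reader side: under rcu_read_lock(), rcu_sync_is_idle(&l->rss) tells whether
 * the lock-free fast path is still allowed; otherwise fall back to
 * down_read(&l->rw_sem).
 */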

