Date:	Sun, 29 Sep 2013 20:36:34 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...nel.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: [RFC] introduce synchronize_sched_{enter,exit}()

Hello.

Paul, Peter, et al, could you review the code below?

I am not sending this as a patch; I think it is simpler to read the code
inline. (Just in case: I have not tried to compile it yet.)

It is functionally equivalent to

	struct xxx_struct {
		atomic_t counter;
	};

	static inline bool xxx_is_idle(struct xxx_struct *xxx)
	{
		return atomic_read(&xxx->counter) == 0;
	}

	static inline void xxx_enter(struct xxx_struct *xxx)
	{
		atomic_inc(&xxx->counter);
		synchronize_sched();
	}

	static inline void xxx_exit(struct xxx_struct *xxx)
	{
		synchronize_sched();
		atomic_dec(&xxx->counter);
	}

except that it records the state, so synchronize_sched() is called only
from xxx_enter(), and only when necessary.

Why? Consider percpu_rw_semaphore, or the upcoming changes to
get_online_cpus() (Peter, I think they should be unified anyway, but
let's ignore that for now), or freeze_super() (which currently looks
buggy), and perhaps other users. This pattern

	writer:
		state = SLOW_MODE;
		synchronize_rcu/sched();

	reader:
		preempt_disable();	// or rcu_read_lock();
		if (state != SLOW_MODE)
			...

is quite common.
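
With the code below, this becomes (just a sketch; "xxx" stands for
whatever object the user embeds the xxx_struct into):

	writer:
		xxx_enter(&xxx);	// state = SLOW_MODE, sync if needed
		...
		xxx_exit(&xxx);		// back to FAST_MODE after a GP

	reader:
		preempt_disable();	// or rcu_read_lock();
		if (xxx_is_idle(&xxx))
			...		// fast path; xxx_enter() can't
					// return until we enable preemption
		else
			...		// slow path
		preempt_enable();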

Note:
	- This implementation allows multiple writers, and sometimes
	  this makes sense.

	- But it is trivial to add a "bool exclusive" flag to xxx_struct,
	  set by xxx_init(). If it is true, only one xxx_enter() can be in
	  progress at a time; other callers block until xxx_exit(). This is
	  what percpu_down_write() actually needs (see the sketch after
	  this list).

	- Probably it makes sense to add xxx->rcu_domain = RCU/SCHED/ETC.
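
To illustrate the exclusive mode mentioned above (a sketch only, not
part of the code below): xxx_enter() could start with a
recheck-under-lock loop, and xxx_exit() would also need a wake_up_all()
when gp_count drops to zero:

	void xxx_enter(struct xxx_struct *xxx)
	{
		bool need_wait, need_sync;

		spin_lock_irq(&xxx->xxx_lock);
		while (xxx->exclusive && xxx->gp_count) {
			/* drop the lock, sleep, recheck under the lock */
			spin_unlock_irq(&xxx->xxx_lock);
			wait_event(xxx->gp_waitq, !xxx->gp_count);
			spin_lock_irq(&xxx->xxx_lock);
		}
		need_wait = xxx->gp_count++;
		...
	}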

Do you think it is correct? Does it make sense? (The BUG_ON()s are just
comments.)

Oleg.

// .h	-----------------------------------------------------------------------

struct xxx_struct {
	int			gp_state;

	int			gp_count;
	wait_queue_head_t	gp_waitq;

	int			cb_state;
	struct rcu_head		cb_head;
};

static inline bool xxx_is_idle(struct xxx_struct *xxx)
{
	return !xxx->gp_state; /* GP_IDLE */
}

extern void xxx_enter(struct xxx_struct *xxx);
extern void xxx_exit(struct xxx_struct *xxx);
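
/* just in case, the obvious init helper mentioned in the notes above: */
static inline void xxx_init(struct xxx_struct *xxx)
{
	xxx->gp_state = 0;	/* GP_IDLE */
	xxx->gp_count = 0;
	init_waitqueue_head(&xxx->gp_waitq);
	xxx->cb_state = 0;	/* CB_IDLE */
}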

// .c	-----------------------------------------------------------------------

enum { GP_IDLE = 0, GP_PENDING, GP_PASSED };

enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
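
/*
 * The states, in short:
 *
 *	GP_IDLE    - no writers; readers can use the fast path
 *	GP_PENDING - the first xxx_enter() is inside synchronize_sched()
 *	GP_PASSED  - the GP has passed; readers must use the slow path
 *
 *	CB_IDLE    - no rcu callback is pending
 *	CB_PENDING - cb_rcu_func() is queued; it will set GP_IDLE unless
 *		     new writers come first
 *	CB_REPLAY  - xxx_exit() was called again while the callback was
 *		     pending; the callback must requeue itself
 */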

#define xxx_lock	gp_waitq.lock	/* reuse the waitqueue's spinlock */

void xxx_enter(struct xxx_struct *xxx)
{
	bool need_wait, need_sync;

	spin_lock_irq(&xxx->xxx_lock);
	need_wait = xxx->gp_count++;
	need_sync = xxx->gp_state == GP_IDLE;
	if (need_sync)
		xxx->gp_state = GP_PENDING;
	spin_unlock_irq(&xxx->xxx_lock);

	BUG_ON(need_wait && need_sync);

	if (need_sync) {
		/* we are the first writer, make readers take the slow path */
		synchronize_sched();
		xxx->gp_state = GP_PASSED;
		wake_up_all(&xxx->gp_waitq);
	} else if (need_wait) {
		/* another writer already started the GP, wait for it */
		wait_event(xxx->gp_waitq, xxx->gp_state == GP_PASSED);
	} else {
		/*
		 * The last writer already exited but cb_rcu_func() didn't
		 * switch us back to GP_IDLE yet, nothing to do.
		 */
		BUG_ON(xxx->gp_state != GP_PASSED);
	}
}
}

static void cb_rcu_func(struct rcu_head *rcu)
{
	struct xxx_struct *xxx = container_of(rcu, struct xxx_struct, cb_head);
	unsigned long flags;

	BUG_ON(xxx->gp_state != GP_PASSED);
	BUG_ON(xxx->cb_state == CB_IDLE);

	spin_lock_irqsave(&xxx->xxx_lock, flags);
	if (xxx->gp_count) {
		/* a new writer came in meanwhile, keep GP_PASSED */
		xxx->cb_state = CB_IDLE;
	} else if (xxx->cb_state == CB_REPLAY) {
		/* xxx_exit() ran while we were pending, wait for one more GP */
		xxx->cb_state = CB_PENDING;
		call_rcu_sched(&xxx->cb_head, cb_rcu_func);
	} else {
		/* a full GP has passed since the last xxx_exit() */
		xxx->cb_state = CB_IDLE;
		xxx->gp_state = GP_IDLE;
	}
	spin_unlock_irqrestore(&xxx->xxx_lock, flags);
}

void xxx_exit(struct xxx_struct *xxx)
{
	spin_lock_irq(&xxx->xxx_lock);
	if (!--xxx->gp_count) {
		if (xxx->cb_state == CB_IDLE) {
			/* let readers return to the fast path, but only
			   after another GP */
			xxx->cb_state = CB_PENDING;
			call_rcu_sched(&xxx->cb_head, cb_rcu_func);
		} else if (xxx->cb_state == CB_PENDING) {
			/* a callback is already queued, make it requeue
			   itself for one more GP */
			xxx->cb_state = CB_REPLAY;
		}
	}
	spin_unlock_irq(&xxx->xxx_lock);
}
