Message-ID: <20061119212057.GE4427@us.ibm.com>
Date: Sun, 19 Nov 2006 13:20:57 -0800
From: "Paul E. McKenney" <paulmck@...ibm.com>
To: Oleg Nesterov <oleg@...sign.ru>
Cc: Alan Stern <stern@...land.harvard.edu>,
Jens Axboe <jens.axboe@...cle.com>,
Linus Torvalds <torvalds@...l.org>,
Thomas Gleixner <tglx@...esys.com>,
Ingo Molnar <mingo@...e.hu>,
LKML <linux-kernel@...r.kernel.org>,
john stultz <johnstul@...ibm.com>,
David Miller <davem@...emloft.net>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@...l.org>, Andi Kleen <ak@...e.de>,
manfred@...orfullife.com
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
On Sun, Nov 19, 2006 at 11:55:16PM +0300, Oleg Nesterov wrote:
> On 11/19, Alan Stern wrote:
> >
> > On Sun, 19 Nov 2006, Oleg Nesterov wrote:
> >
> > > int xxx_read_lock(struct xxx_struct *sp)
> > > {
> > > 	int idx;
> > >
> > > 	idx = sp->completed & 0x1;
> > > 	atomic_inc(sp->ctr + idx);
> > > 	smp_mb__after_atomic_inc();
> > >
> > > 	return idx;
> > > }
> > >
> > > void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > > {
> > > 	if (atomic_dec_and_test(sp->ctr + idx))
> > > 		wake_up(&sp->wq);
> > > }
> > >
> > > void synchronize_xxx(struct xxx_struct *sp)
> > > {
> > > 	wait_queue_t wait;
> > > 	int idx;
> > >
> > > 	init_wait(&wait);
> > > 	mutex_lock(&sp->mutex);
> > >
> > > 	idx = sp->completed++ & 0x1;
> > >
> > > 	for (;;) {
> > > 		prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);
> > >
> > > 		if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > > 			break;
> > >
> > > 		schedule();
> > > 		atomic_inc(sp->ctr + idx);
> > > 	}
> > > 	finish_wait(&sp->wq, &wait);
> > >
> > > 	mutex_unlock(&sp->mutex);
> > > }
> > >
> > > Very simple. Note that synchronize_xxx() is O(1), doesn't poll, and could
> > > be optimized further.
> >
> > What happens if synchronize_xxx() manages to execute in between
> > xxx_read_lock()'s
> >
> > 	idx = sp->completed & 0x1;
> > 	atomic_inc(sp->ctr + idx);
> >
> > statements?
>
> Oops. I forgot about the explicit mb() before sp->completed++ in synchronize_xxx().
>
> So synchronize_xxx() should do
>
> 	smp_mb();
> 	idx = sp->completed++ & 0x1;
>
> 	for (;;) { ... }
>
> > You see, there's no way around using synchronize_sched().
>
> With this change I think we are safe.
>
> If synchronize_xxx() increments ->completed in between, the caller of
> xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> completed. It is ok that synchronize_xxx() returns immediately.
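With that barrier folded in, the updater would look something like the
following (Oleg's code from above, unchanged except for the smp_mb();
the counters are assumed to be initialized to 1, as the drain logic
implies):

void synchronize_xxx(struct xxx_struct *sp)
{
	wait_queue_t wait;
	int idx;

	init_wait(&wait);
	mutex_lock(&sp->mutex);

	/* Order the caller's prior accesses before the counter flip. */
	smp_mb();
	idx = sp->completed++ & 0x1;

	for (;;) {
		prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);

		/* Only our initial reference left?  Then no readers. */
		if (!atomic_add_unless(sp->ctr + idx, -1, 1))
			break;

		/* We dropped the initial reference; the last reader's
		 * atomic_dec_and_test() will wake us. */
		schedule();
		atomic_inc(sp->ctr + idx);
	}
	finish_wait(&sp->wq, &wait);

	mutex_unlock(&sp->mutex);
}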
Let me take Alan's example one step further:
o	CPU 0 starts executing xxx_read_lock(), but is interrupted
	(or whatever) just before the atomic_inc().
o	CPU 1 executes synchronize_xxx() to completion, which it
	can because CPU 0 has not yet incremented the counter.
o	CPU 0 returns from interrupt and completes xxx_read_lock(),
	but has incremented the wrong counter.
o	CPU 0 continues into its critical section, picking up a
	pointer to an xxx-protected data structure (or, in Jens's
	case, starting an xxx-protected I/O).
o	CPU 1 executes another synchronize_xxx().  This completes
	immediately because CPU 0 incremented the wrong counter.
o	CPU 1 continues, either freeing a data structure while
	CPU 0 is still referencing it or, in Jens's case, completing
	an I/O barrier while there is still outstanding I/O.
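To pin down the window, here is the read-side path again with the race
point marked (the statements are unchanged from Oleg's version quoted
above; only the comment is new):

int xxx_read_lock(struct xxx_struct *sp)
{
	int idx;

	idx = sp->completed & 0x1;
	/*
	 * Window: if synchronize_xxx() increments ->completed right
	 * here, the atomic_inc() below lands on the counter that the
	 * grace period has already drained, so the next
	 * synchronize_xxx() will not wait for this reader.
	 */
	atomic_inc(sp->ctr + idx);
	smp_mb__after_atomic_inc();

	return idx;
}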
I agree with Alan -- unless I am missing something, we need a
synchronize_sched() in synchronize_xxx().  One thing still missing in
the I/O-barrier case might be the restrictions I called out in my
earlier email.
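For concreteness, a sketch of how synchronize_sched() might slot in,
reusing Oleg's wait-queue drain (untested; the preempt_disable() across
the read-side sample-and-increment window is my addition, and it is
what lets synchronize_sched() close the window above):

int xxx_read_lock(struct xxx_struct *sp)
{
	int idx;

	/*
	 * The sample-and-increment window runs with preemption
	 * disabled, so synchronize_sched() cannot complete while a
	 * reader is inside it.
	 */
	preempt_disable();
	idx = sp->completed & 0x1;
	atomic_inc(sp->ctr + idx);
	smp_mb__after_atomic_inc();
	preempt_enable();

	return idx;
}

void synchronize_xxx(struct xxx_struct *sp)
{
	wait_queue_t wait;
	int idx;

	init_wait(&wait);
	mutex_lock(&sp->mutex);

	smp_mb();
	idx = sp->completed++ & 0x1;

	/*
	 * Wait for any reader still inside its sample-and-increment
	 * window.  After this returns, every reader that sampled the
	 * old ->completed value has also incremented the old counter,
	 * so the drain below cannot miss it.
	 */
	synchronize_sched();

	for (;;) {
		prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);

		if (!atomic_add_unless(sp->ctr + idx, -1, 1))
			break;

		schedule();
		atomic_inc(sp->ctr + idx);
	}
	finish_wait(&sp->wq, &wait);

	mutex_unlock(&sp->mutex);
}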
						Thanx, Paul