[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121030184800.GA16129@redhat.com>
Date: Tue, 30 Oct 2012 19:48:00 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Mikulas Patocka <mpatocka@...hat.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Ananth N Mavinakayanahalli <ananth@...ibm.com>,
Anton Arapov <anton@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] fix and improvements for percpu-rw-semaphores
(was: brw_mutex: big read-write mutex)
On 10/22, Mikulas Patocka wrote:
>
> > > Ooooh. And I just noticed include/linux/percpu-rwsem.h which does
> > > something similar. Certainly it was not in my tree when I started
> > > this patch... percpu_down_write() doesn't allow multiple writers,
> > > but the main problem it uses msleep(1). It should not, I think.
But, since we already have percpu_rw_semaphore, I do not think I can
add another similar thing,
However percpu_rw_semaphore is sub-optimal, not sure uprobes can use
it to block dup_mmap(). Perhaps we can improve it?
> > > But. It seems that percpu_up_write() is equally wrong? Doesn't
> > > it need synchronize_rcu() before "p->locked = false" ?
> > >
> > > (add Mikulas)
> >
> > Mikulas said something about doing an updated patch, so I figured I
> > would look at his next version.
> >
> > Thanx, Paul
>
> The best ideas proposed in this thread are:
>
>
> Using heavy/light barries by Lai Jiangshan.
So. down_write/up_right does msleep() and it needs to call
synchronize_sched() 3 times.
This looks too much. It is not that I am worried about the writers,
the problem is that the new readers are blocked completely while the
writer sleeps in msleep/synchronize_sched.
Paul, Mikulas, et al. Could you please look at the new implementation
below? Completely untested/uncompiled, just for discussion.
Compared to the current implementation, down_read() is still possible
while the writer sleeps in synchronize_sched(), but the reader uses
rw_semaphore/atomic_inc when it detects the waiting writer.
Can this work? Do you think this is better than we have now?
Note: probably we can optimize percpu_down/up_write more, we can
"factor out" synchronize_sched(), multiple writers can do this in
parallel before they take ->writer_mutex to exclude each other.
But this won't affect the readers, and this can be done later.
Oleg.
------------------------------------------------------------------------------
struct percpu_rw_semaphore {
long __percpu *fast_read_ctr;
struct mutex writer_mutex;
struct rw_semaphore rw_sem;
atomit_t slow_read_ctr;
};
static bool update_fast_ctr(struct percpu_rw_semaphore *brw, long val)
{
bool success = false;
preempt_disable();
if (likely(!mutex_is_locked(&brw->writer_mutex))) {
__this_cpu_add(*brw->fast_read_ctr, val);
success = true;
}
preempt_enable();
return success;
}
static long clear_fast_read_ctr(struct percpu_rw_semaphore *brw)
{
long sum = 0;
int cpu;
for_each_possible_cpu(cpu) {
sum += per_cpu(*brw->fast_read_ctr, cpu);
per_cpu(*brw->fast_read_ctr, cpu) = 0;
}
return sum;
}
void percpu_down_read(struct percpu_rw_semaphore *brw)
{
if (likely(update_fast_ctr(+1)))
return;
down_read(&brw->rw_sem);
atomic_inc(&brw->slow_read_ctr);
up_read(&brw->rw_sem);
}
void percpu_up_read(struct percpu_rw_semaphore *brw)
{
if (likely(update_fast_ctr(-1)))
return;
if (atomic_dec_and_test(&brw->slow_read_ctr))
wake_up_all(&brw->write_waitq);
}
void percpu_down_write(struct percpu_rw_semaphore *brw)
{
mutex_lock(&brw->writer_mutex);
/* ensure mutex_is_locked() is visible to the readers */
synchronize_sched();
/* block the new readers */
down_write(&brw->rw_sem);
atomic_add(&brw->slow_read_ctr, clear_fast_read_ctr(brw));
wait_event(brw->write_waitq, !atomic_read(&brw->slow_read_ctr));
}
void percpu_up_write(struct percpu_rw_semaphore *brw)
{
up_write(&brw->rw_sem);
/* insert the barrier before the next fast-path in down_read */
synchronize_sched();
mutex_unlock(&brw->writer_mutex);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists