Message-ID: <20190619095043.GT3402@hirez.programming.kicks-ass.net>
Date: Wed, 19 Jun 2019 11:50:43 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-rt-users <linux-rt-users@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Clark Williams <williams@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>, jack@...e.com,
Waiman Long <longman@...hat.com>,
Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [RT WARNING] DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) !=
current) with fsfreeze (4.19.25-rt16)

Sorry, I seem to have missed this email.

On Mon, May 06, 2019 at 06:50:09PM +0200, Oleg Nesterov wrote:
> On 05/03, Peter Zijlstra wrote:
> >
> > -static void lockdep_sb_freeze_release(struct super_block *sb)
> > -{
> > - int level;
> > -
> > - for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
> > - percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
> > -}
> > -
> > -/*
> > - * Tell lockdep we are holding these locks before we call ->unfreeze_fs(sb).
> > - */
> > -static void lockdep_sb_freeze_acquire(struct super_block *sb)
> > -{
> > - int level;
> > -
> > - for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> > - percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
> > + percpu_down_write_non_owner(sb->s_writers.rw_sem + level-1);
> > }
>
> I'd suggest to not change fs/super.c, keep these helpers, and even not introduce
> xxx_write_non_owner().
>
> freeze_super() takes other locks, it calls sync_filesystem(), freeze_fs(), lockdep
> should know that this task holds SB_FREEZE_XXX locks for writing.

Bah, I so hate these games. But OK, I suppose.

> > @@ -80,14 +83,8 @@ int __percpu_down_read(struct percpu_rw_
> > * and reschedule on the preempt_enable() in percpu_down_read().
> > */
> > preempt_enable_no_resched();
> > -
> > - /*
> > - * Avoid lockdep for the down/up_read() we already have them.
> > - */
> > - __down_read(&sem->rw_sem);
> > + wait_event(sem->waiters, !atomic_read(&sem->block));
> > this_cpu_inc(*sem->read_count);
>
> Argh, this looks racy :/
>
> Suppose that sem->block == 0 when wait_event() is called, iow the writer released
> the lock.
>
> Now suppose that this __percpu_down_read() races with another percpu_down_write().
> The new writer can set sem->block == 1 and call readers_active_check() in between,
> after wait_event() and before this_cpu_inc(*sem->read_count).

  CPU0                    CPU1                            CPU2

  percpu_up_write()
    sem->block = 0;

                          __percpu_down_read()
                            wait_event(, !sem->block);

                                                          percpu_down_write()
                                                            wait_event_exclusive(, xchg(sem->block,1)==0);
                                                            readers_active_check()

                            this_cpu_inc();

                          *whoopsy* reader while write owned.

I suppose we can 'patch' that by checking blocking again after we've
incremented, something like the below.

But looking at percpu_down_write() we have two wait_event*() on the same
queue back to back, which is 'odd' at best. Let me ponder that a little
more.
---
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -61,6 +61,7 @@ int __percpu_down_read(struct percpu_rw_
* writer missed them.
*/
+again:
smp_mb(); /* A matches D */
/*
@@ -87,7 +88,13 @@ int __percpu_down_read(struct percpu_rw_
wait_event(sem->waiters, !atomic_read_acquire(&sem->block));
this_cpu_inc(*sem->read_count);
preempt_disable();
- return 1;
+
+ /*
+ * percpu_down_write() could've set ->block right after we've seen it
+ * 0 but missed our this_cpu_inc(), which is exactly the condition we
+ * get called for from percpu_down_read().
+ */
+ goto again;
}
EXPORT_SYMBOL_GPL(__percpu_down_read);