Message-ID: <Zc1zvBpL6x1kpsfk@bombadil.infradead.org>
Date: Wed, 14 Feb 2024 18:15:24 -0800
From: Luis Chamberlain <mcgrof@...nel.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: kernel test robot <oliver.sang@...el.com>,
Daniel Gomez <da.gomez@...sung.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org,
"gost.dev@...sung.com" <gost.dev@...sung.com>,
Pankaj Raghav <p.raghav@...sung.com>
Subject: Re: [PATCH 1/2] test_xarray: add tests for advanced multi-index use
On Fri, Jan 26, 2024 at 08:32:28PM +0000, Matthew Wilcox wrote:
> On Fri, Jan 26, 2024 at 12:04:44PM -0800, Luis Chamberlain wrote:
> > > We have a perfectly good system for "relaxing":
> > >
> > > xas_for_each_marked(&xas, page, end, PAGECACHE_TAG_DIRTY) {
> > > xas_set_mark(&xas, PAGECACHE_TAG_TOWRITE);
> > > if (++tagged % XA_CHECK_SCHED)
> > > continue;
> > >
> > > xas_pause(&xas);
> > > xas_unlock_irq(&xas);
> > > cond_resched();
> > > xas_lock_irq(&xas);
> > > }
> >
> > And yet we can get a soft lockup with order 20 (1,048,576 entries),
> > granted busy looping over 1 million entries is insane, but it seems it
> > the existing code may not be enough to avoid the soft lockup. Also
> > cond_resched() may be eventually removed [0].
>
> what? you're in charge of when you sleep. you can do this:
>
> unsigned i = 0;
> rcu_read_lock();
> xas_for_each(...) {
> ...
> if (iter++ % XA_CHECK_SCHED)
> continue;
> xas_pause();
> rcu_read_unlock();
> rcu_read_lock();
> }
> rcu_read_unlock();
>
> and that will get rid of the rcu warnings. right?
The RCU splat is long gone in my last iteration, which is now merged in
linux-next. What's left is just a soft lockup of over 22 seconds when you
disable preemption, enable RCU proving, and use 2 vCPUs. This can happen,
for instance, if we loop over test_get_entry() and don't want to use the
xas_for_each() API. In this case we don't, since part of the point of the
selftest is to not trust the xarray API and to test it instead.
So in the simplest case, for instance, this is used:
check_xa_multi_store_adv_add(xa, base, order, &some_val);
for (i = 0; i < nrpages; i++)
XA_BUG_ON(xa, test_get_entry(xa, base + i) != &some_val);
test_get_entry() will do the RCU locking for us. So while I agree that
xas_for_each*() is best if you are using the xarray API, here we want
to not trust the xarray API and prove it instead. So what do you think
about something like this? It does fix the soft lockup.
diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index d4e55b4867dc..ac162025cc59 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -781,6 +781,7 @@ static noinline void *test_get_entry(struct xarray *xa, unsigned long index)
{
XA_STATE(xas, xa, index);
void *p;
+ static unsigned int i = 0;
rcu_read_lock();
repeat:
@@ -790,6 +791,17 @@ static noinline void *test_get_entry(struct xarray *xa, unsigned long index)
goto repeat;
rcu_read_unlock();
+ /*
+ * This is not part of the page cache, this selftest is pretty
+ * aggressive and does not want to trust the xarray API but rather
+ * test it, and for order 20 (4 GiB block size) we can loop over
+ * more than a million entries, which can cause a soft lockup. Page cache
+ * APIs won't be stupid, proper page cache APIs loop over the proper
+ * order so when using a larger order we skip shared entries.
+ */
+ if (++i % XA_CHECK_SCHED == 0)
+ schedule();
+
return p;
}
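
For illustration only, here is a userspace sketch of the same pattern: a
static call counter that yields the CPU once every N lookups. It is not
kernel code. CHECK_SCHED is assumed to mirror XA_CHECK_SCHED (4096 in
include/linux/xarray.h), sched_yield() stands in for schedule(), and
lookup_and_maybe_yield(), walk_order() and nr_yields are hypothetical
names made up for this sketch.

```c
#include <sched.h>

/* Assumption: mirrors XA_CHECK_SCHED (4096) from include/linux/xarray.h. */
#define CHECK_SCHED 4096

/* Hypothetical counter, only here so the yield rate can be observed. */
static unsigned long nr_yields;

/*
 * Stand-in for test_get_entry(): pretend to do one lookup, then give up
 * the CPU once every CHECK_SCHED calls. The counter is static, so it
 * keeps counting across separate walks, as in the patch above.
 */
static unsigned long lookup_and_maybe_yield(unsigned long index)
{
	static unsigned int calls;

	if (++calls % CHECK_SCHED == 0) {
		sched_yield();	/* userspace analogue of schedule() */
		nr_yields++;
	}
	return index;
}

/*
 * Visit every index covered by one entry of the given order, the way the
 * selftest loop does, and return how many times the walk yielded.
 */
static unsigned long walk_order(unsigned int order)
{
	unsigned long i, nrpages = 1UL << order;
	unsigned long before = nr_yields;

	for (i = 0; i < nrpages; i++)
		lookup_and_maybe_yield(i);

	return nr_yields - before;
}
```

With these assumptions, walking an order 20 entry makes 1,048,576 calls
and yields 256 times, so no single stretch between yields is longer than
4096 lookups.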