Message-ID: <b31ead5b-d410-4b34-b580-81af6fabb4d0@paulmck-laptop>
Date:   Wed, 2 Aug 2023 10:20:15 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Roy Hopkins <rhopkins@...e.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Pavel Machek <pavel@...x.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, patches@...ts.linux.dev,
        linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
        akpm@...ux-foundation.org, shuah@...nel.org, patches@...nelci.org,
        lkft-triage@...ts.linaro.org, jonathanh@...dia.com,
        f.fainelli@...il.com, sudipm.mukherjee@...il.com,
        srw@...dewatkins.net, rwarsow@....de, conor@...nel.org,
        rcu@...r.kernel.org, Ingo Molnar <mingo@...nel.org>
Subject: Re: scheduler problems in -next (was: Re: [PATCH 6.4 000/227]
 6.4.7-rc1 review)

On Wed, Aug 02, 2023 at 08:45:06AM -0700, Guenter Roeck wrote:
> On 8/2/23 08:05, Paul E. McKenney wrote:
> > On Wed, Aug 02, 2023 at 02:57:56PM +0100, Roy Hopkins wrote:
> > > On Tue, 2023-08-01 at 12:11 -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 01, 2023 at 10:32:45AM -0700, Guenter Roeck wrote:
> > > > 
> > > > 
> > > > Please see below for my preferred fix.  Does this work for you guys?
> > > > 
> > > > Back to figuring out why recent kernels occasionally blow up all
> > > > rcutorture guest OSes...
> > > > 
> > > >                                                          Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > index 7294be62727b..2d5b8385c357 100644
> > > > --- a/kernel/rcu/tasks.h
> > > > +++ b/kernel/rcu/tasks.h
> > > > @@ -570,10 +570,12 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
> > > >          if (unlikely(midboot)) {
> > > >                  needgpcb = 0x2;
> > > >          } else {
> > > > +               mutex_unlock(&rtp->tasks_gp_mutex);
> > > >                  set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
> > > >                  rcuwait_wait_event(&rtp->cbs_wait,
> > > >                                     (needgpcb = rcu_tasks_need_gpcb(rtp)),
> > > >                                     TASK_IDLE);
> > > > +               mutex_lock(&rtp->tasks_gp_mutex);
> > > >          }
> > > >          if (needgpcb & 0x2) {
> > > 
> > > Your preferred fix looks good to me.
> > > 
> > > With the original code I can quite easily reproduce the problem on my
> > > system every 10 reboots or so. With your fix in place the problem no
> > > longer occurs.
> > 
> > Very good, thank you!  May I add your Tested-by?
> > 
> 
> FWIW, I am still working on it. So far I get
> 
> [    8.191589]     KTAP version 1
> [    8.191769]     # Subtest: kunit_executor_test
> [    8.191972]     # module: kunit
> [    8.192012]     1..8
> [    8.197643]     ok 1 parse_filter_test
> [    8.201851]     ok 2 filter_suites_test
> [    8.206713]     ok 3 filter_suites_test_glob_test
> [    8.211806]     ok 4 filter_suites_to_empty_test
> [    8.214077] kunit executor: filter operation not found: speed>slow, module!=example
> [    8.217933]     # parse_filter_attr_test: ASSERTION FAILED at lib/kunit/executor_test.c:126
> [    8.217933]     Expected err == 0, but
> [    8.217933]         err == -22 (0xffffffffffffffea)
> [    8.217933]
> [    8.217933] failed to parse filter '(efault)'
> [    8.221266]     not ok 5 parse_filter_attr_test
> [    8.224224] kunit executor: filter operation not found: speed>slow
> [    8.225837]     # filter_attr_test: ASSERTION FAILED at lib/kunit/executor_test.c:165
> [    8.225837]     Expected err == 0, but
> [    8.225837]         err == -22 (0xffffffffffffffea)
> [    8.228850]     not ok 6 filter_attr_test
> [    8.230942] kunit executor: filter operation not found: module!=dummy
> [    8.232167]     # filter_attr_empty_test: ASSERTION FAILED at lib/kunit/executor_test.c:190
> [    8.232167]     Expected err == 0, but
> [    8.232167]         err == -22 (0xffffffffffffffea)
> [    8.235317]     not ok 7 filter_attr_empty_test
> [    8.237065] kunit executor: filter operation not found: speed>slow
> [    8.238796]     # filter_attr_skip_test: ASSERTION FAILED at lib/kunit/executor_test.c:209
> [    8.238796]     Expected err == 0, but
> [    8.238796]         err == -22 (0xffffffffffffffea)
> [    8.241897]     not ok 8 filter_attr_skip_test
> [    8.241947] # kunit_executor_test: pass:4 fail:4 skip:0 total:8
> [    8.242144] # Totals: pass:4 fail:4 skip:0 total:8
> 
> and it looks like the console no longer works. Most likely this is some other problem
> that was introduced while tests were broken. It will take me some time to track that down.

No rush.

Given that this bug is a year old, that it happens only when debug
options are enabled, and that it has only been seen in current -next,
my plan is to submit it into the next merge window.

So this one stays mutable for about another 10 days.

On the strength of Roy's Tested-by, however, I will push this patch into
-next soon, so that should make things a bit easier.  Or so I hope.

And again, thank you all for tracking this down!

							Thanx, Paul
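
For readers following the quoted diff: it drops rtp->tasks_gp_mutex before
the blocking rcuwait_wait_event() and retakes it afterwards, so the mutex is
not held across the sleep. Below is a minimal userspace sketch of that
pattern using POSIX threads. It is only an analogue, not the kernel code:
struct gp_state, wait_for_work(), worker() and run_demo() are hypothetical
names, and a condition variable stands in for the rcuwait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state touched by the patch. */
struct gp_state {
	pthread_mutex_t gp_mutex;   /* plays the role of rtp->tasks_gp_mutex */
	pthread_mutex_t wait_lock;  /* protects work_pending */
	pthread_cond_t  wait_cv;    /* plays the role of rtp->cbs_wait */
	bool work_pending;
	int  grace_periods;
};

/* Called with gp_mutex held; returns with gp_mutex held again. */
static void wait_for_work(struct gp_state *s)
{
	/* Drop the outer mutex so others can take it while we sleep. */
	pthread_mutex_unlock(&s->gp_mutex);

	pthread_mutex_lock(&s->wait_lock);
	while (!s->work_pending)
		pthread_cond_wait(&s->wait_cv, &s->wait_lock);
	s->work_pending = false;
	pthread_mutex_unlock(&s->wait_lock);

	/* Retake the outer mutex before doing the protected work. */
	pthread_mutex_lock(&s->gp_mutex);
}

static void *worker(void *arg)
{
	struct gp_state *s = arg;

	pthread_mutex_lock(&s->gp_mutex);
	wait_for_work(s);
	s->grace_periods++;         /* protected by gp_mutex */
	pthread_mutex_unlock(&s->gp_mutex);
	return NULL;
}

/* Returns the number of "grace periods" the worker completed. */
static int run_demo(void)
{
	struct gp_state s = { .work_pending = false, .grace_periods = 0 };
	pthread_t t;
	int ret;

	pthread_mutex_init(&s.gp_mutex, NULL);
	pthread_mutex_init(&s.wait_lock, NULL);
	pthread_cond_init(&s.wait_cv, NULL);

	ret = pthread_create(&t, NULL, worker, &s);
	assert(ret == 0);

	/* Succeeds even if the worker is already asleep waiting for work. */
	pthread_mutex_lock(&s.gp_mutex);
	pthread_mutex_unlock(&s.gp_mutex);

	/* Post work and wake the worker. */
	pthread_mutex_lock(&s.wait_lock);
	s.work_pending = true;
	pthread_cond_signal(&s.wait_cv);
	pthread_mutex_unlock(&s.wait_lock);

	pthread_join(t, NULL);
	pthread_cond_destroy(&s.wait_cv);
	pthread_mutex_destroy(&s.wait_lock);
	pthread_mutex_destroy(&s.gp_mutex);
	return s.grace_periods;
}
```

In this sketch, removing the unlock/lock pair from wait_for_work() would let
run_demo() hang whenever the worker takes gp_mutex first, because the caller
would block on gp_mutex before it could post the work the worker is waiting
for.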
