Date:   Wed, 2 Aug 2023 08:45:06 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     paulmck@...nel.org, Roy Hopkins <rhopkins@...e.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        Pavel Machek <pavel@...x.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, patches@...ts.linux.dev,
        linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
        akpm@...ux-foundation.org, shuah@...nel.org, patches@...nelci.org,
        lkft-triage@...ts.linaro.org, jonathanh@...dia.com,
        f.fainelli@...il.com, sudipm.mukherjee@...il.com,
        srw@...dewatkins.net, rwarsow@....de, conor@...nel.org,
        rcu@...r.kernel.org, Ingo Molnar <mingo@...nel.org>
Subject: Re: scheduler problems in -next (was: Re: [PATCH 6.4 000/227]
 6.4.7-rc1 review)

On 8/2/23 08:05, Paul E. McKenney wrote:
> On Wed, Aug 02, 2023 at 02:57:56PM +0100, Roy Hopkins wrote:
>> On Tue, 2023-08-01 at 12:11 -0700, Paul E. McKenney wrote:
>>> On Tue, Aug 01, 2023 at 10:32:45AM -0700, Guenter Roeck wrote:
>>>
>>>
>>> Please see below for my preferred fix.  Does this work for you guys?
>>>
>>> Back to figuring out why recent kernels occasionally blow up all
>>> rcutorture guest OSes...
>>>
>>>                                                          Thanx, Paul
>>>
>>> ------------------------------------------------------------------------
>>>
>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>>> index 7294be62727b..2d5b8385c357 100644
>>> --- a/kernel/rcu/tasks.h
>>> +++ b/kernel/rcu/tasks.h
>>> @@ -570,10 +570,12 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
>>>          if (unlikely(midboot)) {
>>>                  needgpcb = 0x2;
>>>          } else {
>>> +               mutex_unlock(&rtp->tasks_gp_mutex);
>>>                  set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
>>>                  rcuwait_wait_event(&rtp->cbs_wait,
>>>                                     (needgpcb = rcu_tasks_need_gpcb(rtp)),
>>>                                     TASK_IDLE);
>>> +               mutex_lock(&rtp->tasks_gp_mutex);
>>>          }
>>>   
>>>          if (needgpcb & 0x2) {
>>
>> Your preferred fix looks good to me.
>>
>> With the original code I can quite easily reproduce the problem on my
>> system every 10 reboots or so. With your fix in place the problem no
>> longer occurs.
> 
> Very good, thank you!  May I add your Tested-by?
> 

FWIW, I am still working on it. So far I get

[    8.191589]     KTAP version 1
[    8.191769]     # Subtest: kunit_executor_test
[    8.191972]     # module: kunit
[    8.192012]     1..8
[    8.197643]     ok 1 parse_filter_test
[    8.201851]     ok 2 filter_suites_test
[    8.206713]     ok 3 filter_suites_test_glob_test
[    8.211806]     ok 4 filter_suites_to_empty_test
[    8.214077] kunit executor: filter operation not found: speed>slow, module!=example
[    8.217933]     # parse_filter_attr_test: ASSERTION FAILED at lib/kunit/executor_test.c:126
[    8.217933]     Expected err == 0, but
[    8.217933]         err == -22 (0xffffffffffffffea)
[    8.217933]
[    8.217933] failed to parse filter '(efault)'
[    8.221266]     not ok 5 parse_filter_attr_test
[    8.224224] kunit executor: filter operation not found: speed>slow
[    8.225837]     # filter_attr_test: ASSERTION FAILED at lib/kunit/executor_test.c:165
[    8.225837]     Expected err == 0, but
[    8.225837]         err == -22 (0xffffffffffffffea)
[    8.228850]     not ok 6 filter_attr_test
[    8.230942] kunit executor: filter operation not found: module!=dummy
[    8.232167]     # filter_attr_empty_test: ASSERTION FAILED at lib/kunit/executor_test.c:190
[    8.232167]     Expected err == 0, but
[    8.232167]         err == -22 (0xffffffffffffffea)
[    8.235317]     not ok 7 filter_attr_empty_test
[    8.237065] kunit executor: filter operation not found: speed>slow
[    8.238796]     # filter_attr_skip_test: ASSERTION FAILED at lib/kunit/executor_test.c:209
[    8.238796]     Expected err == 0, but
[    8.238796]         err == -22 (0xffffffffffffffea)
[    8.241897]     not ok 8 filter_attr_skip_test
[    8.241947] # kunit_executor_test: pass:4 fail:4 skip:0 total:8
[    8.242144] # Totals: pass:4 fail:4 skip:0 total:8

and it looks like the console no longer works. Most likely this is some other problem
that was introduced while tests were broken. It will take me some time to track that down.
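
For anyone comparing logs like the one above across boots, here is a minimal sketch of a tally script. It assumes the log was saved as plain text with the usual "[ timestamp ]" prefix; the helper names tally_ktap and errno_from_hex are made up for illustration and are not part of kunit. The second helper just shows that 0xffffffffffffffea read as a signed 64-bit value is -22, which is -EINVAL on Linux.

```python
import re

# Matches KTAP result lines such as
#   "[    8.197643]     ok 1 parse_filter_test"
#   "[    8.221266]     not ok 5 parse_filter_attr_test"
# The timestamp prefix is optional. "not ok" is listed first in the
# alternation so that it is not mistaken for a plain "ok".
RESULT_RE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?(not ok|ok)\s+(\d+)\s+(\S+)")


def tally_ktap(log_text):
    """Return (passed, failed) lists of test names found in log_text.

    Every matching result line is counted, so a log that also contains
    suite-level "ok"/"not ok" lines will include those as well.
    """
    passed, failed = [], []
    for line in log_text.splitlines():
        m = RESULT_RE.match(line)
        if not m:
            continue  # diagnostics, plan lines ("1..8"), totals, etc.
        status, _number, name = m.groups()
        (passed if status == "ok" else failed).append(name)
    return passed, failed


def errno_from_hex(value, bits=64):
    """Interpret an unsigned value as a signed two's-complement integer."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value
```

Feeding it the saved log text, e.g. tally_ktap(open("dmesg.txt").read()) with a hypothetical file name, gives the two name lists, which can then be diffed between a good and a bad boot.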

Guenter
