Message-ID: <8682b08c-347b-5547-60e0-013dcf1f8c93@joelfernandes.org>
Date:   Thu, 20 Jul 2023 15:32:35 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     paulmck@...nel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        torvalds@...ux-foundation.org, stable@...r.kernel.org, lwn@....net,
        jslaby@...e.cz, rcu@...r.kernel.org
Subject: Re: [BUG] Re: Linux 6.4.4

On 7/20/23 15:04, Paul E. McKenney wrote:
 > On Thu, Jul 20, 2023 at 12:31:13PM -0400, Joel Fernandes wrote:
 >> Hi Paul,
 >>
 >> On Thu, Jul 20, 2023 at 11:55 AM Paul E. McKenney <paulmck@...nel.org> wrote:
 >>>
 >>> On Thu, Jul 20, 2023 at 01:27:14PM +0000, Joel Fernandes wrote:
[...]
 >>>>
 >>>> So likely RCU boosting is failing:
 >>>>
 >>>> The full TREE03 splat:
 >>>> [   54.243588] ------------[ cut here ]------------
 >>>> [   54.244547] rcu-torture: rcu_torture_boost started
[...]
 >>>> [   54.300499] RIP: 0010:rcu_torture_stats_print+0x5b2/0x620
[...]
 >>>> [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
 >>>>
 >>>> However, if we are to believe the '9', it appears the object did make it
 >>>> quite some way toward the end of the pipe array but not all the way to the free pool.
 >>>
 >>> This is from this if/for statement, correct?
 >>>
 >>>                  stutter_waited = stutter_wait("rcu_torture_writer");
 >>>                  if (stutter_waited &&
 >>>                      !atomic_read(&rcu_fwd_cb_nodelay) &&
 >>>                      !cur_ops->slow_gps &&
 >>>                      !torture_must_stop() &&
 >>>                      boot_ended)
 >>>                          for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
 >>>                                  if (list_empty(&rcu_tortures[i].rtort_free) &&
 >>>                                      rcu_access_pointer(rcu_torture_current) !=
 >>>                                      &rcu_tortures[i]) {
 >>>                                          tracing_off();
 >>>                                          show_rcu_gp_kthreads();
 >>>                                          WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
 >>>                                          rcu_ftrace_dump(DUMP_ALL);
 >>>                                  }
 >>
 >> Yes, that's right.
 >>
 >>> If so, this happens when there was a stutter wait, but RCU grace
 >>> periods failed to clear out the backlog during the several seconds that
 >>> rcutorture was forced idle.  This might be related to the RCU priority
 >>> boosting failure, in which a preempted reader persisted across the
 >>> stutter interval.
 >>
 >> When RCU is operating normally, isn't the
 >> "list_empty(&rcu_tortures[i].rtort_free)" check expected not to trigger
 >> until the preempted reader unblocks and exits its RCU read-side critical
 >> section?
 >
 > Yes, but not just "until", but rather "long after".  If RCU is doing
 > grace periods correctly, an active reader on a given rcu_tortures[]
 > element will prevent .rtort_pipe_count from exceeding the value 2.

Ah OK, so rtort_pipe_count being 9 is a sign that RCU isn't making progress,
which is why the object is absent from the free list.

 >
 > The element will not be put on a list until .rtort_pipe_count is equal
 > to RCU_TORTURE_PIPE_LEN, which is 10.
 >
 > This warning usually appears when something is holding up the grace-period
 > kthread.  Historically, this has included deadlocks, missed timers,
 > and whatever else can prevent the grace-period kthread from running.

Makes sense.
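
To make sure I have the pipe mechanism straight, here is a simplified
userspace model of the counter as I understand it (fake_rtort and
fake_pipe_update are made-up names for illustration; this is not the actual
rcutorture code):

/*
 * Simplified model: each completed grace period advances a retired
 * object's pipe counter by one, and only when the counter reaches
 * RCU_TORTURE_PIPE_LEN (10) does the object go back on the free pool.
 */
#include <stdbool.h>
#include <stdio.h>

#define RCU_TORTURE_PIPE_LEN 10

struct fake_rtort {
	int pipe_count;
	bool on_free_list;
};

/* Called once per completed grace period for each retired object. */
static void fake_pipe_update(struct fake_rtort *rt)
{
	if (rt->on_free_list)
		return;
	if (++rt->pipe_count >= RCU_TORTURE_PIPE_LEN)
		rt->on_free_list = true;	/* finally safe to reuse */
}

int main(void)
{
	struct fake_rtort rt = { 0, false };
	int gp;

	/* Nine grace periods: close to, but not on, the free list. */
	for (gp = 0; gp < 9; gp++)
		fake_pipe_update(&rt);
	printf("after 9 GPs: pipe_count=%d on_free_list=%d\n",
	       rt.pipe_count, rt.on_free_list);

	/* The tenth grace period finally releases it. */
	fake_pipe_update(&rt);
	printf("after 10 GPs: pipe_count=%d on_free_list=%d\n",
	       rt.pipe_count, rt.on_free_list);
	return 0;
}

So a counter stuck at 9 is one grace period short of the object ever being
freed, which matches the splat.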

 >> One thing that confuses me: in the case of
 >> "cur_ops->deferred_free(old_rp);", the earlier do-while loop may exit
 >> before the async callbacks have finished. So what prevents the
 >> "list_empty(&rcu_tortures[i].rtort_free)" check from happening before the
 >> needed grace periods have elapsed? Thanks for any clarification.
 >
 > We only enter this code if the stutter_wait() actually waited, and by
 > default this function will wait about five seconds.  Since the rcutorture
 > testing goes idle during this time period (or is supposed to!), if things
 > are working properly, knocking off ten grace periods during that time
 > should be pretty much a given.

Sure, makes sense. And this is not Lazy RCU, so 5 seconds should be plenty ;). I
think I was subconsciously expecting an rcu_barrier() somewhere in the code
before those checks, but that's not needed since, as you pointed out, the
stutter should give RCU enough time to make progress.

So hmm, the count being 9 means that not enough RCU grace periods have passed
for the rcu_torture object in question, which keeps it allocated. The GP
kthread not getting CPU time can indeed do that, or perhaps something else is
stalling RCU, such as a preempted reader, lengthy preemption disabling on a
CPU, and so forth. I'll try to collect a trace when it happens.
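
In case it helps frame what I'll be looking for in the trace, here is a toy
kernel-thread sketch of the preempted-reader case I have in mind
(hypothetical_slow_reader is a made-up name; illustrative only, not real test
code):

#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/*
 * Toy illustration: a thread that spends a long time inside an RCU
 * read-side critical section.  On a CONFIG_PREEMPT kernel it can be
 * preempted between rcu_read_lock() and rcu_read_unlock(); if it is then
 * starved of CPU, every grace period that started after the
 * rcu_read_lock() is held up until the thread runs again and reaches
 * rcu_read_unlock().  RCU priority boosting is supposed to get such a
 * reader scheduled again; if boosting fails, rtort_pipe_count stops
 * advancing.
 */
static int hypothetical_slow_reader(void *unused)
{
	while (!kthread_should_stop()) {
		rcu_read_lock();
		/* Long preemptible busy-wait inside the critical section. */
		mdelay(100);
		rcu_read_unlock();
		cond_resched();
	}
	return 0;
}

If the trace shows a reader like that sitting preempted across the whole
stutter interval, it would line up with the boosting-failure theory.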

thanks,

  - Joel
