linux-kernel - RE: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MW5PR84MB1842933AB81EECFAF2ECCE7DAB3C9@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
Date:   Mon, 7 Nov 2022 02:10:35 +0000
From:   "Elliott, Robert (Servers)" <elliott@....com>
To:     "paulmck@...nel.org" <paulmck@...nel.org>,
        Zhen Lei <thunder.leizhen@...wei.com>
CC:     Frederic Weisbecker <frederic@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Josh Triplett <josh@...htriplett.org>,
        "Steven Rostedt" <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v4 4/4] rcu: Add RCU stall diagnosis information


> > > +config RCU_CPU_STALL_CPUTIME
> >
> > Since the information might change and grow over time, consider
> > calling it "ADDITIONAL_INFO" rather than just "CPUTIME".
> 
> Except that I already redirected Zhen Lei from a generic description
> to a more specific one.  The reason for this is that I already get
> complaints about the large volume of output from RCU CPU stall warnings,
> which suggests that unless the additional information is quite small,
> it should get its own Kconfig option and kernel boot parameter.
> 
> So we well be keeping RCU_CPU_STALL_CPUTIME.

No problem - CPUTIME is easier to search for, and will likely
always remain as one of the values reported.

> But please tie any such patch to an actual use case.  After all, if no
> one actually uses that additional information, we have irritates untold
> numbers of electrons for no purpose.

I've been working on fixing intermittent RCU stalls that turned out to be
caused by x86-optimized crypto modules holding kernel_fpu_begin()/end()
too long and crypto tests not calling cond_resched() often enough, so am
interested in features that will help identify and prevent such
problems.

> Also, some of those functions are on fastpaths, so adding unconditional
> instrumentation overhead might result in an objection or three.
> 
> As always, choose wisely!  ;-)
> 
> If I don't see anything from you by this coming Friday, I will fold my
> usual wordsmithing into the patch.

I applied the series to a tree not including any crypto module
fixes, but an overnight run didn't trigger any RCU stalls. So, I
modified the tcrypt test module to simulate a problem by
running a lengthy loop after kernel_fpu_begin().  
Here's an example of how the output looks:


[ 1816.698380] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 1816.704368] rcu:     0-....: (2999 ticks this GP) idle=1dbc/1/0x4000000000000000 softirq=39683/39683 fqs=751
[ 1816.714200] rcu:          hardirqs   softirqs   csw/system
[ 1816.719922] rcu:  number:        5         10            0
[ 1816.725643] rcu: cputime:        3          0         1492   ==> 1500(ms)
[ 1816.732669]  (t=3030 jiffies g=89765 q=308 ncpus=56)
[ 1816.737857] CPU: 0 PID: 46826 Comm: modprobe Tainted: G        W          6.0.0+ #5
[ 1816.745754] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/08/2022
[ 1816.754523] RIP: 0010:rude_sleep_cycles.constprop.0+0x1c/0x30 [tcrypt]
[ 1816.761290] Code: 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f 1f 44 00 00 0f 31 48 c1 e2 20 be ab 90 41 00 48 89 d1 48 c1 e6 0b 48 09 c1 0f 31 <48> c1 e2 20 48 09 c2 48 29 ca 48 39 d6 73 ef c3 cc cc cc cc 48 8b
...

That makes me realize what "csw/switch" means:
- "csw" context switches applies to the number line
- "system" applies the cputime line
 
Maybe this shouldn't be a table? Make it grep-friendly:
[ 1816.719922] rcu: half-timeout counts: hardirq =5 softirq=10 csw=0
[ 1816.725643] rcu: half_timeout cputimes (ms): time=1500 hardirq=3 softirq=0 system=1492

The new prints interfere with existing prints, pushing the "t=3030" line 
further away from its context. That's from a pr_cont. Existing prints after
the "self-detect stall" message all start with \t, so they are roughly
related to the first line. The new prints should probably do the same.

Since the first line ends with \n, the pr_cont will never make it on the
same line, so it might be cleaner to use pr_err instead. That way it'll have
the "rcu:" prefix of the other lines.

That's from:
        pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
        raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
        print_cpu_stall_info(smp_processor_id());
[that's where this patch adds more prints]
        raw_spin_unlock_irqrestore_rcu_node(rdp->mynode, flags);
        for_each_possible_cpu(cpu)
                totqlen += rcu_get_n_cbs_cpu(cpu);
        pr_cont("\t(t=%lu jiffies g=%ld q=%lu ncpus=%d)\n",
                jiffies - gps,
                (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);


One other print has similar construction:
        pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
        ...
                                        print_cpu_stall_info(cpu);
        ...
        pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu ncpus=%d)\n",
               smp_processor_id(), (long)(jiffies - gps),
               (long)rcu_seq_current(&rcu_state.gp_seq), totqlen, rcu_state.n_online_cpus);