Message-ID: <20160603033951.GO5231@linux.vnet.ibm.com>
Date:	Thu, 2 Jun 2016 20:39:51 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Daniel Bristot de Oliveira <bristot@...hat.com>
Cc:	linux-kernel@...r.kernel.org, Jonathan Corbet <corbet@....net>,
	Josh Triplett <josh@...htriplett.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Christian Borntraeger <borntraeger@...ibm.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	"Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
Subject: Re: [PATCH v2] rcu: sysctl: Panic on RCU Stall

On Thu, Jun 02, 2016 at 01:51:41PM -0300, Daniel Bristot de Oliveira wrote:
> It is not always easy to determine the cause of an RCU stall just by
> analysing the RCU stall messages, especially when the problem is caused
> by the indirect starvation of RCU threads. One example is when
> preempt_rcu is not awakened because a timer softirq is starved.
> 
> We have been hard coding panic() in the RCU stall functions for
> some time while testing the kernel-rt. But this is not possible in
> some scenarios, like when supporting customers.
> 
> This patch implements the sysctl kernel.panic_on_rcu_stall. If
> set to 1, the system will panic() when an RCU stall takes place,
> enabling the capture of a vmcore. The vmcore allows analysis of all
> kernel and task state, which helps identify the culprit and the
> solution for the stall.
> 
> The kernel.panic_on_rcu_stall sysctl is disabled by default.
> 
> Changes from v1:
> - Fixed a typo in the git log
> - The if(sysctl_panic_on_rcu_stall) panic() is in a static function
> - Fixed the CONFIG_TINY_RCU compilation issue
> - The var sysctl_panic_on_rcu_stall is now __read_mostly
> 
> Cc: Jonathan Corbet <corbet@....net>
> Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> Cc: Josh Triplett <josh@...htriplett.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Lai Jiangshan <jiangshanlai@...il.com>
> Acked-by: Christian Borntraeger <borntraeger@...ibm.com>
> Reviewed-by: Josh Triplett <josh@...htriplett.org>
> Reviewed-by: Arnaldo Carvalho de Melo <acme@...nel.org>
> Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
> Signed-off-by: Daniel Bristot de Oliveira <bristot@...hat.com>

Queued for testing and further review.

							Thanx, Paul
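
For readers who want to try the knob from user space, the following is a minimal sketch, assuming the patch is applied and the file appears as /proc/sys/kernel/panic_on_rcu_stall as the documentation hunk below describes. The helpers write_sysctl_int() and read_sysctl_int() are hypothetical names used only for illustration; they are not part of the patch.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): write an integer to a
 * sysctl file such as /proc/sys/kernel/panic_on_rcu_stall.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", value) < 0)
		ret = -EIO;
	/* A value rejected by the kernel shows up when the buffer is flushed. */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;
	return ret;
}

/*
 * Hypothetical helper (not part of the patch): read the current value.
 * Returns 0 on success or a negative errno value on failure.
 */
static int read_sysctl_int(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ret = 0;

	if (!f)
		return -errno;
	if (fscanf(f, "%d", out) != 1)
		ret = -EINVAL;
	fclose(f);
	return ret;
}
```

Calling write_sysctl_int("/proc/sys/kernel/panic_on_rcu_stall", 1) as root would then request a panic() on the next RCU stall; on a kernel without the patch the call is expected to fail with -ENOENT.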

> ---
>  Documentation/sysctl/kernel.txt | 12 ++++++++++++
>  include/linux/kernel.h          |  1 +
>  kernel/rcu/tree.c               | 12 ++++++++++++
>  kernel/sysctl.c                 | 11 +++++++++++
>  4 files changed, 36 insertions(+)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index a3683ce..3320460 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -58,6 +58,7 @@ show up in /proc/sys/kernel:
>  - panic_on_stackoverflow
>  - panic_on_unrecovered_nmi
>  - panic_on_warn
> +- panic_on_rcu_stall
>  - perf_cpu_time_max_percent
>  - perf_event_paranoid
>  - perf_event_max_stack
> @@ -618,6 +619,17 @@ a kernel rebuild when attempting to kdump at the location of a WARN().
> 
>  ==============================================================
> 
> +panic_on_rcu_stall:
> +
> +When set to 1, calls panic() after RCU stall detection messages. This
> +is useful to define the root cause of RCU stalls using a vmcore.
> +
> +0: do not panic() when RCU stall takes place, default behavior.
> +
> +1: panic() after printing RCU stall messages.
> +
> +==============================================================
> +
>  perf_cpu_time_max_percent:
> 
>  Hints to the kernel how much CPU time it should be allowed to
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 94aa10f..c420821 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -451,6 +451,7 @@ extern int panic_on_oops;
>  extern int panic_on_unrecovered_nmi;
>  extern int panic_on_io_nmi;
>  extern int panic_on_warn;
> +extern int sysctl_panic_on_rcu_stall;
>  extern int sysctl_panic_on_stackoverflow;
> 
>  extern bool crash_kexec_post_notifiers;
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c7f1bc4..d531988 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -125,6 +125,8 @@ int rcu_num_lvls __read_mostly = RCU_NUM_LVLS;
>  /* Number of rcu_nodes at specified level. */
>  static int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
>  int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> +/* panic() on RCU Stall sysctl. */
> +int sysctl_panic_on_rcu_stall __read_mostly;
> 
>  /*
>   * The rcu_scheduler_active variable transitions from zero to one just
> @@ -1311,6 +1313,12 @@ static void rcu_stall_kick_kthreads(struct rcu_state *rsp)
>  	}
>  }
> 
> +static inline void panic_on_rcu_stall(void)
> +{
> +	if (sysctl_panic_on_rcu_stall)
> +		panic("RCU Stall\n");
> +}
> +
>  static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
>  {
>  	int cpu;
> @@ -1390,6 +1398,8 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
> 
>  	rcu_check_gp_kthread_starvation(rsp);
> 
> +	panic_on_rcu_stall();
> +
>  	force_quiescent_state(rsp);  /* Kick them all. */
>  }
> 
> @@ -1430,6 +1440,8 @@ static void print_cpu_stall(struct rcu_state *rsp)
>  			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> 
> +	panic_on_rcu_stall();
> +
>  	/*
>  	 * Attempt to revive the RCU machinery by forcing a context switch.
>  	 *
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 87b2fc3..35f0dcb 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1205,6 +1205,17 @@ static struct ctl_table kern_table[] = {
>  		.extra2		= &one,
>  	},
>  #endif
> +#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
> +	{
> +		.procname	= "panic_on_rcu_stall",
> +		.data		= &sysctl_panic_on_rcu_stall,
> +		.maxlen		= sizeof(sysctl_panic_on_rcu_stall),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +#endif
>  	{ }
>  };
> 
> -- 
> 2.5.5
> 
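
The ctl_table entry above bounds the value with .extra1 = &zero and .extra2 = &one. As a rough user-space model of that bounds check (a sketch only, not the kernel's proc_dointvec_minmax code; struct bounded_int and bounded_int_store() are hypothetical names), an out-of-range write is assumed to be rejected with -EINVAL and to leave the stored value unchanged:

```c
#include <errno.h>

/* User-space model of an integer knob restricted to [min, max]. */
struct bounded_int {
	int value;
	int min;
	int max;
};

/*
 * Store new_value if it lies within [min, max].
 * Returns 0 on success; returns -EINVAL and leaves the stored
 * value untouched when new_value is out of range.
 */
static int bounded_int_store(struct bounded_int *b, int new_value)
{
	if (new_value < b->min || new_value > b->max)
		return -EINVAL;
	b->value = new_value;
	return 0;
}
```

With min = 0 and max = 1 this mirrors the intent of the table entry: only 0 (do not panic) and 1 (panic after the stall messages) are accepted.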
