lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 16 Feb 2009 10:50:59 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Damien Wyart <damien.wyart@...e.fr>,
	"Paul E. McKenney" <paulmck@...ibm.com>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>
Subject: Re: [Bug #12650] Strange load average and ksoftirqd behavior with
	2.6.29-rc2-git1


* Damien Wyart <damien.wyart@...e.fr> wrote:

> Hello,
> 
> Ok, I've redone the tests under tip from this morning (Paris time).
> Everything is in http://damien.wyart.free.fr/ksoftirqd_pb/
> 
> * Ingo Molnar <mingo@...e.hu> [2009-02-15 20:31]:
> > Yes, an abstime trace would be useful.
> 
> The corresponding file is trace_tip_2009.02.16_ksoftirqd_pb_abstime.txt.gz
> and there is also a trace without abstime:
> trace_tip_2009.02.16_ksoftirqd_pb.txt.gz

hm, we need a trace with both abstime and process information included:

  echo funcgraph-proc     > trace_options
  echo funcgraph-abstime  > trace_options

Also, at 140 msecs the duration is a bit short - could you please make a 
1-2 seconds capture? You can do that by increasing the number in 
buffer_size_kb 10-fold:

   echo 14100 > buffer_size_kb 

(your defaults might be different)

You can see the duration of the trace by looking at the first timestamp 
and the last one:

  310.846260 |   0)   2.380 us    |                            }
  [...]
  457.712729 |   1)               |        dequeue_task() {

Hm ... even with this limited trace, there's an unusually high amount of 
RCU activities. Each activity goes like this:

  457.680976 |   1)               |  do_softirq() {
  457.680976 |   1)               |    __do_softirq() {
  457.680977 |   1)               |      rcu_process_callbacks() {
  457.680977 |   1)               |        __rcu_process_callbacks() {
4  457.680978 |   1)   0.478 us    |          force_quiescent_state();
  457.680979 |   1)   1.591 us    |        }
  457.680979 |   1)               |        __rcu_process_callbacks() {
  457.680980 |   1)   0.478 us    |          force_quiescent_state();
  457.680981 |   1)               |          cpu_quiet() {
  457.680981 |   1)   0.506 us    |            _spin_lock_irqsave();
  457.680982 |   1)   0.496 us    |            _spin_unlock_irqrestore();
  457.680983 |   1)   2.545 us    |          }
  457.680984 |   1)   4.626 us    |        }
  457.680985 |   1)   7.823 us    |      }
  457.680985 |   1)   0.496 us    |      _local_bh_enable();
  457.680986 |   1)   9.845 us    |    }
  457.680987 |   1) + 10.962 us   |  }

I've Cc:-ed Paul, as you have tree-RCU enabled, which is a new feature 
in v2.6.29:

 #
 # RCU Subsystem
 #
 # CONFIG_CLASSIC_RCU is not set
 CONFIG_TREE_RCU=y
 # CONFIG_PREEMPT_RCU is not set
 # CONFIG_RCU_TRACE is not set
 CONFIG_RCU_FANOUT=32
 # CONFIG_RCU_FANOUT_EXACT is not set
 # CONFIG_TREE_RCU_TRACE is not set
 # CONFIG_PREEMPT_RCU_TRACE is not set

Damien, as an experiment, if you change your config to 
CONFIG_CLASSIC_RCU=y, does the ksoftirqd problem go away?

On the other hand ... the softirq processing there looks anomalous in 
itself, and might be due to some compiler bug perhaps. Could you try the 
debug patch below please (you'll get it automatically if you update to 
latest -tip), and redo the trace - the ftrace_printk() info should now 
be embedded in the trace. The expected result would be for the printed 
ot value to be non-zero at the #1 point, and zero at the #2 point.

Note: if it's a compiler optimization bug then this patch might make the 
whole problem go away.

	Ingo

------------------>
>From 2d7cf65eec92937bff1073311f6843aa7189bff2 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...e.hu>
Date: Mon, 16 Feb 2009 10:48:37 +0100
Subject: [PATCH] softirq: debug

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 kernel/softirq.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 3dd0d13..110cad0 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -196,7 +196,9 @@ asmlinkage void __do_softirq(void)
 	cpu = smp_processor_id();
 restart:
 	/* Reset the pending bitmask before enabling irqs */
+	ftrace_printk("#1 softirq pending: %08x\n", local_softirq_pending());
 	set_softirq_pending(0);
+	ftrace_printk("#2 softirq pending: %08x\n", local_softirq_pending());
 
 	local_irq_enable();
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ