linux-kernel - Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20160414142357.GB7090@lerouge>
Date:	Thu, 14 Apr 2016 16:23:59 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Rik van Riel <riel@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in
 vtime_init_idle()

On Thu, Apr 14, 2016 at 03:41:17PM +0200, Sebastian Andrzej Siewior wrote:
> On 04/14/2016 03:18 PM, Frederic Weisbecker wrote:
> > I think I tried this but I got lockdep warnings because the other updaters
> > happen on IRQ.
> 
> I booted and played a little but lockdep didn't scream here so far.

You need to select CONFIG_NO_HZ_FULL and select some CPU in nohz_full=
in order to trigger it:

[    0.759869] =================================
[    0.764220] [ INFO: inconsistent lock state ]
[    0.768571] 4.6.0-rc2+ #249 Not tainted
[    0.772404] ---------------------------------
[    0.776755] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    0.782751] cpuhp/7/49 [HC1[1]:SC0[0]:HE0:SE1] takes:
[    0.787795]  (&p->vtime_seqcount){?.+...}, at: [<ffffffff810ad0bb>] irq_exit+0xeb/0x110
[    0.795867] {HARDIRQ-ON-W} state was registered at:
[    0.800737]   [<ffffffff810fab45>] __lock_acquire+0x895/0x1d00
[    0.806579]   [<ffffffff810fc8ca>] lock_acquire+0xca/0x230
[    0.812075]   [<ffffffff810df0a0>] vtime_init_idle+0x40/0x80
[    0.817742]   [<ffffffff810daf9e>] init_idle+0x2ee/0x380
[    0.823063]   [<ffffffff810a495e>] fork_idle+0xae/0xc0
[    0.828213]   [<ffffffff81f7a0f4>] idle_threads_init+0x6b/0xa4
[    0.834054]   [<ffffffff81f7e1c5>] smp_init+0xf/0x82
[    0.839028]   [<ffffffff81f5b03c>] kernel_init_freeable+0x9b/0x1ab
[    0.845217]   [<ffffffff8193c6be>] kernel_init+0xe/0x100
[    0.850541]   [<ffffffff8194b4f2>] ret_from_fork+0x22/0x50
[    0.856034] irq event stamp: 28
[    0.859173] hardirqs last  enabled at (27): [<ffffffff8194a9ff>] _raw_spin_unlock_irqrestore+0x5f/0x80
[    0.868482] hardirqs last disabled at (28): [<ffff88021652bcc8>] 0xffff88021652bcc8
[    0.876143] softirqs last  enabled at (0): [<ffffffff810a2f76>] copy_process.part.38+0x336/0x1c40
[    0.885020] softirqs last disabled at (0): [<          (null)>]           (null)
[    0.892420] 
[    0.892420] other info that might help us debug this:
[    0.898940]  Possible unsafe locking scenario:
[    0.898940] 
[    0.904849]        CPU0
[    0.907294]        ----
[    0.909737]   lock(&p->vtime_seqcount);
[    0.913605]   <Interrupt>
[    0.916222]     lock(&p->vtime_seqcount);
[    0.920261] 
[    0.920261]  *** DEADLOCK ***
[    0.920261] 
[    0.926173] 1 lock held by cpuhp/7/49:
[    0.929915]  #0:  (smpboot_threads_lock){+.+.+.}, at: [<ffffffff810cec9d>] smpboot_unpark_threads+0x1d/0x90
[    0.939735] 
[    0.939735] stack backtrace:
[    0.944087] CPU: 7 PID: 49 Comm: cpuhp/7 Not tainted 4.6.0-rc2+ #249
[    0.950433] Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 08/18/2013
[    0.958250]  0000000000000000 ffff88021fbc3d80 ffffffff8141d403 ffff8802165243c0
[    0.965704]  ffffffff827eebd0 ffff88021fbc3dd0 ffffffff811a1743 0000000000000000
[    0.973159]  0000000000000000 0000000000000001 0000000000000000 ffff880216524bc0
[    0.980613] Call Trace:
[    0.983056]  <IRQ>  [<ffffffff8141d403>] dump_stack+0x85/0xc2
[    0.988813]  [<ffffffff811a1743>] print_usage_bug+0x1f3/0x208
[    0.994549]  [<ffffffff810f9c69>] mark_lock+0x649/0x6c0
[    0.999768]  [<ffffffff810f8e50>] ? check_usage_backwards+0x140/0x140
[    1.006199]  [<ffffffff810fad2b>] __lock_acquire+0xa7b/0x1d00
[    1.011938]  [<ffffffff810fc8ca>] lock_acquire+0xca/0x230
[    1.017328]  [<ffffffff810ad0bb>] ? irq_exit+0xeb/0x110
[    1.022548]  [<ffffffff810ded00>] vtime_gen_account_irq_exit+0x40/0xb0
[    1.029072]  [<ffffffff810ad0bb>] ? irq_exit+0xeb/0x110
[    1.034292]  [<ffffffff810ad0bb>] irq_exit+0xeb/0x110
[    1.039335]  [<ffffffff8103b303>] smp_irq_work_interrupt+0x33/0x40
[    1.045506]  [<ffffffff8194cd21>] irq_work_interrupt+0x91/0xa0
[    1.051330]  <EOI>  [<ffffffff8194aa04>] ? _raw_spin_unlock_irqrestore+0x64/0x80
[    1.058734]  [<ffffffff810d69b9>] try_to_wake_up+0x49/0x630
[    1.064299]  [<ffffffff8194a9ea>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[    1.071084]  [<ffffffff810cec80>] ? smpboot_create_threads+0x90/0x90
[    1.077429]  [<ffffffff810d7ea0>] wake_up_state+0x10/0x20
[    1.082819]  [<ffffffff810caf47>] __kthread_unpark+0x67/0x70
[    1.088472]  [<ffffffff810caf70>] kthread_unpark+0x20/0x30
[    1.093958]  [<ffffffff810cecea>] smpboot_unpark_threads+0x6a/0x90
[    1.100128]  [<ffffffff810a68f9>] cpuhp_invoke_callback+0x69/0x2e0
[    1.106302]  [<ffffffff810a6bb3>] cpuhp_up_callbacks+0x43/0xb0
[    1.112133]  [<ffffffff810a6dba>] cpuhp_thread_fun+0xda/0xf0
[    1.117784]  [<ffffffff810ce644>] smpboot_thread_fn+0x1e4/0x300
[    1.123696]  [<ffffffff810ce460>] ? sort_range+0x30/0x30
[    1.128999]  [<ffffffff810cabfa>] kthread+0x10a/0x120
[    1.134044]  [<ffffffff8194b4f2>] ret_from_fork+0x22/0x50
[    1.139434]  [<ffffffff810caaf0>] ? kthread_create_on_node+0x200/0x200


> 
> > In practive we can't be interrupted since the idle task hasn't even started.
> > 
> > Now we can probably get rid of the write_seqcount_stuff here because what
> > is initialized there is supposed to be visible by the idle task once it runs.
> > 
> > But I'm more worried about readers.
> 
> Most idle_thread_get() callers happen at boottime. Then I see two
> during CPU-up (cpu-hotplug) at runtime. One from _cpu_up() via
> idle_thread_get() and the second via bringup_cpu() and the second in
> bringup_cpu()

Right, it's on CPU bootup/hotplug. I'm just worried that, if we erase that lock on
CPU (re-)initialization, the readers (ie: task_cputime()) may read unlucky sequences.

Now probably there is not much opportunity to call task_cputime() on secondary idle
tasks, they don't seem to be in /proc but perhaps they are accessible elsewhere.

I'd rather be paranoid and keep the code that way until we have the final
proof that it's not dangerous.

That said we could skip this initialization based on context tracking/vtime static key,
using vtime_accounting_enabled().

> 
> Sebastian