Message-Id: <1209109710.7115.400.camel@twins>
Date:	Fri, 25 Apr 2008 09:48:30 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	David Miller <davem@...emloft.net>
Cc:	mingo@...e.hu, torvalds@...ux-foundation.org,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	viro@...iv.linux.org.uk, alan@...rguk.ukuu.org.uk,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [git pull] scheduler/misc fixes

On Thu, 2008-04-24 at 20:46 -0700, David Miller wrote:
> From: Ingo Molnar <mingo@...e.hu>
> Date: Fri, 25 Apr 2008 00:55:30 +0200
> 
> > 
> > Linus, please pull the latest scheduler/misc fixes git tree from:
> > 
> >    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes.git for-linus
> > 
> > a scheduler fix, a (long-standing) seqlock fix and a softlockup+nohz 
> > fix.
> 
> Correction, the softlock+nohz patch here doesn't actually fix the
> reported regression.  It fixes some other theoretical bug you
> discovered while trying to fix the regression reports.


> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d358d4e..b854a89 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -393,6 +393,7 @@ void tick_nohz_restart_sched_tick(void)
>                 sub_preempt_count(HARDIRQ_OFFSET);
>         }
>  
> +       touch_softlockup_watchdog();
>         /*
>          * Cancel the scheduled timer and restore the tick
>          */


That actually does solve the problem (I tested, as have you:
http://lkml.org/lkml/2008/4/22/98).

However, in the email referenced above you rightly point out that it's
likely working around a bug rather than fixing it. I agree.

So it used to work; and these two commits:


commit 15934a37324f32e0fda633dc7984a671ea81cd75
Author: Guillaume Chazarain <guichaz@...oo.fr>
Date:   Sat Apr 19 19:44:57 2008 +0200

    sched: fix rq->clock overflows detection with CONFIG_NO_HZ

    When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC.
    We check that the number of skipped ticks matches the clock jump seen in
    __update_rq_clock().

    Signed-off-by: Guillaume Chazarain <guichaz@...oo.fr>
    Signed-off-by: Ingo Molnar <mingo@...e.hu>


commit 27ec4407790d075c325e1f4da0a19c56953cce23
Author: Ingo Molnar <mingo@...e.hu>
Date:   Thu Feb 28 21:00:21 2008 +0100

    sched: make cpu_clock() globally synchronous

    Alexey Zaytsev reported (and bisected) that the introduction of
    cpu_clock() in printk made the timestamps jump back and forth.

    Make cpu_clock() more reliable while still keeping it fast when it's
    called frequently.

    Signed-off-by: Ingo Molnar <mingo@...e.hu>


break it; that is, they generate false softlockup messages.

Reverting them does indeed make them go away.

You've said that sparc64's sched_clock() is rock solid
(http://lkml.org/lkml/2008/4/24/77), and testing on your machine has
confirmed that for me.

So what I've done is (-linus without the above two commits):

unsigned long long cpu_clock(int cpu)
{
        unsigned long long now;
        unsigned long flags;
        struct rq *rq;

        /*
         * Only call sched_clock() if the scheduler has already been
         * initialized (some code might call cpu_clock() very early):
         */
        if (unlikely(!scheduler_running))
                return 0;

#if 0
        /* -linus behaviour: derive cpu_clock() from rq->clock */
        local_irq_save(flags);
        rq = cpu_rq(cpu);
        update_rq_clock(rq);
        now = rq->clock;
        local_irq_restore(flags);
#else
        /* hack: use the raw sched_clock() directly (rock solid on sparc64) */
        now = sched_clock();
#endif

        return now;
}

This cuts out all the rq->clock logic and should give a stable time on
your machine.

Lo and behold, the softlockups are back!


Now things get a little hazy:

 a) 15934a37324f32e0fda633dc7984a671ea81cd75 does indeed fix a bug in
    rq->clock; without that patch, nohz time gets compressed to a single
    jiffy, so cpu_clock(), which (without the above hack) is based on
    rq->clock, comes up short on nohz time. This can 'hide' the clock
    jump and thus hide the false positives.


 b) there is commit:

---
commit d3938204468dccae16be0099a2abf53db4ed0505
Author: Thomas Gleixner <tglx@...utronix.de>
Date:   Wed Nov 28 15:52:56 2007 +0100

    softlockup: fix false positives on CONFIG_NOHZ

    David Miller reported soft lockup false-positives that trigger
    on NOHZ due to CPUs idling for more than 10 seconds.

    The solution is to touch the softlockup watchdog when we return from
    idle. (by definition we are not 'locked up' when we were idle)

     http://bugzilla.kernel.org/show_bug.cgi?id=9409

    Reported-by: David Miller <davem@...emloft.net>
    Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
    Signed-off-by: Ingo Molnar <mingo@...e.hu>

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 27a2338..cb89fa8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -133,6 +133,8 @@ void tick_nohz_update_jiffies(void)
 	if (!ts->tick_stopped)
 		return;
 
+	touch_softlockup_watchdog();
+
 	cpu_clear(cpu, nohz_cpu_mask);
 	now = ktime_get();

---

    which should 'fix' this problem.


 c) there are 'IPI' handlers on SPARC64 that look like they can wake
    the CPU from idle sleep but do not appear to call irq_enter(), which
    has the above patch's touch_softlockup_watchdog() in its callchain
    (see the callchain sketch below the handler list).

tl0_irq1:       TRAP_IRQ(smp_call_function_client, 1)
tl0_irq2:       TRAP_IRQ(smp_receive_signal_client, 2)
tl0_irq3:       TRAP_IRQ(smp_penguin_jailcell, 3)
tl0_irq4:       TRAP_IRQ(smp_new_mmu_context_version_client, 4)
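
For reference, the path by which a 'normal' interrupt wakeup does reach
the watchdog looks roughly like this; a from-memory sketch of the
relevant bits, not verbatim kernel code:

void irq_enter(void)
{
#ifdef CONFIG_NO_HZ
        int cpu = smp_processor_id();

        /*
         * Interrupting an idle nohz cpu: catch up jiffies and, with the
         * patch from b), touch the softlockup watchdog.
         */
        if (idle_cpu(cpu) && !in_interrupt())
                tick_nohz_update_jiffies(); /* -> touch_softlockup_watchdog() */
#endif
        __irq_enter();
}

So a cpu woken only through one of the TRAP_IRQ vectors above, without
passing through irq_enter(), never refreshes the watchdog timestamp.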



So the current working thesis is that the bug in a) hides a real problem
not quite fixed by b) and exploited by c).
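
To spell that out in code: the softlockup check boils down to comparing
the current clock against a per-cpu timestamp that only
touch_softlockup_watchdog() refreshes. A simplified sketch (the real
softlockup.c differs in detail; the _sketch names and the 10 second
threshold are made up for illustration):

static DEFINE_PER_CPU(unsigned long long, touch_ts_sketch);

void touch_watchdog_sketch(void)
{
        per_cpu(touch_ts_sketch, raw_smp_processor_id()) =
                        cpu_clock(raw_smp_processor_id());
}

void softlockup_check_sketch(int this_cpu)
{
        unsigned long long now = cpu_clock(this_cpu);

        /*
         * If the cpu slept long in nohz, was woken through a path that
         * never touched the watchdog (c), and cpu_clock() reports the
         * full idle period because the compression from a) is gone,
         * this fires without any real lockup.
         */
        if (now - per_cpu(touch_ts_sketch, this_cpu) > 10ULL * NSEC_PER_SEC)
                printk(KERN_ERR "BUG: soft lockup detected on CPU#%d\n",
                                this_cpu);
}

With the unfixed rq->clock from a), 'now' barely advances across the
idle period, so the same check stays quiet; that is how the bug in a)
could hide the real problem.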

When I tried to build a state machine to validate this thesis, the kernel
blew up on me, so I guess I need to go get my morning juice and try
again ;-)

Insight into the above is appreciated as I'm out on a limb on two
fronts: nohz and sparc64 :-)

/me goes onward testing..




