Date:	Tue, 22 Apr 2008 22:42:12 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	mingo@...e.hu
Cc:	linux-kernel@...r.kernel.org, tglx@...utronix.de
Subject: Re: Soft lockup regression from today's sched.git merge.

From: Ingo Molnar <mingo@...e.hu>
Date: Tue, 22 Apr 2008 11:14:56 +0200

> thanks for reporting it. I haven't seen this false positive happen in a 
> long time - but then again, PC CPUs are a lot less idle than a 128-CPU 
> Niagara2 :-/ I'm wondering what the best method would be to provoke a 
> CPU to stay idle that long - to make sure this bug is fixed.

I looked more closely at this.

There is no way the patch in question can work properly.

The algorithm is, essentially, "if time - prev_cpu_time is large
enough, call __sync_cpu_clock()", which is fine, except that nothing
ever sets prev_cpu_time.

The code is fatally flawed: once __sync_cpu_clock() calls start
happening, they will happen on every cpu_clock() call.
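
To make the failure mode concrete, here's a minimal userspace sketch
of the pattern as I read it (the names follow this mail; the clock
source, threshold handling, and main() are my illustrative stand-ins,
not the actual sched.c code):

#include <stdint.h>
#include <stdio.h>

static uint64_t prev_cpu_time;	/* never assigned anywhere: the bug */

static void __sync_cpu_clock(uint64_t now)
{
	/* stands in for the real global-spinlock sync */
	printf("sync at %llu ns\n", (unsigned long long)now);
}

static uint64_t cpu_clock(uint64_t now)	/* 'now' models the raw clock */
{
	/*
	 * Once now - prev_cpu_time first exceeds 100000 ns this stays
	 * true forever, because nothing ever does prev_cpu_time = now.
	 */
	if (now - prev_cpu_time > 100000)
		__sync_cpu_clock(now);
	return now;
}

int main(void)
{
	for (uint64_t t = 0; t <= 500000; t += 100000)
		cpu_clock(t);	/* after the first crossing, every call syncs */
	return 0;
}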

So, as my bisect showed from the get-go, these cpu_clock() changes
have major problems; it was quite a mind-boggling stretch to stick a
touch_softlockup_watchdog() call somewhere to try to fix this when
the guilty change didn't touch that area at all.
:-(

Furthermore, this is an extremely expensive way to ensure monotonic
per-rq timestamps.  A global spinlock taken every 100000 ns on every
cpu?!?!  :-/
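
For contrast, per-rq monotonicity doesn't need global state at all.
Here's a sketch of the obvious lock-free direction (my assumption of
what "cheap" should look like here, not the scheduler's actual code):
clamp each cpu's clock against the last value it handed out.

#include <stdint.h>

#define NR_CPUS 128	/* sized for the machine in question */

/* One slot per cpu; a real version would pad these to avoid false
 * sharing between adjacent slots. */
static uint64_t last_val[NR_CPUS];

/* Never let a cpu's clock go backwards relative to what it last
 * returned; 'raw' models the unsynchronized clock read on that cpu. */
static uint64_t cpu_clock_monotonic(int cpu, uint64_t raw)
{
	if (raw < last_val[cpu])
		return last_val[cpu];	/* ride out a backwards blip */
	last_val[cpu] = raw;
	return raw;
}

int main(void)
{
	uint64_t a = cpu_clock_monotonic(0, 1000);
	uint64_t b = cpu_clock_monotonic(0, 900);	/* clamped to 1000 */
	return (b >= a) ? 0 : 1;
}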

At least remove any implication of "high speed" from the comments
above cpu_clock() if we're going to need something like this.  I have
128 cpus; that's 128 grabs of that spinlock every quantum.  My next
system will have 256 cpus.  The expense of your solution increases
linearly with the number of cpus, which doesn't scale.
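
Rough arithmetic (mine, assuming the 100000 ns interval above and
that every cpu keeps hitting it): 1 s / 100000 ns = 10,000 grabs per
cpu per second, so 128 cpus is ~1.28 million acquisitions of one
global spinlock per second, and 256 cpus doubles that to ~2.56
million.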

Anyway, I'll work on the group sched lockup bug next.  As if I have
nothing better to do during the merge window than fix sched tree
regressions :-(
