Message-ID: <4D949E75.90701@redhat.com>
Date:	Thu, 31 Mar 2011 11:32:05 -0400
From:	Prarit Bhargava <prarit@...hat.com>
To:	Steven Rostedt <rostedt@...dmis.org>
CC:	linux-kernel@...r.kernel.org, dzickus@...hat.com
Subject: Re: [PATCH]: Use cmpxchg() in WARN_*_ONCE() functions

Hey Steve,

On 03/31/2011 11:23 AM, Steven Rostedt wrote:
> On Thu, Mar 31, 2011 at 08:46:07AM -0400, Prarit Bhargava wrote:
>   
>> An issue popped up where WARN_ON_ONCE() was used in a callback function
>> in smp_call_function().  This resulted in the WARN_ON executing multiple times
>> when it should have only executed once.
>>     
> But that is just once per cpu, correct?
>   

Not always.  Sometimes only a subset of the CPUs prints the warning ...
maybe a cache flush or something hits that finally causes the remaining
cpus to see __warned?  I dunno...

But I have had 24/24 cpus output the message.
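
To illustrate how every cpu can end up printing, here is a rough userspace
model of a plain check-then-set flag.  It is only a sketch: racy_warn_once()
and its "all cpus read before any cpu writes" interleaving are hypothetical
and simulated in a single thread, and this is not the kernel macro itself.

```c
#include <stdbool.h>

/* Hypothetical model: a plain flag with no atomic access, shared by all "cpus". */
static bool warned;

/*
 * Returns how many simulated cpus print the warning when every cpu
 * tests the flag before any of them has set it (the worst-case
 * interleaving of a non-atomic check-then-set).
 */
static int racy_warn_once(int ncpus)
{
	bool saw_unset[64];
	int i, printed = 0;

	if (ncpus > 64)
		ncpus = 64;

	/* Step 1: every cpu tests the flag; nobody has written it yet. */
	for (i = 0; i < ncpus; i++)
		saw_unset[i] = !warned;

	/* Step 2: every cpu that saw the flag unset warns, then sets it. */
	for (i = 0; i < ncpus; i++) {
		if (saw_unset[i]) {
			printed++;
			warned = true;
		}
	}
	return printed;
}
```

With 24 simulated cpus the first call reports 24 warnings; a later call
reports none, since by then the flag is visible to everyone.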

>   
>> I then did
>>
>>         for (i = 0; i < 1000000; i++)
>>                 on_each_cpu(prarit_callback, NULL, 0);
>>
>> The current code, of course, explodes :).  That's the bug I'm trying to fix.
>>     
> How exactly does it explode? How many CPUs do you have, and does this
> still just print once per CPU?
>
>   

It explodes because each cpu spits out a warning (which is the issue I'm
trying to resolve).

I tested on 24 physical cores, but this has also been seen in the
field on a system with 6 cores on RHEL6 (2.6.32/33/34/35/36/37/38-ish).

>> What is interesting in this test, however, is the impact that checking the
>> !__warned flag has [Aside: Checking the !__warned flag is an enhancement
>> and is not explicitly required for this code].
>>
>> A run with just (!cmpxchg(&__warned, 0, 1)) results in an average of 21.323s,
>> and a run with  (!__warned && !cmpxchg(&__warned, 0, 1)) results in an
>> average of 20.233s.  Of course, the !__warned is not necessary for the code
>> to work properly but it seems to be a significant impact to the time to run
>> this code.
>>     
> Yes adding the check for !__warned first should have obvious benefits.
>
> I really do not see anything wrong with this patch, but personally, I
> would rather fix what caused the WARN_ON_ONCE() than fix the warning
> itself, as long as the warning itself does not really break anything
> else.
>   


The WARN_ON_ONCE was triggering due to a bad HW setup.  The system in
question had the APERFMPERF flag set only on the boot cpu and on none of
the other cpus.  This caused the system to generate warnings in the acpi
cpufreq code.

The HW issue was resolved by modifying a BIOS setting which was found to
clear the APERFMPERF cpu flag setting on the !boot cpus.  Yes, this
means the HW is busted.

But ... that still leaves the possibility that WARN_ON_ONCE spits out
many warnings instead of just one.  Hence, the patch.
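
For reference, the idea behind the patch can be sketched in userspace with
C11 atomics.  This is an approximation under stated assumptions, not the
patch itself: warn_once_claim() and warned_flag are made-up names, and
atomic_compare_exchange_strong() stands in for the kernel's cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: 0 = nobody has warned yet, 1 = warning already claimed. */
static atomic_int warned_flag;

/*
 * Returns true for exactly one caller, which is then the one that
 * should print the warning.  Every other caller gets false.
 */
static bool warn_once_claim(void)
{
	int expected = 0;

	/* Cheap read first: once the flag is set, skip the atomic RMW entirely. */
	if (atomic_load_explicit(&warned_flag, memory_order_relaxed))
		return false;

	/* Atomically flip 0 -> 1; only the caller that wins the exchange warns. */
	return atomic_compare_exchange_strong(&warned_flag, &expected, 1);
}
```

The plain load corresponds to the optional !__warned check: it is not needed
for correctness, it just avoids the compare-and-exchange once the flag has
been set.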

P.
