linux-kernel - Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core (with dumps)

Message-ID: <4B17EA30.5030208@gmail.com>
Date:	Thu, 03 Dec 2009 14:41:20 -0200
From:	Bruno Barberi Gnecco <brunobg@...il.com>
To:	Mike Galbraith <efault@....de>
CC:	Robert Hancock <hancockrwd@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core
 (with dumps)


>>> Regarding the PS, I have checked voltages with a multimeter and they are
>>> more than fine, and the wattage is enough for the system, so it'd have
>>> to be a very weird transient glitch that affects only memory access. See
>>> also below.
>> Most of the time transients will be the issue when a power supply causes 
>> problems and that can't be seen with a normal voltmeter. It's not 
>> typical for the rails to be low all the time unless the power supply is 
>> heavily overloaded.
> 
> Or stone cold dead.
> 
> You can't check any PSU with any multimeter I've ever seen unless it's a
> catastrophic failure, or as you said, so overloaded that it can't
> regulate (in which case it would have shut down if it were decent
> quality...).  Non-catastrophic PSU failures are often filter problems
> that a multimeter isn't fast enough to see.  Many switchers are
> deplorably noisy, and rely on the caps at the end of the transmission
> line, so one poor quality or dried out cap on MB can screw the pooch
> too.
> 
>>> Any ideas to rule the MB out, other than "get a new one"?
>>>
>>>> Bad memory (memtest doesn't necessarily access things the same way as
>>>> the kernel)
>>> Ruled out. I replaced with a 2GB DDR2, still got the bug: "BUG: Bad page
>>> map in process".
>>>
>>>> Bad cards (pci, agp, whatever)
>>> Ruled out. The only card is the video card. I replaced it with a very
>>> old PCI board and still got the error. This also pretty much rules out that
>>> the PS is underpowered, since I powered only the MB and the HD.
>>>
>>> Could it be one of the onboard things? I disabled everything but the
>>> LAN, and still got it.
>>>
>>>> Any of the above with loose connections
> 
> Pay very close attention to cleanliness.  Dust works its way into
> connectors with vibration.  Pull ram, and reseat.  Resist the urge to
> clean any connector with anything other than no-residue contact cleaner.
> 
> Another thing to watch out for is crappy heat sink compound.  That dries
> out, doesn't conduct heat well enough.  Under load, such a problem may
> build VERY fast with modern CPU current draw.  If all else fails, pull
> your CPU heatsink, clean and re-apply fresh compound.
> 
>>> I already reconnected everything twice. Could still be a loose
>>> connection of one of the wires in the connector, but it's very very
>>> unlikely to give such a specific error on memory access.
>>>
>>>> And did I mention bad power supply?
>>> Yes you did, and I'll try to get another one to be sure, but it could
>>> still be a software bug too.
> 
> Yes, but try another unit.  PSU is THE odds-on favorite for random crap
> with everything from PC hardware to very high dollar HW.  It's the point
> of maximum electrical stress.  It's also a spot where many people try to
> save money... big mistake that.
> 
> (removes HW guy hat;)

	Follow-up, with thanks to everybody who helped: I tried a different PSU and still got the 
problem, and I also got a BSOD with Windows. So it seems to be a problem with the 
motherboard or the processor.

	Thanks a lot again,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/