Message-ID: <4B17EA30.5030208@gmail.com>
Date: Thu, 03 Dec 2009 14:41:20 -0200
From: Bruno Barberi Gnecco <brunobg@...il.com>
To: Mike Galbraith <efault@....de>
CC: Robert Hancock <hancockrwd@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: BUG: Constant freezes and kernel panics on a quad core
(with dumps)
>>> Regarding the PS, I have checked voltages with a multimeter and they are
>>> more than fine, and the wattage is enough for the system, so it'd have
>>> to be a very weird transient glitch that affects only memory access. See
>>> also below.
>> Most of the time transients will be the issue when a power supply causes
>> problems and that can't be seen with a normal voltmeter. It's not
>> typical for the rails to be low all the time unless the power supply is
>> heavily overloaded.
>
> Or stone cold dead.
>
> You can't check any PSU with any multimeter I've ever seen unless it's a
> catastrophic failure, or as you said, so overloaded that it can't
> regulate (in which case it would have shut down if it were decent
> quality...). Non-catastrophic PSU failures are often filter problems
> that a multimeter isn't fast enough to see. Many switchers are
> deplorably noisy, and rely on the caps at the end of the transmission
> line, so one poor quality or dried out cap on MB can screw the pooch
> too.
>
>>> Any ideas to rule the MB out, other than "get a new one"?
>>>
>>>> Bad memory (memtest doesn't necessarily access things the same way as
>>>> the kernel)
>>> Ruled out. I replaced with a 2GB DDR2, still got the bug: "BUG: Bad page
>>> map in process".
>>>
>>>> Bad cards (pci, agp, whatever)
>>> Ruled out. The only card is the video card. I replaced it with a very
>>> old PCI board and still got the error. This also pretty much rules out that
>>> the PS is underpowered, since I powered only the MB and the HD.
>>>
>>> Could it be one of the onboard things? I disabled everything but the
>>> LAN, and still got it.
>>>
>>>> Any of the above with loose connections
>
> Pay very close attention to cleanliness. Dust works its way into
> connectors with vibration. Pull ram, and reseat. Resist the urge to
> clean any connector with anything other than no-residue contact cleaner.
>
> Another thing to watch out for is crappy heat sink compound. Once it dries
> out, it doesn't conduct heat well enough. Under load, such a problem may
> build VERY fast with modern CPU current draw. If all else fails, pull
> your CPU heatsink, clean it, and re-apply fresh compound.
>
>>> I already reconnected everything twice. Could still be a loose
>>> connection of one of the wires in the connector, but it's very very
>>> unlikely to give such a specific error on memory access.
>>>
>>>> And did I mention bad power supply?
>>> Yes you did, and I'll try to get another one to be sure, but it could
>>> still be a software bug too.
>
> Yes, but try another unit. The PSU is THE odds-on favorite for random crap
> with everything from PC hardware to very high dollar HW. It's the point
> of maximum electrical stress. It's also a spot where many people try to
> save money... big mistake that.
>
> (removes HW guy hat;)
Follow-up, with thanks to everybody who helped: I tried a different PSU and still got the
problem, and I also got a BSOD with Windows. So it seems to be a problem with the
motherboard or the processor.
Thanks a lot again,