Message-ID: <20141102121139.GA7000@polanet.pl>
Date: Sun, 2 Nov 2014 13:11:39 +0100
From: Tomasz Pala <gotar@...anet.pl>
To: Borislav Petkov <bp@...en8.de>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] amd64_edac: Build module on x86-32
On Sun, Nov 02, 2014 at 11:33:00 +0100, Borislav Petkov wrote:
> Not enabling it on 32-bit was a conscious decision for the simple reason
> that with the current DIMM sizes, you can have 1 or 2 DIMMs tops which
> you can use on 32-bit and having a fat driver mapping memory errors to
> DIMMs in that case does seem like a waste of time, energy, resources...
> you name it.
In my case it's not about mapping, but detection.
=== begin story ===
Recently my PostgreSQL db failed with:
invalid page header in block 240 of relation base/49095/161613
which was fortunately 'fixed' by:
echo 1 > /proc/sys/vm/drop_caches
It turned out that there were on-disk differences between the RAID1
(md) components, not only shown by the next run of mdadm-checkarray
(see the commands below), but also visible in the actual filesystem
after splitting the RAID1 into separate volumes. There were no problems
registered in the S.M.A.R.T. logs, yet _somehow_ my data got corrupted
and I had not a single diagnostic tool available. There were no power
outages or any other abrupt events; it just happened, for no apparent
reason. I've found some page cache corruption reports on the net, but
none of them matched my conditions.
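For reference, the manual equivalent of mdadm-checkarray, assuming the
array is md0, is roughly:

   # request a consistency check pass over the whole array
   echo check > /sys/block/md0/md/sync_action
   # when it completes, a non-zero value means the mirrors disagree
   cat /sys/block/md0/md/mismatch_cnt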
Currently I'm using checksums at the application level (available since
PostgreSQL 9.3) and at the FS level (Btrfs), plus EDAC for 4x1 GB ECC
UDIMMs (I replaced 2x2 GB non-ECC modules with these).
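That is, roughly (the data directory and mount point below are just
examples from my setup):

   # initialize the cluster with page checksums (PostgreSQL >= 9.3)
   initdb --data-checksums -D /var/lib/pgsql/data
   # start a scrub to verify Btrfs data and metadata checksums
   btrfs scrub start /mnt/data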
If I could, I'd use block-level checksumming or set up RAID1 in
scrub-on-read mode, as this system has a very low usage volume and I
don't care about performance at all. Unfortunately the T13 (SATA)
equivalent of end-to-end data protection didn't make it to the market,
and SCSI drives with DIF/DIX are overkill for this system.
=== end story ===
There is absolutely no reason for you to forbid me from using EDAC.
And your reasoning is flawed, because:
1. I have four 1 GB ECC UDIMMs, not 1 or 2 as you stated; isn't that a
   supported config? Or maybe you would like to replace my modules with
   a 4 GB one free of charge (shipping included)?
2. It is my time, energy and resources; it's not up to you to decide
   how I'm going to waste them. What next, removing support for
   power-hungry CPUs? Anyway, Poland recently got free CO2 emission
   allowances from the EU Commission ;)
3. If _that_ was the reason, why didn't you make it explicit by
   depending on HIGHMEM64G? Because such a ridiculous condition would
   soon be removed?
4. You could apply the same logic to all the other EDAC modules -
   besides the system mentioned above I have some real server boards,
   some of them running a 32-bit kernel with ECC FB-DIMMs, and the
   first one from the top:
   config EDAC_I5000
           depends on EDAC_MM_EDAC && X86 && PCI
Actually there are only 2 X86_64 dependencies in drivers/edac/Kconfig:
EDAC_SBRIDGE and EDAC_AMD64 - would you 'fix' every X86 entry as
pointless too? (See below for how small the change in question is.)
5. If you want to prevent this module from loading when only 1 or 2
   DIMMs are installed, just wire that check into the module; I have 4
   modules anyway. And even with just 1 module installed, I'd like to
   know the error rates, to be aware of the memory module/controller
   quality and to replace the module when it fails too often (the
   counters are trivial to read, see the sketch below).
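To be explicit about how small the change in question is - modulo the
exact dependency list in a given tree, it boils down to something like:

   config EDAC_AMD64
           depends on EDAC_MM_EDAC && AMD_NB && X86 && EDAC_DECODE_MCE

i.e. dropping the _64 from the X86 dependency, nothing more.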
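And by error rates I mean the standard EDAC sysfs counters, e.g. for
the first memory controller (mc0 here is just an example):

   # corrected error count accumulated since the driver was loaded
   cat /sys/devices/system/edac/mc/mc0/ce_count
   # uncorrected error count
   cat /sys/devices/system/edac/mc/mc0/ue_count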
> I guess I'll add a note about this in the Kconfig text because I keep
> getting patches about this once every couple of months :-)
...so, didn't you think that maybe someone needs this?!
Once again: the circuits are working, there is no technical reason not
to use them. It's up to the owner to decide whether it makes sense.
regards,
--
Tomasz Pala <gotar@...-linux.org>