Message-ID: <20120327191112.GA11587@aftab>
Date: Tue, 27 Mar 2012 21:11:12 +0200
From: Borislav Petkov <bp@...64.org>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...64.org>,
Mauro Carvalho Chehab <mchehab@...hat.com>,
Ingo Molnar <mingo@...e.hu>,
EDAC devel <linux-edac@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer

On Tue, Mar 27, 2012 at 06:35:37PM +0000, Luck, Tony wrote:
> > In any case, if during the safe period of time we haven't received
> > confirmation from userspace that the item has been consumed, we switch
> > irreversibly back to the kernel log buffer and reissue the decoded info
> > through printk.
>
> I'm not sure I like irreversible things.
>
> Here's the life cycle:
>
> 1) System boots ... we have a window during this time where there is
> no daemon (or any user space at all).
>
> 2) Daemon gets started from /etc/init.d or systemd script
>
> 3) (optional) New version of daemon installed in update (old daemon is terminated, new one starts).
>
> 4) System is shutdown - all daemons terminated
>
> 5) System actually halts.
>
>
> So we clearly have some gaps where there isn't a daemon. Most of them should
> be pretty short ... but I worry about the gap from #1 to #2 - which can be pretty
> long if we need to fsck some disks (or we on some crazy big system that takes
> many minutes just to find and spin-up all the disks).

Well, currently we queue MCEs for later consumption before the decoder
chains have been registered etc: 0937195715713. We probably could delay
the draining of the buffer until we have userspace and daemon running.

The problem with this is that the buffer size is limited: 32 struct mce's, and
it can overflow pretty fast on a b0rked system which spews a lot of MCEs
during boot.

We probably could provide for enlarging that when needed as a Kconfig or
a boot option using early memblock allocations or whatever...
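
To make the overflow concern concrete, here is a rough userspace sketch of a fixed 32-slot queue that counts what it has to drop once it is full. This is not the kernel code from the commit above: struct mce_rec, early_mce_queue and the push/pop helpers are made-up names, and dropping the newest record on overflow is just an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EARLY_MCE_SLOTS 32	/* the 32-record limit discussed above */

/* Hypothetical stand-in for struct mce; the real one has many more fields. */
struct mce_rec {
	unsigned long long status;
	unsigned long long addr;
};

/* Hypothetical early-boot queue: fixed storage, no allocation needed. */
struct early_mce_queue {
	struct mce_rec slot[EARLY_MCE_SLOTS];
	size_t head;		/* index of the oldest queued record */
	size_t count;		/* number of records currently queued */
	unsigned long dropped;	/* records lost because the queue was full */
};

/* Queue one record; on overflow, count the loss and return false. */
static bool early_mce_push(struct early_mce_queue *q, const struct mce_rec *m)
{
	assert(q && m);
	if (q->count == EARLY_MCE_SLOTS) {
		q->dropped++;
		return false;
	}
	q->slot[(q->head + q->count) % EARLY_MCE_SLOTS] = *m;
	q->count++;
	return true;
}

/* Hand the oldest record to a consumer; returns false when empty. */
static bool early_mce_pop(struct early_mce_queue *q, struct mce_rec *out)
{
	assert(q && out);
	if (q->count == 0)
		return false;
	*out = q->slot[q->head];
	q->head = (q->head + 1) % EARLY_MCE_SLOTS;
	q->count--;
	return true;
}
```

With 40 errors logged before anything drains the queue, this sketch keeps the first 32 and reports 8 as dropped, which is the case a bigger Kconfig or boot-time size would be meant to cover.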

Then, after maybe a configurable period of uptime (it should be chosen
to be safe for most systems out there and the others could configure in
a higher timeout if they need to) we start spewing out decoded MCEs into
dmesg unless a daemon has drained the buffers before that.

Or something to that effect...

Concerning the irreversibility, we could probably teach the code to stop
printk'ing MCEs if the daemon has been restarted in the meantime...
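
Roughly, the timeout plus the "stop printk'ing when the daemon is back" idea boils down to a decision like the one sketched here. It is only an illustration under assumptions: ras_pick_sink(), the enum names and the seconds-based parameters are hypothetical, and how the kernel would actually learn that a daemon is present is left open.

```c
#include <stdbool.h>

/* Hypothetical destinations for a decoded MCE. */
enum ras_sink {
	RAS_SINK_HOLD,		/* keep it queued, daemon may still show up */
	RAS_SINK_DAEMON,	/* leave it in the buffer for userspace */
	RAS_SINK_PRINTK,	/* fall back to the kernel log (dmesg) */
};

/*
 * Decide where decoded MCEs go right now.
 *
 * uptime_secs:       seconds since boot
 * boot_timeout_secs: configurable grace period before falling back
 * daemon_present:    whether a consumer is currently draining the buffer
 *
 * The decision is recomputed each time, so a restarted daemon
 * switches the output back away from printk.
 */
static enum ras_sink ras_pick_sink(unsigned long uptime_secs,
				   unsigned long boot_timeout_secs,
				   bool daemon_present)
{
	if (daemon_present)
		return RAS_SINK_DAEMON;
	if (uptime_secs < boot_timeout_secs)
		return RAS_SINK_HOLD;
	return RAS_SINK_PRINTK;
}
```

Since nothing is latched here, the fallback to printk is not a one-way switch: as soon as daemon_present is true again, the result is RAS_SINK_DAEMON regardless of how long the system has been up.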

Thanks.

--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551