Message-ID: <20120327191112.GA11587@aftab>
Date:	Tue, 27 Mar 2012 21:11:12 +0200
From:	Borislav Petkov <bp@...64.org>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	Borislav Petkov <bp@...64.org>,
	Mauro Carvalho Chehab <mchehab@...hat.com>,
	Ingo Molnar <mingo@...e.hu>,
	EDAC devel <linux-edac@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk
 buffer

On Tue, Mar 27, 2012 at 06:35:37PM +0000, Luck, Tony wrote:
> > In any case, if during the safe period of time we haven't received
> > confirmation from userspace that the item has been consumed, we switch
> > irreversibly back to the kernel log buffer and reissue the decoded info
> > through printk.
> 
> I'm not sure I like irreversible things.
> 
> Here's the life cycle:
> 
> 1) System boots ... we have a window during this time where there is
>    no daemon (or any user space at all).
> 
> 2) Daemon gets started from /etc/init.d or systemd script
> 
> 3) (optional) New version of daemon installed in update (old daemon is terminated, new one starts).
> 
> 4) System is shutdown - all daemons terminated
> 
> 5) System actually halts.
> 
> 
> So we clearly have some gaps where there isn't a daemon.  Most of them should
> be pretty short ... but I worry about the gap from #1 to #2 - which can be pretty
> long if we need to fsck some disks (or we're on some crazy big system that takes
> many minutes just to find and spin-up all the disks).

Well, currently we queue MCEs for later consumption before the decoder
chains have been registered etc: 0937195715713. We probably could delay
the draining of the buffer until we have userspace and the daemon running.

The problem with this is that the buffer size is limited: 32 struct mce's,
and it can overflow pretty fast on a b0rked system which spews a lot of
MCEs during boot.

We probably could provide for enlarging that when needed as a Kconfig or
a boot option using early memblock allocations or whatever...
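
To make the overflow concern concrete, here is a minimal userspace sketch
of a fixed 32-slot queue that counts what it has to drop once it is full.
It is not the kernel's actual code: struct early_mce, early_mce_push() and
early_mce_pop() are hypothetical names made up for illustration, and only
the capacity of 32 comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

#define EARLY_MCE_SLOTS 32	/* capacity mentioned above */

/* Hypothetical stand-in for struct mce; only two illustrative fields. */
struct early_mce {
	unsigned long long status;
	unsigned long long addr;
};

struct early_mce_queue {
	struct early_mce slots[EARLY_MCE_SLOTS];
	size_t head;		/* index of the oldest queued record */
	size_t count;		/* number of records currently queued */
	unsigned long dropped;	/* records lost because the queue was full */
};

/* Queue one record; on overflow, count the drop and return false. */
static bool early_mce_push(struct early_mce_queue *q,
			   const struct early_mce *m)
{
	if (q->count == EARLY_MCE_SLOTS) {
		q->dropped++;
		return false;
	}
	q->slots[(q->head + q->count) % EARLY_MCE_SLOTS] = *m;
	q->count++;
	return true;
}

/* Drain the oldest record into *out; return false if the queue is empty. */
static bool early_mce_pop(struct early_mce_queue *q, struct early_mce *out)
{
	if (q->count == 0)
		return false;
	*out = q->slots[q->head];
	q->head = (q->head + 1) % EARLY_MCE_SLOTS;
	q->count--;
	return true;
}
```

With a fixed array like this, anything past the 32nd undrained record is
lost, which is why a larger, boot-time-sized buffer would help if draining
is delayed until the daemon is up.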

Then, after maybe a configurable period of uptime (it should be chosen
to be safe for most systems out there and the others could configure in
a higher timeout if they need to) we start spewing out decoded MCEs into
dmesg unless a daemon has drained the buffers before that.

Or something to that effect...

Concerning the irreversibility, we could probably teach the code to stop
printk'ing MCEs if the daemon has been restarted in the meantime...
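
Roughly, the routing decision could look like the sketch below. This is
only an illustration under assumptions: enum mce_sink, struct mce_route,
the boot_grace_secs tunable and mce_pick_sink() are all hypothetical names,
not existing kernel interfaces. The point is that the sink is re-evaluated
per event, so the fallback to printk is not irreversible.

```c
#include <stdbool.h>

/* Where a decoded MCE should go (hypothetical names). */
enum mce_sink {
	MCE_SINK_QUEUE,		/* keep buffering, daemon may still show up */
	MCE_SINK_DAEMON,	/* hand over to the userspace daemon */
	MCE_SINK_PRINTK,	/* fall back to the kernel log buffer */
};

struct mce_route {
	bool daemon_present;		/* set when a daemon registers, cleared on exit */
	unsigned long boot_grace_secs;	/* assumed configurable uptime threshold */
};

/*
 * Pick the sink for one event. Since this is evaluated for every event,
 * a daemon that is restarted later takes over again and the printk
 * fallback stops.
 */
static enum mce_sink mce_pick_sink(const struct mce_route *r,
				   unsigned long uptime_secs)
{
	if (r->daemon_present)
		return MCE_SINK_DAEMON;
	if (uptime_secs < r->boot_grace_secs)
		return MCE_SINK_QUEUE;
	return MCE_SINK_PRINTK;
}
```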

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551