Date:	Thu, 24 Sep 2015 13:22:12 -0700
From:	"Raj, Ashok" <ashok.raj@...el.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	"Luck, Tony" <tony.luck@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	Ashok Raj <ashok.raj@...el.com>
Subject: Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core
 parts

Hi Boris

On Thu, Sep 24, 2015 at 09:22:24PM +0200, Borislav Petkov wrote:
> 
> Ah, we return. But we shouldn't return - we should overwrite. I believe
> we've talked about the policy of overwriting old errors with new ones.
> 

Another reason I had a separate buffer in my earlier patch was to avoid
calling RCU functions from the offline CPU. I had an offline discussion
with Paul McKenney, and he said don't do that...

mce_gen_pool_add() -> gen_pool_alloc(), which calls rcu_read_lock() and such,
so it didn't seem appropriate.

Also, the function doesn't seem safe to call in NMI context. Although
MCE is different, for all intents and purposes we should treat both as the
same priority. The old-style log is simple and tested in those cases.

I like everything you say below... something we could do as our next phase
of improving logging and might need more careful work to build it right.

Just like how MC banks have overwrite rules, we could possibly do something
similar when the buffer fills up.
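
To make the overwrite idea concrete, here is a rough sketch of a fixed-size
log where, once the buffer is full, an uncorrected error may displace a
corrected one but never the other way around, loosely modeled on the MC bank
overwrite rules. All names (err_log, err_log_add, LOG_SLOTS, the two severity
levels) are hypothetical and not taken from the kernel or from this patch
series. The sketch is plain single-threaded C; it is not lockless and says
nothing about NMI or MCE safety.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOG_SLOTS 4	/* hypothetical size, kept tiny for illustration */

/* Hypothetical two-level severity; real MCE grading is finer grained. */
enum sev { SEV_CORRECTED = 0, SEV_UNCORRECTED = 1 };

struct err_rec {
	bool valid;
	enum sev sev;
	unsigned long long status;
};

struct err_log {
	struct err_rec slot[LOG_SLOTS];
	unsigned int dropped;	/* records lost because nothing could be displaced */
};

/*
 * Store a record. Returns true if it was stored, false if it was dropped.
 * Policy (an assumption for this sketch):
 *  - use a free slot if there is one;
 *  - if full, an uncorrected record replaces the first corrected record found;
 *  - otherwise drop the new record and count it.
 */
bool err_log_add(struct err_log *log, enum sev sev, unsigned long long status)
{
	struct err_rec *victim = NULL;
	size_t i;

	/* First choice: any free slot. */
	for (i = 0; i < LOG_SLOTS; i++) {
		if (!log->slot[i].valid) {
			victim = &log->slot[i];
			break;
		}
	}

	/* Buffer full: only an uncorrected error may displace a corrected one. */
	if (!victim && sev == SEV_UNCORRECTED) {
		for (i = 0; i < LOG_SLOTS; i++) {
			if (log->slot[i].sev == SEV_CORRECTED) {
				victim = &log->slot[i];
				break;
			}
		}
	}

	if (!victim) {
		log->dropped++;
		return false;
	}

	victim->valid = true;
	victim->sev = sev;
	victim->status = status;
	return true;
}
```

With this policy a flood of corrected errors cannot push out an uncorrected
record, and corrected records that arrive after the buffer is full are counted
in dropped rather than silently lost. It does not address the case where the
first (possibly corrected) record is the interesting one.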

> TBH, I don't think there's a 100%-correct policy to act according to
> when our error logging buffers are full:
> 
> - we can overwrite old errors with new but then this way we might lose
> the one important error record with which it all started.
> 
> - if we don't overwrite, we might fill up with "unimportant" correctable
> error records and miss other, more important ones which happen now
> 
> - ...
> 
> We could try to implement some cheap heuristics which decide what and
> when to overwrite but I'm sceptical it'll be always correct...
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.