Message-ID: <49D0996D.1050106@linux.intel.com>
Date:	Mon, 30 Mar 2009 12:05:33 +0200
From:	Andi Kleen <ak@...ux.intel.com>
To:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
CC:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH -tip 1/3] x86, mce: Add mce_threshold option for intel cmci


> 
>> BTW another thing you need to be aware of is that not all CMCI banks necessarily support
>> thresholds > 1. The SDM has a special algorithm to discover the counter width.
>> This means the scheme wouldn't work for some banks.
> 
> My current implementation already follows the SDM.

Yes, I didn't mean to doubt that. I'm just saying that it's not very useful
to play with the thresholds on those "only one" banks.
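
For readers following along, here is a rough sketch of the discovery-and-clamp behavior discussed in this thread: write all ones to the threshold field, read back what the bank kept, and clamp the requested value to that maximum. It is a userspace model, not the kernel code. `struct fake_bank`, `fake_wrmsr()`, `fake_rdmsr()` and `cmci_set_threshold()` are hypothetical names, and the field layout (threshold in bits 14:0, CMCI enable in bit 30 of IA32_MCi_CTL2) is my reading of the SDM.

```c
#include <stdint.h>

/* Assumed layout of IA32_MCi_CTL2: threshold in bits 14:0, enable in bit 30. */
#define CMCI_EN             (1ULL << 30)
#define CMCI_THRESHOLD_MASK 0x7fffULL

/* Hypothetical stand-in for one bank's CTL2 register (not a kernel type). */
struct fake_bank {
	uint64_t ctl2;      /* current register contents */
	uint64_t impl_mask; /* bits this bank actually implements */
};

/* Model of an MSR write: unimplemented bits are silently dropped. */
static void fake_wrmsr(struct fake_bank *b, uint64_t val)
{
	b->ctl2 = val & b->impl_mask;
}

static uint64_t fake_rdmsr(const struct fake_bank *b)
{
	return b->ctl2;
}

/*
 * Probe the maximum threshold the bank supports, then program
 * min(requested, max). Returns the threshold actually set, or 0 if
 * the bank does not support CMCI at all.
 */
static unsigned int cmci_set_threshold(struct fake_bank *b,
				       unsigned int requested)
{
	uint64_t val;
	unsigned int max;

	/* Write all ones to the threshold field and see what sticks. */
	fake_wrmsr(b, CMCI_EN | CMCI_THRESHOLD_MASK);
	val = fake_rdmsr(b);
	if (!(val & CMCI_EN))
		return 0;

	max = (unsigned int)(val & CMCI_THRESHOLD_MASK);
	if (requested < 1)
		requested = 1;
	if (requested > max)
		requested = max; /* "only one" banks end up here */

	fake_wrmsr(b, CMCI_EN | requested);
	return requested;
}
```

On a bank whose counter is only one bit wide, any request above 1 collapses to 1, which is why tuning the threshold does nothing there.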

> I should have documented that "if the maximum threshold the bank supports
> is lower than the specified value, the maximum is used."
> 
>>> I already have another patch to add a sysfs interface.
>> Oh no, please no sysfs interface. I know the AMD code has that, but imho it's just
>> a lot of (surprisingly tricky) code for very little to no gain. It's surprisingly
>> tricky because handling all the CPU hotplug cases correctly is not trivial.
> 
> Do you say no even if it is not per-bank?

It'll be messy even without per-bank control. sysfs doesn't have a framework
for per-CPU values, so everything has to be reimplemented by every user.
E.g. one issue is shared banks: you have to pass ownership
to another CPU when someone offlines a sibling. It's quite messy.
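
To make the shared-bank problem concrete, here is a small userspace model of the ownership handoff: when the CPU that owns a bank goes offline, an online sibling that can see the same bank has to take it over. This is only an illustration. `struct mce_model`, `claim_unowned_banks()` and `cpu_offline()` are hypothetical names, and the real hotplug handling has more cases than this.

```c
#include <stdbool.h>

#define NR_CPUS  4
#define NR_BANKS 3
#define NO_OWNER (-1)

/* Hypothetical model: which CPUs see which banks, and who owns each bank. */
struct mce_model {
	bool online[NR_CPUS];
	bool can_see[NR_CPUS][NR_BANKS]; /* true if the bank is visible to the CPU */
	int owner[NR_BANKS];             /* owning CPU, or NO_OWNER */
};

/* A CPU coming online claims every visible bank that nobody owns yet. */
static void claim_unowned_banks(struct mce_model *m, int cpu)
{
	m->online[cpu] = true;
	for (int bank = 0; bank < NR_BANKS; bank++) {
		if (m->can_see[cpu][bank] && m->owner[bank] == NO_OWNER)
			m->owner[bank] = cpu;
	}
}

/*
 * A CPU going offline drops its banks; each one is handed to the first
 * online CPU that shares it. A private bank is left without an owner.
 */
static void cpu_offline(struct mce_model *m, int cpu)
{
	m->online[cpu] = false;
	for (int bank = 0; bank < NR_BANKS; bank++) {
		if (m->owner[bank] != cpu)
			continue;
		m->owner[bank] = NO_OWNER;
		for (int other = 0; other < NR_CPUS; other++) {
			if (m->online[other] && m->can_see[other][bank]) {
				m->owner[bank] = other;
				break;
			}
		}
	}
}
```

Every per-CPU tunable exposed through sysfs would need to follow the bank through this kind of handoff, which is where the extra code comes from.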

> I'd like to have only one file that controls global value for all banks.
> It is rather simple and easy to use for users (not for intelligent backend).

My main problem is that there is imho no useful use case for it.
And adding code for something that has no use case seems ... wasteful.
I typically don't object if it's only a few lines, but if it's complicated
then I do.

> Such staggered disablement is not what comes first.
> As Ingo pointed out, I think "CMCI is a new CPU feature so having boot controls
> to disable it is generally a good idea" + "and it might be handy if the hw
> is misbehaving."

Ok, that would argue for a boot parameter.

I'm not 100% sure of its wisdom because I know you'll get some misbehavior
on Nehalem if you turn it off because of shared banks.


> To summarize:
>  - Disabling CMCI (=use polling instead) is nice to have.

With a boot parameter.

>  - Disabling polling (but use CMCI) is pointless.
>     (only use on trouble that only break polling?)

You can already do that by setting check_interval to 0.


>  - Disabling stuff for CE (both polling and CMCI) will help in some
>    particular cases.

Actually, I have my doubts about that (if you think of the SMI logging,
which should be able to get them first anyway without kernel options),
but a boot option for this at least wouldn't be particularly
bloated. I suspect the main use case would be to shut off
the printk.

>  - Increasing the threshold is not such a good idea?

Yes.

> 
> Personally, instead of "mce=nopoll" and "mce_threshold=[0|N]", an alternative
> combination, one like "mce=no_corrected" or "mce=ignore_ce" for disabling both
> and another like "mce=no_cmci" for disabling CMCI, would also be OK.
> Which do you prefer?

mce=ignore_ce and mce=no_cmci

However, I think you can do the ignore_ce part in your BIOS
too if you want (the SMI code could just as well clear those after
logging).

And for no_cmci see the caveats above.
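
As a sketch of how the two options preferred above could map onto behavior: `ignore_ce` turns off both polling and CMCI for corrected errors, while `no_cmci` turns off only CMCI and leaves polling in place. This is a userspace illustration under those assumptions, not the kernel's option parser. `struct mce_opts` and `parse_mce_option()` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical settings controlled by the proposed "mce=" suboptions. */
struct mce_opts {
	bool poll_ce;  /* poll the banks for corrected errors */
	bool use_cmci; /* take CMCI interrupts for corrected errors */
};

/*
 * Apply one suboption, i.e. the text after "mce=".
 * Returns 0 if the option was recognized, -1 otherwise.
 */
static int parse_mce_option(const char *arg, struct mce_opts *opts)
{
	if (arg == NULL || opts == NULL)
		return -1;

	if (strcmp(arg, "ignore_ce") == 0) {
		/* No corrected-error handling at all: neither polling nor CMCI. */
		opts->poll_ce = false;
		opts->use_cmci = false;
		return 0;
	}
	if (strcmp(arg, "no_cmci") == 0) {
		/* Fall back to polling only. */
		opts->use_cmci = false;
		return 0;
	}
	return -1;
}
```

Whether `ignore_ce` should also cover errors left over from boot is the open naming question raised in this mail, and the sketch does not try to settle it.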

Also, it's still open whether the logging of errors left over
from boot should be covered by this option too or not.

How about UC errors that are left over from the last panic?
If you want to disable those too (I suspect you might because
your BIOS probably logged them) then the ignore_ce option would
be misnamed because it would need to apply to UC too.

> IIRC, the complaint was from a user of IPF, because it was "noise" for him.
> Or there was just "it would be acceptable if the rate were 1/5" or so.
> The real solution would be removing the CE-related stuff from the kernel altogether, anyway.

Or in the BIOS. We can do it in the kernel, but I suspect for you
it would be more user-friendly if the BIOS just never made them
visible.

> I should have asked this first:
>   Are there any reference values for the CMCI threshold?

Not that I know of.

I actually asked and I was told the CMCI threshold was more for avoiding
(very) theoretical storms than to do real error thresholding.

> 
> In short, it changes behavior on uncorrected errors, from "panic" to "hang up."

Playing devil's advocate here, but if your BIOS is really that intelligent,
isn't that what you want?  As far as I understand, your patches seem
to be all about moving things from the OS to the BIOS, and that
would be the ultimate way to move UC errors to the BIOS too.

-Andi