[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <49F95564.5020402@jp.fujitsu.com>
Date: Thu, 30 Apr 2009 16:38:12 +0900
From: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To: Andi Kleen <ak@...ux.intel.com>
CC: "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...e.hu>,
Huang Ying <ying.huang@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org
Subject: 32bit mce unification (Re: Re-implement MCE log ring buffer as per-CPU
ring buffer II)
Hi Andi,
Andi Kleen wrote:
> H. Peter Anvin wrote:
>> Andi Kleen wrote:
>>> And then restart here? Add 32bit workarounds, add hook
>>> to initialize p5/winchip, make 64bit code 32bit clean.
>>> That can come straight from my patchkit. I can do that
>>> if there is interest.
>>>
>>
>> Could you come up with a tree we can look at?
>
> Sorry for taking longer, had to fight some infrastructure issues.
>
> I merged the two trees in
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git
> mce-32bit-merge
>
> This is my old 32bit mce unification tree forward ported to mce2 and
> some fixes
> for the code in mce2. This is currently with Kconfig for old/new,
> deprecation
> of old for .32 (could be done directly too, but doing it with this
> additional
> step seems safer to me because it will allow easier debugging)
>
> This currently includes the two error injection patches as last (used
> for testing), those can be dropped/added later of course too.
>
> If this is ok I can respin the rest of the mce patches on top of that
> tree.
Thank you for providing updates.
They are looks good, but I have some comments:
We can get an instant numbered list from your tree by following:
$ git format-patch origin/master..origin/mce-32bit-merge
0001-x86-mce-Add-mce_threshold-option-for-intel-cmci.patch
0002-x86-mce-Cleanup-param-parser.patch
0003-x86-mce-Add-mce-nopoll-option-to-disable-timer-pol.patch
0004-x86-mce-clean-up-the-mce_64.c-code.patch
0005-x86-mce-clean-up-the-sysfs-variables-namespace.patch
0006-x86-mce-clean-up-mce_amd_64.c.patch
0007-x86-mce-clean-up-p4.c.patch
0008-x86-mce-clean-up-therm_throt.c.patch
0009-x86-mce-clean-up-p6.c.patch
0010-x86-mce-clean-up-k7.c.patch
0011-x86-mce-clean-up-non-fatal.c.patch
0012-x86-mce-clean-up-mce_32.c.patch
0013-x86-mce-clean-up-p5.c.patch
0014-x86-mce-clean-up-winchip.c.patch
0015-x86-mce-clean-up-mce.h.patch
0016-mce-unify-Intel-thermal-init-prepare.patch
0017-x86-mce-unify-the-Intel-thermal-interrupt-code.patch
0018-x86-mce-clean-up-mce_intel.c.patch
0019-x86-mce-clean-up-mce_64.c.patch
0020-x86-mce-prepare-mce.h-and-mce_64.c-for-unification.patch
0021-x86-mce-prepare-unification.patch
0022-x86-mce-unify-prepare-for-32-bit.patch
0023-x86-mce-unify.patch
0024-x86-mce-print-number-of-MCE-banks.patch
0025-x86-MCE-Make-mce_amd_64.c-compile-again.patch
0026-Revert-x86-mce-Add-mce-nopoll-option-to-disable-t.patch
0027-x86-MCE-Move-ifdef-X86_64-to-the-beginning-of-the.patch
0028-x86-MCE-Initial-steps-to-make-64bit-mce-code-32bit.patch
0029-x86-MCE-Implement-the-PPro-bank-0-quirk-in-the-64b.patch
0030-x86-MCE-Port-K7-bank-0-quirk-to-64bit-mce-code.patch
0031-x86-MCE-Use-a-call-vector-to-call-the-64bit-mce-ha.patch
0032-x86-MCE-Rename-64bit-mce_dont_init-to-mce_disabled.patch
0033-x86-MCE-Move-mce_disabled-option-into-common-64bit.patch
0034-x86-MCE-Remove-machine-check-handler-idle-notify-o.patch
0035-x86-MCE-Remove-oops_begin-use-in-64bit-machine-c.patch
0036-x86-MCE-Remove-unused-stop-restart_mce-on-32bit.patch
0037-x86-MCE-Use-64bit-machine-check-code-on-32bit.patch
0038-x86-MCE-Deprecate-old-32bit-machine-check-code.patch
0039-x86-MCE-Enable-MCE_INTEL-for-32bit-new-MCE.patch
0040-x86-MCE-Enable-MCE_AMD-for-32bit-NEW_MCE.patch
0041-x86-MCE-Document-new-32bit-mcelog-requirement-in-D.patch
0042-x86-MCE-Add-MSR-read-wrappers-for-easier-error-inj.patch
0043-x86-MCE-Add-basic-error-injection-infrastructure.patch
Let's start comment/review on this list:
0001-x86-mce-Add-mce_threshold-option-for-intel-cmci.patch
0002-x86-mce-Cleanup-param-parser.patch
0003-x86-mce-Add-mce-nopoll-option-to-disable-timer-pol.patch
These 3 above are merged in tip/x86/mce2, but not pushed into
mainline yet.
You(Andi) and I agreed that 0001 and 0003 should be reverted and
re-implemented before pushing them into .31, and I have patches
to do it.
One of revert patches is placed in 0026. It's fine but I'd like to
replace it with that I own, which have more detailed description.
0004-x86-mce-clean-up-the-mce_64.c-code.patch
0005-x86-mce-clean-up-the-sysfs-variables-namespace.patch
0006-x86-mce-clean-up-mce_amd_64.c.patch
0007-x86-mce-clean-up-p4.c.patch
0008-x86-mce-clean-up-therm_throt.c.patch
0009-x86-mce-clean-up-p6.c.patch
0010-x86-mce-clean-up-k7.c.patch
0011-x86-mce-clean-up-non-fatal.c.patch
0012-x86-mce-clean-up-mce_32.c.patch
0013-x86-mce-clean-up-p5.c.patch
0014-x86-mce-clean-up-winchip.c.patch
0015-x86-mce-clean-up-mce.h.patch
0016-mce-unify-Intel-thermal-init-prepare.patch
0017-x86-mce-unify-the-Intel-thermal-interrupt-code.patch
0018-x86-mce-clean-up-mce_intel.c.patch
0019-x86-mce-clean-up-mce_64.c.patch
0020-x86-mce-prepare-mce.h-and-mce_64.c-for-unification.patch
0021-x86-mce-prepare-unification.patch
0022-x86-mce-unify-prepare-for-32-bit.patch
0023-x86-mce-unify.patch
0024-x86-mce-print-number-of-MCE-banks.patch
These 21 patches above are now top of tip/x86/mce2.
Some are broken and some following patches from Andi are
addressed to fix it.
0025-x86-MCE-Make-mce_amd_64.c-compile-again.patch
The description of this patch is "It's device_mce, not mce_dev."
This is for a bug of 0005, which was intended to rename device_mce
to mce_dev, but the rename was not finished.
I suppose "mce_" prefix is better for name space, so I think
rework on 0005 or incremental fix to finish renaming will be more
better solution.
0026-Revert-x86-mce-Add-mce-nopoll-option-to-disable-t.patch
This reverts 0003. I'd like to replace this with that I own.
0027-x86-MCE-Move-ifdef-X86_64-to-the-beginning-of-the.patch
This is for a bug of 0022, that places ifdefs wrongly.
0028-x86-MCE-Initial-steps-to-make-64bit-mce-code-32bit.patch
0029-x86-MCE-Implement-the-PPro-bank-0-quirk-in-the-64b.patch
0030-x86-MCE-Port-K7-bank-0-quirk-to-64bit-mce-code.patch
0031-x86-MCE-Use-a-call-vector-to-call-the-64bit-mce-ha.patch
0032-x86-MCE-Rename-64bit-mce_dont_init-to-mce_disabled.patch
0033-x86-MCE-Move-mce_disabled-option-into-common-64bit.patch
0034-x86-MCE-Remove-machine-check-handler-idle-notify-o.patch
0035-x86-MCE-Remove-oops_begin-use-in-64bit-machine-c.patch
0036-x86-MCE-Remove-unused-stop-restart_mce-on-32bit.patch
0037-x86-MCE-Use-64bit-machine-check-code-on-32bit.patch
0038-x86-MCE-Deprecate-old-32bit-machine-check-code.patch
0039-x86-MCE-Enable-MCE_INTEL-for-32bit-new-MCE.patch
0040-x86-MCE-Enable-MCE_AMD-for-32bit-NEW_MCE.patch
0041-x86-MCE-Document-new-32bit-mcelog-requirement-in-D.patch
0042-x86-MCE-Add-MSR-read-wrappers-for-easier-error-inj.patch
0043-x86-MCE-Add-basic-error-injection-infrastructure.patch
These 16 patches above is 32bit mce unification by Andi.
0028~0036 are preparation and cleanup,
0037~0041 are main part of unification, and
0042~0043 are for error injection.
My comments are as followings:
0029,0037,0038,0041:
have space before tab in indent / trailing whitespace.
0038:
The change in the first line is not required, and we need to schedule
feature removal not so soon.
Since the patch description is "Schedule for removal in 2.6.32",
the "When:" should be .32 instead of .31
>@@ -1,4 +1,4 @@
>-The following is a list of files and features that are going to be
>+he following is a list of files and features that are going to be
> removed in the kernel source tree. Every entry should contain what
> exactly is going away, why it is happening, and who is going to be doing
> the work. When the feature is removed from the kernel, it should also
>@@ -428,3 +428,13 @@
> After a reasonable transition period, we will remove the legacy
> fakephp interface.
> Who: Alex Chiang <achiang@...com>
>+
>+----------------------------
>+
>+What: CONFIG_X86_OLD_MCE
>+When: 2.6.31
>+Why: Remove the old legacy 32bit machine check code. This has been
>+ superseded by the newer machine check code from the 64bit port,
>+ but the old version has been kept around for easier testing. Note this
>+ doesn't impact the old P5 and WinChip machine check handlers.
>+Who: Andi Kleen <andi@...stfloor.org>
0039:
The comment is wrongly changed, and it would be better to place vectors
in sorted order.
>@@ -88,12 +88,13 @@
> #define THERMAL_APIC_VECTOR 0xfa
>
> #ifdef CONFIG_X86_32
>-/* 0xf8 - 0xf9 : free */
>+/* 0xf9 : free */
> #else
>-# define THRESHOLD_APIC_VECTOR 0xf9
> # define UV_BAU_MESSAGE 0xf8
> #endif
>
>+#define THRESHOLD_APIC_VECTOR 0xf9
>+
> /* f0-f7 used for spreading out TLB flushes: */
> #define INVALIDATE_TLB_VECTOR_END 0xf7
> #define INVALIDATE_TLB_VECTOR_START 0xf0
0041:
Is this version correct?
The latest (git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git) is:
#define MCELOG_VERSION "0.8pre2"
>@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
> o oprofile 0.9 # oprofiled --version
> o udev 081 # udevinfo -V
> o grub 0.93 # grub --version
>+o mcelog 0.6
>
> Kernel compilation
> ==================
0043:
From checkpatch:
WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc
#174: FILE: arch/x86/kernel/cpu/mcheck/mce-inject.c:96:
+ if (m.cpu >= NR_CPUS || !cpu_online(m.cpu))
Plus, with CONFIG_X86_MCE_INJECT=m, I got:
ERROR: "add_timer_on" [arch/x86/kernel/cpu/mcheck/mce-inject.ko] undefined!
IIRC there was a patch named "Export add_timer_on for modules" for
error injection. (Or how about using bool instead of tristate if there
are no strong reason to make it kernel module?)
>@@ -823,6 +823,14 @@ config X86_MCE_THRESHOLD
> bool
> default y
>
>+config X86_MCE_INJECT
>+ depends on X86_NEW_MCE
>+ tristate "Machine check injector support"
>+ help
>+ Provide support for injecting machine checks for testing purposes.
>+ If you don't know what a machine check is and you don't do kernel
>+ QA it is safe to say n.
>+
> config X86_MCE_NONFATAL
> tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
> depends on X86_OLD_MCE
Thanks,
H.Seto
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists