Message-ID: <20120423150657.GA24481@phenom.dumpdata.com>
Date: Mon, 23 Apr 2012 11:06:57 -0400
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To: Borislav Petkov <bp@...64.org>
Cc: "Liu, Jinsong" <jinsong.liu@...el.com>, tony.luck@...el.com,
x86@...nel.org, linux-edac@...r.kernel.org,
"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Xen-devel] [PATCH 1/3] Add mcelog support for xen platform
> > This driver is not that much different from the APEI bridge to MCE code -
> > it just that instead of reading APEI blob data it reads it from an hypercall.
>
> Let me ask you this: is APEI a virtualization solution of some sort?
>
> No, it is the old windoze RAS crap but I guess Linux has to support it
> now too through BIOS. And x86 vendors will have to support it too.
>
> So it is piece of the firmware we'd have to deal with too.
>
> Now xen is a whole another deal - it is purely a piece of software.
Perfect. Software is more elastic than hardware - so the Xen ABI
for the MCE can be changed then to reflect the new format if required.
>
> > The fix seems quite easy - you change the 'struct mce' and 'mce_log()'
> > along with the drivers that use it.
>
> This is exactly what I have a problem with: having to take care of xen
> too. "No, Boris, nope, we cannot take your new feature because it breaks
> xen." and also "Have you tested this on xen too?" where the only thing I
> do is _hardware_ enablement and improving software support for it. And
> xen is not hardware...
Delegate testing to sub-maintainers. In this case that would be me
and Liu.
>
> [..]
>
> > If you are worried about breaking something, then you can just send
> > the change to me or Liu to test it out before committing API changes
> > in the MCE code.
>
> This probably sounds good now but I don't think code changes like
> that ever run as smoothly. Whenever there's breakage, there'll always
> be people screaming against it - I just don't want code that enables
Right, regressions are bad.
> hardware to be crippled and unable to change because it breaks
> completely unrelated pieces - it is bad as it is now.
Can you point me to existing examples of MCE's badness?
I remember Greg KH's patches for the dynamic vs static and per-cpu
initialization - but that wasn't due to the MCE API. I think that
was due to Kay Sievers' patches for the transition from the sysfs
subsystem API to the device API, which broke a bunch of stuff.
>
> > > And this has happened already with the whole microcode loading debacle.
> >
> > My recollection is that the existing microcode API had major issues that
> > could not fixed. The only fix was to make it be very early in the bootup
> > processes and that is what hpa would like developers to focus on.
>
> That was one side of the problem. The other was, AFAICR, creating a xen
> microcode driver which was "on the same level" as the hardware microcode
> drivers, which was completely bull*.
I think of Xen as being like the hypervisors on PowerPC boxes - for
certain hardware operations you have to go through hypercalls.
>
> The problem is xen growing stuff everywhere in arch/x86/ and this way,
> maybe even unwillingly, crippling development of hardware-related
> features. I know you're willing to help and I know you mean it well, but
> there's always some other problem in practice.
I am not sure I see why we cannot fix the practical problems as they pop
up?
>
> Now I keep wondering, why don't you guys simply create your own mcelog
> ring buffer and interface on the userspace tool side instead of hooking
> into lowlevel kernel stuff? I mean, the code is there, you simply have
> to copy it into arch/xen/ or whatever you have there. Why do you have to
Nowadays the kernel can transition to run under lguest, KVM, Xen or
baremetal as a single binary image instead of multiple compiled
kernels for a specific virtualization framework. As such, there is
no 'arch/xen,lguest,kvm', instead there is alternative_asm that patches
the low-level calls (set_pte, load_cr3, spinlock, time, etc), for the
appropriate virtualization (or CPU if done under baremetal) offering. Hence
the arch/x86 has expanded to support baremetal and virtualization
extensions in it (called paravirt_ops).
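As a rough illustration of the single-binary idea above, here is a minimal
user-space sketch of a platform ops table that is selected once at startup.
Everything in it (struct platform_ops, platform_init, read_clock and the two
backends) is a hypothetical name made up for this example and is not the
kernel's paravirt_ops interface. The real kernel also patches call sites in
place rather than always going through a pointer, which this sketch does not
attempt to show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a paravirt-style ops table. */
struct platform_ops {
	const char *name;
	unsigned long (*read_clock)(void);
};

/* Two illustrative backends; the returned values are arbitrary. */
static unsigned long native_read_clock(void) { return 1000UL; }
static unsigned long hyper_read_clock(void)  { return 2000UL; }

static const struct platform_ops native_ops = { "baremetal", native_read_clock };
static const struct platform_ops hyper_ops  = { "hypervisor", hyper_read_clock };

/* One binary, one pointer: the backend is chosen at boot, not at build time. */
static const struct platform_ops *active_ops = &native_ops;

static void platform_init(int running_under_hypervisor)
{
	active_ops = running_under_hypervisor ? &hyper_ops : &native_ops;
}

/* Callers use this wrapper and never care which backend is active. */
static unsigned long read_clock(void)
{
	assert(active_ops != NULL && active_ops->read_clock != NULL);
	return active_ops->read_clock();
}
```

A caller would run platform_init() once early on and then use read_clock()
everywhere, without any per-platform #ifdef in the calling code.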
> hook into arch/x86/ instead of doing your own stuff?
I think what you are suggesting is to _not_ reuse existing APIs. That
seems counter-intuitive to general software development. There are
exceptions of course - when the existing API needs to change a lot
(or needs to be thrown out), and there is this one little driver that
keeps on using the old interface and can't change - at that point I can
see the purpose of forking it. But until then - using existing APIs is
the way to go.
And I (along with Liu) will keep the Xen MCE driver evolving as it
needs to conform to the new kernel mcheck API.
>
> > > So, my suggestion is to copy the pieces you need and create your own xen
> > > version of /dev/mcelog and add it to dom0 so that there's no hooking
> > > into baremetal code and whenever a dom0 kernel is running, you can
> > > reroute the mcelog userspace tool to read /dev/xen_mcelog or whatever
> > > and not hook into the x86 versions.
> > >
> > > Because, if you'd hooked into it, just imagine one fine day, when we
> > > remove mcelog support, what screaming the xen people will be doing when
> > > mcelog doesn't work anymore.
> >
> > You would have more screaming from the distro camp about removing
> > /dev/mcelog.
>
> How do you know that? Don't you think that we probably would've talked
> to them already and made preparations for conversion first?
I was Googling around for it and I couldn't find anything that says
MCE is being removed, or anything that names its replacement user-space
program (which could very well be the fault of my poor Googling skills).
Please do point me to the URL so I can get some idea of what is brewing.
The one thing I saw was this https://lkml.org/lkml/2012/3/2/312 which
pointed to the /dev/mcelog struct changing (which then got NACK-ed), but
nothing about the internal 'struct mce' being dropped from drivers?
I couldn't find anything in Documentation/feature-removal-schedule.txt.
There are hints of ras_printk and /sys/devices/system/ras/agent, but
they are related to printk (and I see that mce_log would use it too). From
a driver perspective that just seems to be the output code-path
[like the MCE decoders] - and it allows the output to be put in a trace
buffer as well - instead of just in /dev/mcelog.
If the distros choose to stop using /dev/mcelog and use another mechanism
(and the MCE check drivers still do their job of collecting the data and
sending it downstream), then I don't see why "screaming the xen people will
be doing" - as they still get the MCE errors - in whatever way the distros
have chosen to present it.
>
> > But if that is your choice, I would send you an email asking how I
> > need to retool this driver to work with the new MCE gen2 code that you
> > had in mind.
>
> As I said above, I'm very sceptical this will ever work, I guess I'd
> have to live and see.
>
> Now, with your own buffer solution, nothing breaks and all is happy,
> a win-win, if you wish. I think this is much simpler and easier a
> solution.
Not sure what you mean by 'own buffer solution'. Are you talking about
using the trace_mce_record or the ras_printk instead of the mce_log?
I would think that is the job of the MCE decoders?
Please keep in mind that this driver is not trying to decode anything -
it is just lifting raw events, massaging them a bit, and sending them
downstream. Similar to how the APEI does it.
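To make the "lift, massage, send downstream" shape concrete, here is a
minimal user-space sketch. All of the names in it (struct raw_mc_event,
struct log_record, struct event_log, log_append, forward_events) are
hypothetical and invented for this example; they are not the Xen hypercall
ABI, the kernel's 'struct mce', or mce_log(). The only point is that the
forwarding step copies fields across and never interprets them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw record as handed over by the platform. */
struct raw_mc_event {
	uint16_t cpu;
	uint16_t bank;
	uint64_t status;
	uint64_t addr;
};

/* Hypothetical cut-down downstream record (not the real 'struct mce'). */
struct log_record {
	uint32_t cpu;
	uint32_t bank;
	uint64_t status;
	uint64_t addr;
};

#define LOG_CAPACITY 8

struct event_log {
	struct log_record entries[LOG_CAPACITY];
	size_t count;
};

/* Append one record; returns 0 on success, -1 if the buffer is full. */
static int log_append(struct event_log *buf, const struct log_record *rec)
{
	assert(buf != NULL && rec != NULL);
	if (buf->count >= LOG_CAPACITY)
		return -1;
	buf->entries[buf->count++] = *rec;
	return 0;
}

/* Convert each raw event field by field and pass it on. No decoding. */
static size_t forward_events(const struct raw_mc_event *raw, size_t n,
			     struct event_log *buf)
{
	size_t forwarded = 0;

	for (size_t i = 0; i < n; i++) {
		struct log_record rec = {
			.cpu = raw[i].cpu,	/* widened, otherwise untouched */
			.bank = raw[i].bank,
			.status = raw[i].status,
			.addr = raw[i].addr,
		};
		if (log_append(buf, &rec) != 0)
			break;		/* buffer full: stop forwarding */
		forwarded++;
	}
	return forwarded;
}
```

Whatever consumes the log (a decoder, a trace buffer, a userspace tool) sees
the same record layout regardless of where the raw events came from.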