[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <65CD3FC07F3BF942ABE211646D72D770356EACA5@IRSMSX110.ger.corp.intel.com>
Date: Wed, 26 Nov 2014 15:43:25 +0000
From: "Berthier, Emmanuel" <emmanuel.berthier@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: "mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
"Jarzmik, Robert" <robert.jarzmik@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] [LBR] Dump LBRs on Oops
> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@...utronix.de]
> Sent: Wednesday, November 26, 2014 3:47 PM
> To: Berthier, Emmanuel
> Cc: mingo@...hat.com; hpa@...or.com; x86@...nel.org; Jarzmik, Robert;
> linux-kernel@...r.kernel.org
> Subject: RE: [PATCH] [LBR] Dump LBRs on Oops
>
> On Wed, 26 Nov 2014, Berthier, Emmanuel wrote:
> > > We add printks not for people who work on the support of unreleased
> > > hardware. They should better know what they are doing. If they can't
> > > figure that out they should not touch the kernel in the first place.
> >
> > LoL
> > I'm part of those people, I've touched the kernel and I've figured out what
> was wrong.
> > And I would like to be helped next year for the next Core: I'm an old
> > man and I need to leave a white stone trail ;-) Could we agree on
> > that one?
>
> We already have a printk in init_intel_pmu() where we tell about the
> 'unidentified cpu', so we better extend that instead of having something
> dependent on a OOPS.
That's right, I have a patch for that also.
> > > > > Aside of that if we want to debug with the LBR then we better
> > > > > freeze that whole thing across a dump and be done with it.
> > > >
> > > > I met that case but did no dig deeply into it...
> > >
> > > Hmm, a corrupted stack might trigger this together with some of the
> > > other debug options enabled. So we really might to put it in front.
> >
> > Didn't catch you. Could you elaborate on that?
>
> Assume a stack corruption, so the stack dumper follows it w/o noticing and
> hits an unmapped page. So that would be an argument to move the LBR print
> out ahead of the stack dump.
Ok, so no need to change anything here?
> > > 1) We want to enable/disable this at boot time.
> > >
> > > In the disabled case we might also stub out the test/jz and replace
> > > it by an unconditional jump, but that needs more thought.
> >
> > I can add a cmdline option to disable it at boot time.
>
> Enable. Should be disabled by default I think.
ok
> > Do you propose to use code instruction patching (same as ftrace) also?
> > Is-it really worth to bypass test/jz as page fault handling is much
> > more than few instructions?
>
> That's why I said: but that needs more thought.
>
> Though OTOH we keep adding stuff there and if we want to enable that LBR
> feature more widely we should think about keeping the overhead low if it is
> disabled.
>
> We can discuss this after we have a agreed on patch for that feature.
Ok
> > > 2) Right now you stop the trace on every exception no matter whether
> > > it comes from user or kernel space.
> > >
> > > Stopping the trace when we handle a user space fault does not make
> > > any sense and inflicts just pointless overhead.
> > >
> > > Aside of that if the fault handler then crashes we do not have the
> > > LBR information because we froze it when entering from user space
> > > in the first place.
> >
> > Agree, but the LBR buffer contains only 8 records: we have to stop it
> > as soon as possible. If we add some test/jump/call before stopping
> > it, relevant info will be flushed out.
>
> Well, you can certainly test that w/o a jump. Hint:
>
> if (enabled && is_kernel)
> goto x;
>
> can be written in ASM with a single branch as well :)
>
> That adds more instructions before the jz, which might in fact make the code
> patching for the disabled case more interesting.
I will keep it small, trust me! ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists