Message-ID: <20191007143255.GA59713@gmail.com>
Date: Mon, 7 Oct 2019 16:32:55 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Sean Christopherson <sean.j.christopherson@...el.com>
Cc: Dave Hansen <dave.hansen@...el.com>,
Changbin Du <changbin.du@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical
* Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> On Fri, Oct 04, 2019 at 07:39:08AM -0700, Dave Hansen wrote:
> > On 10/4/19 6:45 AM, Changbin Du wrote:
> > > +static inline bool is_canonical_addr(u64 addr)
> > > +{
> > > +#ifdef CONFIG_X86_64
> > > + int shift = 64 - boot_cpu_data.x86_phys_bits;
> >
> > I think you mean to check the virtual bits member, not "phys_bits".
> >
> > BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
> > Would one of those work in this case?
> >
> > As for the error message:
> >
> > > {
> > > - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> > > + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> > > + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> >
> > I've always read that as "the GP might have been caused by a
> > non-canonical access". The main nit I'd have with the change is that I
> > don't think all #GP's during user access functions which are given a
> > non-canonical address *necessarily* caused the #GP.
> >
> > There are a billion ways you can get a #GP and I bet canonical
> > violations aren't the only way you can get one in a user copy function.
>
> All the other reasons would require a fairly egregious kernel bug, hence
> the speculation that the #GP is due to a non-canonical address. Something
> like the following would be more precise, though highly unlikely to ever
> be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> error code that went unnoticed for years.
>
> WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> "Segmentation bug");

Instead of trying to guess the reason for the #GP (a guess that might be
wrong), please state the cause declaratively only when we are sure that it
is a non-canonical address. When we are not sure, provide a best guess, but
clearly signal that it is a guess.

I.e. if I understood all the cases correctly we'd have three types of
messages generated:

  !error_code:
	"General protection fault in user access, due to non-canonical address."

  error_code && !is_canonical_addr(fault_addr):
	"General protection fault in user access. Non-canonical address?"

  error_code && is_canonical_addr(fault_addr):
	"General protection fault in user access. Segmentation bug?"

Only the first one is declarative, because we know we got a #GP with a
zero error code which should denote a non-canonical address access.

The second and third ones are guesses with question marks to communicate
the uncertainty.

This assumes that !error_code always means a non-canonical access. Is that
correct?

And hopefully "!error_code && !is_canonical_addr(fault_addr)" is not
possible?
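
The three cases above could be sketched roughly as follows. This is only an
illustration in standalone C, not the proposed patch: gp_fault_message() and
the vaddr_bits parameter are hypothetical names, the helper takes the number
of virtual address bits explicitly instead of reading it from boot_cpu_data,
and it relies on the still-open assumption that a zero error code always
means a non-canonical access.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: an address is canonical when bits 63..(vaddr_bits - 1)
 * are all zeros or all ones. vaddr_bits is the number of virtual address
 * bits (e.g. 48 or 57) and is assumed to be in the range 1..64.
 */
static bool is_canonical_addr(uint64_t addr, unsigned int vaddr_bits)
{
	uint64_t upper = addr >> (vaddr_bits - 1);

	return upper == 0 || upper == (UINT64_MAX >> (vaddr_bits - 1));
}

/*
 * Hypothetical helper: pick one of the three messages. Only the first is
 * declarative; the other two end in a question mark to signal a guess.
 * Assumption: !error_code always denotes a non-canonical access.
 */
static const char *gp_fault_message(unsigned long error_code,
				    uint64_t fault_addr,
				    unsigned int vaddr_bits)
{
	if (!error_code)
		return "General protection fault in user access, due to non-canonical address.";

	if (!is_canonical_addr(fault_addr, vaddr_bits))
		return "General protection fault in user access. Non-canonical address?";

	return "General protection fault in user access. Segmentation bug?";
}
```

In the kernel the returned string would presumably be passed to WARN_ONCE()
with a "%s\n" format rather than being selected by nested ?: operators.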

Thanks,

	Ingo