[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <24F5DC49-1FB3-42CF-8323-B0B39D936F7F@amacapital.net>
Date: Thu, 17 Dec 2020 07:35:18 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...nel.org>,
Weiny Ira <ira.weiny@...el.com>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Fenghua Yu <fenghua.yu@...el.com>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
linux-nvdimm <linux-nvdimm@...ts.01.org>,
Linux-MM <linux-mm@...ck.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@...r.kernel.org>,
Dan Williams <dan.j.williams@...el.com>,
Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [PATCH V3.1] entry: Pass irqentry_state_t by reference
> On Dec 17, 2020, at 5:19 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Dec 17, 2020 at 02:07:01PM +0100, Thomas Gleixner wrote:
>>> On Fri, Dec 11 2020 at 14:14, Andy Lutomirski wrote:
>>>> On Mon, Nov 23, 2020 at 10:10 PM <ira.weiny@...el.com> wrote:
>>> After contemplating this for a bit, I think this isn't really the
>>> right approach. It *works*, but we've mostly just created a bit of an
>>> unfortunate situation. Our stack, on a (possibly nested) entry looks
>>> like:
>>>
>>> previous frame (or empty if we came from usermode)
>>> ---
>>> SS
>>> RSP
>>> FLAGS
>>> CS
>>> RIP
>>> rest of pt_regs
>>>
>>> C frame
>>>
>>> irqentry_state_t (maybe -- the compiler is within its rights to play
>>> almost arbitrary games here)
>>>
>>> more C stuff
>>>
>>> So what we've accomplished is having two distinct arch register
>>> regions, one called pt_regs and the other stuck in irqentry_state_t.
>>> This is annoying because it means that, if we want to access this
>>> thing without passing a pointer around or access it at all from outer
>>> frames, we need to do something terrible with the unwinder, and we
>>> don't want to go there.
>>>
>>> So I propose a somewhat different solution: lay out the stack like this.
>>>
>>> SS
>>> RSP
>>> FLAGS
>>> CS
>>> RIP
>>> rest of pt_regs
>>> PKS
>>> ^^^^^^^^ extended_pt_regs points here
>>>
>>> C frame
>>> more C stuff
>>> ...
>>>
>>> IOW we have:
>>>
>>> struct extended_pt_regs {
>>> bool rcu_whatever;
>>> other generic fields here;
>>> struct arch_extended_pt_regs arch_regs;
>>> struct pt_regs regs;
>>> };
>>>
>>> and arch_extended_pt_regs has unsigned long pks;
>>>
>>> and instead of passing a pointer to irqentry_state_t to the generic
>>> entry/exit code, we just pass a pt_regs pointer.
>>
>> While I agree vs. PKS which is architecture specific state and needed in
>> other places e.g. #PF, I'm not convinced that sticking the existing
>> state into the same area buys us anything more than an indirect access.
>>
>> Peter?
>
> Agreed; that immediately solves the confusion Ira had as well. While
> extending pt_regs sounds scary, I think we've isolated our pt_regs
> implementation from actual ABI pretty well, but of course, that would
> need an audit. We don't want to leak this into signals for example.
>
I’m okay with this.
My suggestion for having an extended pt_regs that contains pt_regs is to keep extensions like this invisible to unsuspecting parts of the kernel. In particular, BPF seems to pass around struct pt_regs *, and I don’t know what the implications of effectively offsetting all the registers relative to the pointer would be.
Anything that actually broke the signal regs ABI should be noticed by the x86 selftests — the tests read and write registers through ucontext.
>
Powered by blists - more mailing lists