linux-kernel - Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function

Message-Id: <E68459DD-53D3-42A6-B120-180203791E24@amacapital.net>
Date:   Sat, 15 Jun 2019 08:30:08 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Yu-cheng Yu <yu-cheng.yu@...el.com>,
        Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-mm@...ck.org,
        linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
        Arnd Bergmann <arnd@...db.de>,
        Balbir Singh <bsingharora@...il.com>,
        Borislav Petkov <bp@...en8.de>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Eugene Syromiatnikov <esyr@...hat.com>,
        Florian Weimer <fweimer@...hat.com>,
        "H.J. Lu" <hjl.tools@...il.com>, Jann Horn <jannh@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        Kees Cook <keescook@...omium.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Oleg Nesterov <oleg@...hat.com>, Pavel Machek <pavel@....cz>,
        Randy Dunlap <rdunlap@...radead.org>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Vedvyas Shanbhogue <vedvyas.shanbhogue@...el.com>,
        Dave Martin <Dave.Martin@....com>
Subject: Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function



> On Jun 14, 2019, at 3:06 PM, Dave Hansen <dave.hansen@...el.com> wrote:
> 
>> On 6/14/19 2:34 PM, Yu-cheng Yu wrote:
>> On Fri, 2019-06-14 at 13:57 -0700, Dave Hansen wrote:
>>>> I have a related question:
>>>> 
>>>> Do we allow the application to read the bitmap, or any fault from the
>>>> application on bitmap pages?
>>> 
>>> We have to allow apps to read it.  Otherwise they can't execute
>>> instructions.
>> 
>> What I meant was, if an app executes some legacy code that results in bitmap
>> lookup, but the bitmap page is not yet populated, and if we then populate that
>> page with all-zero, a #CP should follow.  So do we even populate that zero page
>> at all?
>> 
>> I think we should; a #CP is more obvious to the user at least.
> 
> Please make an effort to un-Intel-ificate your messages as much as
> possible.  I'd really prefer that folks say "missing end branch fault"
> rather than #CP.  I had to Google "#CP".
> 
> I *think* you are saying that:  The *only* lookups to this bitmap are on
> "missing end branch" conditions.  Normal, proper-functioning code
> execution that has ENDBR instructions in it will never even look at the
> bitmap.  The only case when we reference the bitmap locations is when
> the processor is about to do a "missing end branch fault" so that it can
> be suppressed.  Any population with the zero page would be done when
> code had already encountered a "missing end branch" condition, and
> populating with a zero-filled page will guarantee that a "missing end
> branch fault" will result.  You're arguing that we should just figure
> this out at fault time and not ever reach the "missing end branch fault"
> at all.
> 
> Is that right?
> 
> If so, that's an architecture subtlety that I missed until now and which
> went entirely unmentioned in the changelog and discussion up to this
> point.  Let's make sure that nobody else has to walk that path by
> improving our changelog, please.
> 
> In any case, I don't think this is worth special-casing our zero-fill
> code, FWIW.  It's not performance critical and not worth the complexity.
> If apps want to handle the signals and abuse this to fill space up with
> boring page table contents, they're welcome to.  There are much easier
> ways to consume a lot of memory.

Isn’t it a special case either way?  Either we look at CR2 and populate a page, or we look at CR2 and the “tracker” state and send a different signal.  Admittedly the former is very common in the kernel.
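
To make the zero-fill point concrete, here is a minimal sketch of a one-bit-per-page lookup. It is not the hardware's documented algorithm or any kernel code: the function name, the 4 KiB page size, and the bit order within each byte are all assumptions made for illustration. It only shows that a zero-filled bitmap page answers "not legacy" for every address it covers, so the missing end branch fault would still be delivered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: returns 1 if the page containing addr is marked
 * as legacy code in the bitmap, 0 otherwise.  One bit per page; the
 * lowest-numbered page in each byte is assumed to use bit 0.
 */
int legacy_bitmap_test(const uint8_t *bitmap, size_t bitmap_bytes,
                       uint64_t addr)
{
    uint64_t page = addr >> PAGE_SHIFT; /* page frame number of addr */
    uint64_t byte = page >> 3;          /* eight pages per bitmap byte */

    assert(byte < bitmap_bytes);        /* caller must size the bitmap */
    return (bitmap[byte] >> (page & 7)) & 1;
}
```

With an all-zero buffer the helper returns 0 for any address, which corresponds to the "populate with the zero page, then fault" outcome described above.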

> 
>>> We don't have to allow them to (populating) fault on it.  But, if we
>>> don't, we need some kind of kernel interface to avoid the faults.
>> 
>> The plan is:
>> 
>> * Move STACK_TOP (and vdso) down to give space to the bitmap.
> 
> Even for apps with 57-bit address spaces?
> 
>> * Reserve the bitmap space from (mm->start_stack + PAGE_SIZE) to cover a code
>> size of TASK_SIZE_LOW, which is (TASK_SIZE_LOW / PAGE_SIZE / 8).
> 
> The bitmap size is determined by CR4.LA57, not the app.  If you place
> the bitmap here, won't references to it for high addresses go into the
> high address space?
> 
> Specifically, on a CR4.LA57=0 system, we have 48 bits of address space,
> so 128TB for apps.  You are proposing sticking the bitmap above the
> stack which is near the top of that 128TB address space.  But on a
> 5-level paging system with CR4.LA57=1, there could be valid data at
> 129TB.  Is there something keeping that data from being mistaken for
> being part of the bitmap?
> 

I think we need to make the vma be full sized — it should cover the entire range that the CPU might access. If that means it spans the 48-bit boundary, so be it.
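
For a rough sense of scale, the sizes can be sketched with the same one-bit-per-page formula quoted above (size / PAGE_SIZE / 8). The helper name is hypothetical, 4 KiB pages are assumed, and the address space is rounded to a power of two (47 or 56 user bits) rather than the exact TASK_SIZE values.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: bytes needed for a bitmap holding one bit per
 * page of a 2^addr_bits byte address range.
 */
uint64_t legacy_bitmap_bytes(unsigned int addr_bits)
{
    /* Need at least one full bitmap byte, and no shift overflow. */
    assert(addr_bits >= PAGE_SHIFT + 3 && addr_bits < 64);
    return (UINT64_C(1) << addr_bits) >> PAGE_SHIFT >> 3;
}
```

Under these assumptions, legacy_bitmap_bytes(47) is 2^32 bytes (4 GiB) and legacy_bitmap_bytes(56) is 2^41 bytes (2 TiB), which is why a vma covering everything the CPU might access is so much larger with CR4.LA57=1.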

> Also, if you're limiting it to TASK_SIZE_LOW, please don't forget that
> this is yet another thing that probably won't work with the vsyscall
> page.  Please make sure you consider it and mention it in your next post.

Why not?  The vsyscall page is at a negative address.

> 
>> * Mmap the space only when the app issues the first mark-legacy prctl.  This
>> avoids the core-dump issue for most apps and the accounting problem that
>> MAP_NORESERVE probably won't solve.

What happens if there’s another VMA there by the time you map it?