linux-kernel - Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

Message-ID: <45c64b49-a38b-4b0c-d9cf-6c586dacbcc9@arm.com>
Date:   Mon, 26 Oct 2020 17:39:42 -0500
From:   Jeremy Linton <jeremy.linton@....com>
To:     Dave Martin <Dave.Martin@....com>,
        Szabolcs Nagy <szabolcs.nagy@....com>
Cc:     Mark Rutland <mark.rutland@....com>,
        systemd-devel@...ts.freedesktop.org,
        Kees Cook <keescook@...omium.org>,
        Catalin Marinas <Catalin.Marinas@....com>,
        Will Deacon <will.deacon@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Mark Brown <broonie@...nel.org>, toiwoton@...il.com,
        libc-alpha@...rceware.org,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: BTI interaction between seccomp filters in systemd and glibc
 mprotect calls, causing service failures

Hi,

On 10/26/20 12:52 PM, Dave Martin wrote:
> On Mon, Oct 26, 2020 at 04:57:55PM +0000, Szabolcs Nagy via Libc-alpha wrote:
>> The 10/26/2020 16:24, Dave Martin via Libc-alpha wrote:
>>> Unrolling this discussion a bit, this problem comes from a few sources:
>>>
>>> 1) systemd is trying to implement a policy that doesn't fit SECCOMP
>>> syscall filtering very well.
>>>
>>> 2) The program is trying to do something not expressible through the
>>> syscall interface: really the intent is to set PROT_BTI on the page,
>>> with no intent to set PROT_EXEC on any page that didn't already have it
>>> set.
>>>
>>>
>>> This limitation of mprotect() was known when I originally added PROT_BTI,
>>> but at that time we weren't aware of a clear use case that would fail.
>>>
>>>
>>> Would it now help to add something like:
>>>
>>> int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
>>> {
>>> 	int ret = -EINVAL;
>>> 	mmap_write_lock(current->mm);
>>> 	if (all vmas in [addr .. addr + len) have
>>> 			their mprotect flags set to old_flags) {
>>>
>>> 		ret = mprotect(addr, len, new_flags);
>>> 	}
>>> 	
>>> 	mmap_write_unlock(current->mm);
>>> 	return ret;
>>> }
>>
>> if more prot flags are introduced then the exact
>> match for old_flags may be restrictive and currently
>> there is no way to query these flags to figure out
>> how to toggle one prot flag in a future proof way,
>> so i don't think this solves the issue completely.
> 
> Ack -- I illustrated this model because it makes the seccomp filter's
> job easy, but it does have limitations.
> 
>> i think we might need a new api, given that aarch64
>> now has PROT_BTI and PROT_MTE while existing code
>> expects RWX only, but i don't know what api is best.
> 
> An alternative option would be a call that sets / clears chosen
> flags and leaves others unchanged.

I tend to favor a set/clear API, but the same effect could also be 
achieved by adding a new PROT_BTI_IF_X flag, which enables BTI only on 
areas already mapped with PROT_EXEC. Because such a request never sets 
PROT_EXEC itself, it also passes straight through the existing seccomp 
filters, and it is actually closer to what glibc wants to do anyway.


> 
> The trouble with that is that the MDWX policy then becomes hard to
> implement again.
> 
> 
> But policies might be best set via another route, such as a prctl,
> rather than being implemented completely in a seccomp filter.
> 
> Cheers
> ---Dave
>