linux-kernel - Re: I915 CI-run with kfence enabled, issues found

Message-ID: <796ff05e-c137-cbd4-252b-7b114abaced9@intel.com>
Date:   Mon, 29 Mar 2021 10:32:03 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Marco Elver <elver@...gle.com>,
        "Sarvela, Tomi P" <tomi.p.sarvela@...el.com>
Cc:     "kasan-dev@...glegroups.com" <kasan-dev@...glegroups.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        linux-kernel@...r.kernel.org
Subject: Re: I915 CI-run with kfence enabled, issues found

On 3/29/21 9:40 AM, Marco Elver wrote:
> It looks like the code path from flush_tlb_one_kernel() to
> invalidate_user_asid()'s this_cpu_ptr() has several feature checks, so
> probably some feature difference between systems where it triggers and
> it doesn't.
> 
> As far as I'm aware, there is no restriction on where
> flush_tlb_one_kernel() is called. We could of course guard it but I
> think that's wrong.
> 
> Other than that, I hope the x86 maintainers know what's going on here.
> 
> Just for reference, the stack traces in the above logs start with:
> 
> | <3> [31.556004] BUG: using smp_processor_id() in preemptible [00000000] code: dmesg/1075
> | <4> [31.556070] caller is invalidate_user_asid+0x13/0x50
> | <4> [31.556078] CPU: 6 PID: 1075 Comm: dmesg Not tainted 5.12.0-rc4-gda4a2b1a5479-kfence_1+ #1
> | <4> [31.556081] Hardware name: Hewlett-Packard HP Pro 3500 Series/2ABF, BIOS 8.11 10/24/2012
> | <4> [31.556084] Call Trace:
> | <4> [31.556088]  dump_stack+0x7f/0xad
> | <4> [31.556097]  check_preemption_disabled+0xc8/0xd0
> | <4> [31.556104]  invalidate_user_asid+0x13/0x50
> | <4> [31.556109]  flush_tlb_one_kernel+0x5/0x20
> | <4> [31.556113]  kfence_protect+0x56/0x80
> | 	...........

Our naming here isn't great.

But the "one" in flush_tlb_one_kernel() really refers to two "ones":
1. Flush one single address
2. Flush that address from one CPU's TLB

The reason preempt needs to be off is that it doesn't make any sense to
flush one TLB entry from a "random" CPU.  It only makes sense to flush
it when preempt is disabled and you *know* which CPU's TLB you're flushing.

I think kfence needs to be using flush_tlb_kernel_range().  That does
all the IPI fanciness to flush the TLBs on *ALL* CPUs, not just the
current one.
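
To make the difference concrete, here is a toy userspace model of the two
flush styles. It is only a sketch and not kernel code: every toy_* name,
the CPU count and the TLB layout are made up for illustration.
toy_flush_one_local() stands in for flush_tlb_one_kernel() (one address,
the caller's CPU only) and toy_flush_range_all() stands in for
flush_tlb_kernel_range() (a range, every CPU), with a plain loop in place
of the real IPIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_CPUS   4        /* hypothetical CPU count for the model */
#define TOY_TLB_SLOTS 8        /* hypothetical per-CPU TLB size */
#define TOY_PAGE_SIZE 4096UL   /* assumed page size */

/* Toy per-CPU TLB: each slot caches one page-aligned virtual address. */
struct toy_tlb {
    unsigned long addr[TOY_TLB_SLOTS];
    bool valid[TOY_TLB_SLOTS];
};

static struct toy_tlb tlbs[TOY_NR_CPUS];

static size_t toy_slot(unsigned long addr)
{
    return (addr / TOY_PAGE_SIZE) % TOY_TLB_SLOTS;
}

/* Pretend that 'cpu' touched 'addr' and now caches its translation. */
static void toy_tlb_fill(int cpu, unsigned long addr)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    tlbs[cpu].addr[toy_slot(addr)] = addr & ~(TOY_PAGE_SIZE - 1);
    tlbs[cpu].valid[toy_slot(addr)] = true;
}

static bool toy_tlb_cached(int cpu, unsigned long addr)
{
    const struct toy_tlb *t = &tlbs[cpu];
    size_t slot = toy_slot(addr);

    return t->valid[slot] && t->addr[slot] == (addr & ~(TOY_PAGE_SIZE - 1));
}

/* Models flush_tlb_one_kernel(): one address, one (the caller's) CPU. */
static void toy_flush_one_local(int this_cpu, unsigned long addr)
{
    if (toy_tlb_cached(this_cpu, addr))
        tlbs[this_cpu].valid[toy_slot(addr)] = false;
}

/* Models flush_tlb_kernel_range(): [start, end) on every CPU. */
static void toy_flush_range_all(unsigned long start, unsigned long end)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        for (unsigned long a = start; a < end; a += TOY_PAGE_SIZE)
            toy_flush_one_local(cpu, a);
}

/* Number of CPUs that still hold a cached translation for 'addr'. */
static int toy_count_stale(unsigned long addr)
{
    int n = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        n += toy_tlb_cached(cpu, addr);
    return n;
}

/* Cache one page on all CPUs, then compare the two flush styles. */
static void toy_demo(int *after_local, int *after_range)
{
    const unsigned long addr = 5 * TOY_PAGE_SIZE;

    memset(tlbs, 0, sizeof(tlbs));
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        toy_tlb_fill(cpu, addr);

    toy_flush_one_local(2, addr);          /* only CPU 2 is cleaned */
    *after_local = toy_count_stale(addr);

    toy_flush_range_all(addr, addr + TOY_PAGE_SIZE);
    *after_range = toy_count_stale(addr);
}
```

In this model the local flush leaves the entry cached on every CPU except
the one that was named, which is the situation a caller ends up in if it
only flushes "whatever CPU it happens to be on". The range flush leaves
no CPU with a stale entry.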

BTW, the preempt checks in flush_tlb_one_kernel() depend on KPTI being
enabled.  That's probably why you don't see this everywhere.  We
should probably have unconditional preempt checks in there.