linux-kernel - Re: [PATCH] x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20220110115111.nmtahu5afhdgmc7z@liuwe-devbox-debian-v2>
Date:   Mon, 10 Jan 2022 11:51:11 +0000
From:   Wei Liu <wei.liu@...nel.org>
To:     "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
Cc:     vkuznets <vkuznets@...hat.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/hyperv: Properly deal with empty cpumasks in
 hyperv_flush_tlb_multi()

On Fri, Jan 07, 2022 at 04:56:29PM +0000, Michael Kelley (LINUX) wrote:
> From: Vitaly Kuznetsov <vkuznets@...hat.com> Sent: Thursday, January 6, 2022 1:46 AM
> > 
> > KASAN detected the following issue:
> > 
> >  BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060
> >  Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33
> > 
> >  CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1
> >  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
> >      BIOS Hyper-V UEFI Release v4.0 12/17/2019
> >  Call Trace:
> >   dump_stack_lvl+0x57/0x7d
> >   print_address_description.constprop.0+0x1f/0x140
> >   ? hyperv_flush_tlb_multi+0xf88/0x1060
> >   __kasan_report.cold+0x7f/0x11e
> >   ? hyperv_flush_tlb_multi+0xf88/0x1060
> >   kasan_report+0x38/0x50
> >   hyperv_flush_tlb_multi+0xf88/0x1060
> >   flush_tlb_mm_range+0x1b1/0x200
> >   ptep_clear_flush+0x10e/0x150
> > ...
> >  Allocated by task 0:
> >   kasan_save_stack+0x1b/0x40
> >   __kasan_kmalloc+0x7c/0x90
> >   hv_common_init+0xae/0x115
> >   hyperv_init+0x97/0x501
> >   apic_intr_mode_init+0xb3/0x1e0
> >   x86_late_time_init+0x92/0xa2
> >   start_kernel+0x338/0x3eb
> >   secondary_startup_64_no_verify+0xc2/0xcb
> > 
> >  The buggy address belongs to the object at ffff8880011cc800
> >   which belongs to the cache kmalloc-1k of size 1024
> >  The buggy address is located 960 bytes inside of
> >   1024-byte region [ffff8880011cc800, ffff8880011ccc00)
> > 
> > 'hyperv_flush_tlb_multi+0xf88/0x1060' points to
> > hv_cpu_number_to_vp_number() and '960 bytes' means we're trying to get
> > VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240 so we're trying to
> > access past hv_vp_index's last element. This can (and will) happen
> > when 'cpus' mask is empty and cpumask_last() will return '>=nr_cpus'.
> > 
> > Commit ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has
> > been disabled") tried to deal with empty cpumask situation but
> > apparently didn't fully fix the issue.
> > 
> > 'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is
> > 'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every
> > time the particular mm is scheduled/unscheduled on some CPU (see
> > switch_mm_irqs_off()), disabling IRQs on the CPU which is performing remote
> > TLB flush has zero influence on whether the particular process can get
> > scheduled/unscheduled on _other_ CPUs so e.g. in the case where the mm was
> > scheduled on one other CPU and got unscheduled during
> > hyperv_flush_tlb_multi()'s execution will lead to cpumask becoming empty.
> > 
> > It doesn't seem that there's a good way to protect 'mm_cpumask(mm)'
> > from changing during hyperv_flush_tlb_multi()'s execution. It would be
> > possible to copy it in the very beginning of the function but this is a
> > waste. It seems we can deal with changing cpumask just fine.
> > 
> > When 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s
> > execution, there are two possible issues:
> > - 'Under-flushing': we will not flush TLB on a CPU which got added to
> > the mask while hyperv_flush_tlb_multi() was already running. This is
> > not a problem as this is equal to mm getting scheduled on that CPU
> > right after TLB flush.
> > - 'Over-flushing': we may flush TLB on a CPU which is already cleared
> > from the mask. First, extra TLB flush preserves correctness. Second,
> > Hyper-V's TLB flush hypercall takes 'mm->pgd' argument so Hyper-V may
> > avoid the flush if CR3 doesn't match.
> > 
> > Fix the immediate issue with
> > cpumask_last()/hv_cpu_number_to_vp_number()
> > and remove the pointless cpumask_empty() check from the beginning of the
> > function as it really doesn't protect anything. Also, avoid the hypercall
> > altogether when 'flush->processor_mask' ends up being empty.
> > 
> > Fixes: ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has been disabled")
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
> > ---
> >  arch/x86/hyperv/mmu.c | 19 +++++++++----------
> >  1 file changed, 9 insertions(+), 10 deletions(-)
> > 
[...]
> 
> Reviewed-by: Michael Kelley <mikelley@...rosoft.com>

Applied to hyperv-next. Thanks.