Date:   Tue, 5 Apr 2022 13:26:09 -0600
From:   Peter Gonda <pgonda@...gle.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Mingwei Zhang <mizhang@...gle.com>, kvm <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: SEV: Add cond_resched() to loop in sev_clflush_pages()

On Thu, Mar 31, 2022 at 3:31 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Wed, Mar 30, 2022, Mingwei Zhang wrote:
> > On Wed, Mar 30, 2022 at 9:43 AM Peter Gonda <pgonda@...gle.com> wrote:
> > >
> > > Add a cond_resched() to avoid a warning from sev_clflush_pages() with a
> > > large number of pages.
> > >
> > > Signed-off-by: Peter Gonda <pgonda@...gle.com>
> > > Cc: Sean Christopherson <seanjc@...gle.com>
> > > Cc: kvm@...r.kernel.org
> > > Cc: linux-kernel@...r.kernel.org
> > >
> > > ---
> > > Here is a warning similar to what I've seen many times running large SEV
> > > VMs:
> > > [  357.714051] CPU 15: need_resched set for > 52000222 ns (52 ticks) without schedule
> > > [  357.721623] WARNING: CPU: 15 PID: 35848 at kernel/sched/core.c:3733 scheduler_tick+0x2f9/0x3f0
> > > [  357.730222] Modules linked in: kvm_amd uhaul vfat fat hdi2_standard_ftl hdi2_megablocks hdi2_pmc hdi2_pmc_eeprom hdi2 stg elephant_dev_num ccp i2c_mux_ltc4306 i2c_mux i2c_via_ipmi i2c_piix4 google_bmc_usb google_bmc_gpioi2c_mb_common google_bmc_mailbox cdc_acm xhci_pci xhci_hcd sha3_generic gq nv_p2p_glue accel_class
> > > [  357.758261] CPU: 15 PID: 35848 Comm: switchto-defaul Not tainted 4.15.0-smp-DEV #11
> > > [  357.765912] Hardware name: Google, Inc.                                                       Arcadia_IT_80/Arcadia_IT_80, BIOS 30.20.2-gce 11/05/2021
> > > [  357.779372] RIP: 0010:scheduler_tick+0x2f9/0x3f0
> > > [  357.783988] RSP: 0018:ffff98558d1c3dd8 EFLAGS: 00010046
> > > [  357.789207] RAX: 741f23206aa8dc00 RBX: 0000005349236a42 RCX: 0000000000000007
> > > [  357.796339] RDX: 0000000000000006 RSI: 0000000000000002 RDI: ffff98558d1d5a98
> > > [  357.803463] RBP: ffff98558d1c3ea0 R08: 0000000000100ceb R09: 0000000000000000
> > > [  357.810597] R10: ffff98558c958c00 R11: ffffffff94850740 R12: 00000000031975de
> > > [  357.817729] R13: 0000000000000000 R14: ffff98558d1e2640 R15: ffff98525739ea40
> > > [  357.824862] FS:  00007f87503eb700(0000) GS:ffff98558d1c0000(0000) knlGS:0000000000000000
> > > [  357.832948] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  357.838695] CR2: 00005572fe74b080 CR3: 0000007bea706006 CR4: 0000000000360ef0
> > > [  357.845828] Call Trace:
> > > [  357.848277]  <IRQ>
> > > [  357.850294]  [<ffffffff94411420>] ? tick_setup_sched_timer+0x130/0x130
> > > [  357.856818]  [<ffffffff943ed60d>] ? rcu_sched_clock_irq+0x6ed/0x850
> > > [  357.863084]  [<ffffffff943fdf02>] ? __run_timers+0x42/0x260
> > > [  357.868654]  [<ffffffff94411420>] ? tick_setup_sched_timer+0x130/0x130
> > > [  357.875182]  [<ffffffff943fd35b>] update_process_times+0x7b/0x90
> > > [  357.881188]  [<ffffffff944114a2>] tick_sched_timer+0x82/0xd0
> > > [  357.886845]  [<ffffffff94400671>] __run_hrtimer+0x81/0x200
> > > [  357.892331]  [<ffffffff943ff222>] hrtimer_interrupt+0x192/0x450
> > > [  357.898252]  [<ffffffff950002fa>] ? __do_softirq+0x2fa/0x33e
> > > [  357.903911]  [<ffffffff94e02edc>] smp_apic_timer_interrupt+0xac/0x1d0
> > > [  357.910349]  [<ffffffff94e01ef6>] apic_timer_interrupt+0x86/0x90
> > > [  357.916347]  </IRQ>
> > > [  357.918452] RIP: 0010:clflush_cache_range+0x3f/0x50
> > > [  357.923324] RSP: 0018:ffff98529af89cc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
> > > [  357.930889] RAX: 0000000000000040 RBX: 0000000000038135 RCX: ffff985233d36000
> > > [  357.938013] RDX: ffff985233d36000 RSI: 0000000000001000 RDI: ffff985233d35000
> > > [  357.945145] RBP: ffff98529af89cc0 R08: 0000000000000001 R09: ffffb5753fb23000
> > > [  357.952271] R10: 000000000003fe00 R11: 0000000000000008 R12: 0000000000040000
> > > [  357.959401] R13: ffff98525739ea40 R14: ffffb5753fb22000 R15: ffff98532a58dd80
> > > [  357.966536]  [<ffffffffc07afd41>] svm_register_enc_region+0xd1/0x170 [kvm_amd]
> > > [  357.973758]  [<ffffffff94246e8c>] kvm_arch_vm_ioctl+0x84c/0xb00
> > > [  357.979677]  [<ffffffff9455980f>] ? handle_mm_fault+0x6ff/0x1370
> > > [  357.985683]  [<ffffffff9423412b>] kvm_vm_ioctl+0x69b/0x720
> > > [  357.991167]  [<ffffffff945dfd9d>] do_vfs_ioctl+0x47d/0x680
> > > [  357.996654]  [<ffffffff945e0188>] SyS_ioctl+0x68/0x90
> > > [  358.001706]  [<ffffffff942066f1>] do_syscall_64+0x71/0x110
> > > [  358.007192]  [<ffffffff94e00081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > >
> > > Tested by running a large 256 GiB SEV VM several times; no warnings were
> > > seen. Without the change, warnings are seen.
>
> Clean up the splat (remove timestamps, everything with a ?, etc... I believe there
> is a script in the kernel's scripts/ directory to do this...) and throw it in the
> changelog.  Documenting the exact problem is very helpful, e.g. future readers may
> wonder "what warning?".

I think Paolo has queued this, so I'll do this the next time I'm fixing a
warning. Thanks, Sean.

>
>
> > > ---
> > >  arch/x86/kvm/svm/sev.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 75fa6dd268f0..c2fe89ecdb2d 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -465,6 +465,7 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
> > >                 page_virtual = kmap_atomic(pages[i]);
> > >                 clflush_cache_range(page_virtual, PAGE_SIZE);
> > >                 kunmap_atomic(page_virtual);
> > > +               cond_resched();
> >
> > If you add cond_resched() here, the frequency (once per 4K page) might be
> > too high. You may want to do it once every X pages, where X could be
> > something like 1G/4K?
>
> No, every iteration is perfectly ok.  The "cond"itional part means that this will
> reschedule if and only if it actually needs to be rescheduled, e.g. if the task's
> timeslice has expired.  The check for a needed reschedule is cheap; using
> cond_resched() in tight-ish loops is ok and intended, e.g. KVM does a resched
> check prior to entering the guest.
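
For reference, on a non-preemptible kernel the check Sean describes boils
down to roughly the following (a simplified sketch of the logic in
kernel/sched/core.c, not the exact code), which is why calling it once per
page is fine:

    /*
     * Rough sketch: the common case is a single "does this task need to
     * reschedule?" flag test, and an actual schedule only happens when
     * that flag is set.
     */
    int __cond_resched(void)
    {
            if (should_resched(0)) {            /* need-resched flag set? */
                    preempt_schedule_common();  /* yes: yield the CPU */
                    return 1;
            }
            return 0;                           /* no: just a cheap flag test */
    }

Batching the call (e.g. only once every Nth page) would therefore save only
a flag test per skipped iteration, at the cost of a longer worst-case delay
before rescheduling.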
