Message-ID: <20171006013349.bgwpd3djojyxn7al@tardis>
Date:   Fri, 6 Oct 2017 09:33:49 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH] kvm/x86: Avoid async PF to end RCU read-side critical
 section early in PREEMPT=n kernel

On Tue, Oct 03, 2017 at 02:11:08PM +0000, Paolo Bonzini wrote:
> I'd prefer a slight change in subject and topic:
> 
> ------- 8< --------
> Subject: [PATCH] kvm/x86: Avoid async PF preempting the kernel incorrectly
> 
> Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
> schedule() to reschedule in some cases.  This could result in
> accidentally ending the current RCU read-side critical section early,
> causing random memory corruption in the guest, or otherwise preempting
> the currently running task between preempt_disable() and
> preempt_enable().
> 
> The difficulty in handling this well is that we don't know whether an
> async PF was delivered in a preemptible section or an RCU read-side critical
> section for PREEMPT_COUNT=n, since preempt_disable()/enable() and
> rcu_read_lock()/unlock() are both no-ops in that case.
> 
> To cure this, we treat any async PF interrupting a kernel context as one
> that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
> the schedule() path in that case.
> 
> To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
> so that we know whether it's called from a context interrupting the
> kernel, and the parameter is set properly in all the callsites.
> 
> Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Wanpeng Li <wanpeng.li@...mail.com>
> Cc: stable@...r.kernel.org
> Signed-off-by: Boqun Feng <boqun.feng@...il.com>
> ------- 8< --------
> 

It's more concise and accurate now!

I learned a lot from your changes to the commit message, thanks!

Regards,
Boqun

> I'll let Radim pick up the patch.
> 
> Paolo
> 
> On 03/10/2017 15:36, Boqun Feng wrote:
> > Currently, in PREEMPT=n kernel, kvm_async_pf_task_wait() could call
> > schedule() to reschedule in some cases, which could result in
> > accidentally ending the current RCU read-side critical section early.
> > And this could end up with random memory corruption in the guest.
> > 
> > The difficulty in handling this well is that we don't know whether an
> > async PF was delivered in an RCU read-side critical section for a
> > PREEMPT_COUNT=n kernel, since rcu_read_lock()/unlock() are just no-ops in
> > that case.
> > 
> > To cure this, we treat any async PF interrupting a kernel context as one
> > delivered in an RCU read-side critical section, and we don't allow
> > kvm_async_pf_task_wait() to choose the schedule() path in that case for a
> > PREEMPT_COUNT=n kernel, because that would introduce involuntary
> > context switches and break the assumptions RCU needs to work properly.
> > 
> > To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
> > so that we know whether it's called from a context interrupting the
> > kernel, and we set that parameter properly in all the callsites.
> > 
> > Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Cc: Wanpeng Li <wanpeng.li@...mail.com>
> > Signed-off-by: Boqun Feng <boqun.feng@...il.com>
> > ---
> >  arch/x86/include/asm/kvm_para.h |  4 ++--
> >  arch/x86/kernel/kvm.c           | 14 ++++++++++----
> >  arch/x86/kvm/mmu.c              |  2 +-
> >  3 files changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index bc62e7cbf1b1..59ad3d132353 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -88,7 +88,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> >  bool kvm_para_available(void);
> >  unsigned int kvm_arch_para_features(void);
> >  void __init kvm_guest_init(void);
> > -void kvm_async_pf_task_wait(u32 token);
> > +void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
> >  void kvm_async_pf_task_wake(u32 token);
> >  u32 kvm_read_and_reset_pf_reason(void);
> >  extern void kvm_disable_steal_time(void);
> > @@ -103,7 +103,7 @@ static inline void kvm_spinlock_init(void)
> >  
> >  #else /* CONFIG_KVM_GUEST */
> >  #define kvm_guest_init() do {} while (0)
> > -#define kvm_async_pf_task_wait(T) do {} while(0)
> > +#define kvm_async_pf_task_wait(T, I) do {} while(0)
> >  #define kvm_async_pf_task_wake(T) do {} while(0)
> >  
> >  static inline bool kvm_para_available(void)
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index e675704fa6f7..8bb9594d0761 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -117,7 +117,11 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
> >  	return NULL;
> >  }
> >  
> > -void kvm_async_pf_task_wait(u32 token)
> > +/*
> > + * @interrupt_kernel: Is this called from a routine which interrupts the kernel
> > + * 		      (other than user space)?
> > + */
> > +void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
> >  {
> >  	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
> >  	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
> > @@ -140,8 +144,10 @@ void kvm_async_pf_task_wait(u32 token)
> >  
> >  	n.token = token;
> >  	n.cpu = smp_processor_id();
> > -	n.halted = is_idle_task(current) || preempt_count() > 1 ||
> > -		   rcu_preempt_depth();
> > +	n.halted = is_idle_task(current) ||
> > +		   (IS_ENABLED(CONFIG_PREEMPT_COUNT)
> > +		    ? preempt_count() > 1 || rcu_preempt_depth()
> > +		    : interrupt_kernel);
> >  	init_swait_queue_head(&n.wq);
> >  	hlist_add_head(&n.link, &b->list);
> >  	raw_spin_unlock(&b->lock);
> > @@ -269,7 +275,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> >  		/* page is swapped out by the host. */
> >  		prev_state = exception_enter();
> > -		kvm_async_pf_task_wait((u32)read_cr2());
> > +		kvm_async_pf_task_wait((u32)read_cr2(), !user_mode(regs));
> >  		exception_exit(prev_state);
> >  		break;
> >  	case KVM_PV_REASON_PAGE_READY:
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index eca30c1eb1d9..106d4a029a8a 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -3837,7 +3837,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> >  		vcpu->arch.apf.host_apf_reason = 0;
> >  		local_irq_disable();
> > -		kvm_async_pf_task_wait(fault_address);
> > +		kvm_async_pf_task_wait(fault_address, 0);
> >  		local_irq_enable();
> >  		break;
> >  	case KVM_PV_REASON_PAGE_READY:
> > 
> 
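
The n.halted decision in the quoted patch can be modeled in user space as a
plain function. The sketch below is only an illustration, not kernel code:
struct apf_ctx, apf_must_halt() and apf_self_check() are hypothetical names,
and the fields merely stand in for is_idle_task(current),
IS_ENABLED(CONFIG_PREEMPT_COUNT), preempt_count(), rcu_preempt_depth() and
the new interrupt_kernel argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state kvm_async_pf_task_wait() looks at. */
struct apf_ctx {
	bool idle_task;             /* models is_idle_task(current) */
	bool preempt_count_enabled; /* models IS_ENABLED(CONFIG_PREEMPT_COUNT) */
	int  preempt_count;         /* models preempt_count() */
	int  rcu_depth;             /* models rcu_preempt_depth() */
	bool interrupt_kernel;      /* models the new second parameter */
};

/* Returns true when the waiter must halt instead of calling schedule(). */
bool apf_must_halt(const struct apf_ctx *c)
{
	if (c->idle_task)
		return true;
	if (c->preempt_count_enabled)
		return c->preempt_count > 1 || c->rcu_depth != 0;
	/* No counters to consult: rely only on the caller's hint. */
	return c->interrupt_kernel;
}

/* Walks through the cases discussed in the commit message. */
void apf_self_check(void)
{
	struct apf_ctx user_fault   = { false, false, 0, 0, false };
	struct apf_ctx kernel_fault = { false, false, 0, 0, true  };
	struct apf_ctx counted      = { false, true,  2, 0, false };

	assert(!apf_must_halt(&user_fault));  /* may sleep via schedule() */
	assert(apf_must_halt(&kernel_fault)); /* PREEMPT_COUNT=n, in kernel */
	assert(apf_must_halt(&counted));      /* PREEMPT_COUNT=y, count > 1 */
}
```

In this model, with preempt_count_enabled false, a fault taken from user mode
(interrupt_kernel false, matching !user_mode(regs) in do_async_page_fault())
may still take the schedule() path, while any fault that interrupted the
kernel halts, whatever the counter fields hold.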
