Open Source and information security mailing list archives
 
Message-Id: <CAD6C3B4-DD55-47EB-9BC0-17867937AE2D@gmail.com>
Date:	Tue, 8 Oct 2013 12:02:32 +0800
From:	Xiao Guangrong <xiaoguangrong.eric@...il.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>, gleb@...hat.com,
	avi.kivity@...il.com, pbonzini@...hat.com,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread


Hi Marcelo,

On Oct 8, 2013, at 9:23 AM, Marcelo Tosatti <mtosatti@...hat.com> wrote:

>> 
>> +	if (kvm->arch.rcu_free_shadow_page) {
>> +		kvm_mmu_isolate_pages(invalid_list);
>> +		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
>> +		list_del_init(invalid_list);
>> +		call_rcu(&sp->rcu, free_pages_rcu);
>> +		return;
>> +	}
> 
> This is unbounded (there was a similar problem with early fast page fault
> implementations):
> 
> From RCU/checklist.txt:
> 
> "        An especially important property of the synchronize_rcu()
>        primitive is that it automatically self-limits: if grace periods
>        are delayed for whatever reason, then the synchronize_rcu()
>        primitive will correspondingly delay updates.  In contrast,
>        code using call_rcu() should explicitly limit update rate in
>        cases where grace periods are delayed, as failing to do so can
>        result in excessive realtime latencies or even OOM conditions.
> "

I understand what you are worried about... Hmm, could it be avoided by
setting kvm->arch.rcu_free_shadow_page only within a small window? Then
there would be only a slight chance that a page needs to be freed by call_rcu.
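A rough userspace model of the "small window" idea, for illustration only: the RCU-free indicator is raised only while a lockless walk is in progress, so only pages zapped inside that window take the deferred (call_rcu-like) path. The struct and function names below are hypothetical and are not KVM's.

```c
#include <assert.h>

/*
 * Hypothetical model: counts how many zapped shadow pages would be
 * freed immediately vs. deferred through call_rcu().
 */
struct mmu_model {
	int lockless_walkers;	/* > 0 only while the window is open */
	int freed_now;		/* pages freed immediately */
	int deferred;		/* pages that would go through call_rcu() */
};

/* Open the window before a lockless shadow page table walk. */
static void lockless_window_begin(struct mmu_model *m)
{
	m->lockless_walkers++;
}

/* Close the window once the lockless walk is done. */
static void lockless_window_end(struct mmu_model *m)
{
	assert(m->lockless_walkers > 0);
	m->lockless_walkers--;
}

/* Free one zapped page, deferring only if a walker may still see it. */
static void free_zapped_page(struct mmu_model *m)
{
	if (m->lockless_walkers > 0)
		m->deferred++;
	else
		m->freed_now++;
}
```

Note that the model also shows the limits of the idea: the deferred count is still bounded only by how many pages get zapped while the window is open, and pages are still freed differently depending on state.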

> 
> Moreover, freeing pages differently depending on some state should 
> be avoided.
> 
> Alternatives:
> 
> - Disable interrupts at write protect sites.

Write protection can be triggered by a KVM ioctl that does not run in VCPU
context. If we do this, we also need to send an IPI to the KVM thread when
doing the TLB flush. Also, we cannot do much work while interrupts are
disabled, because of interrupt latency.

> - Rate limit the number of pages freed via call_rcu
> per grace period.

Seems complex. :(
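For reference, a minimal userspace sketch of what "rate limit the number of pages freed via call_rcu per grace period" could look like, assuming a fixed per-grace-period budget: within the budget a page is queued asynchronously (standing in for call_rcu()); once the budget is exhausted, the updater waits for a grace period (standing in for synchronize_rcu()) and frees the already-unlinked page directly, so it self-limits the way the checklist describes. RCU_FREE_BUDGET and every sim_* helper are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-grace-period budget for call_rcu() frees. */
#define RCU_FREE_BUDGET 4u

enum free_path { FREED_ASYNC, FREED_AFTER_SYNC };

struct rcu_free_limiter {
	unsigned long gp_seq;		/* simulated grace-period counter */
	unsigned int queued;		/* frees queued in the current grace period */
	unsigned int sync_waits;	/* times the updater had to block */
};

/* Simulated end of a grace period: all queued callbacks have run. */
static void sim_grace_period_end(struct rcu_free_limiter *l)
{
	l->gp_seq++;
	l->queued = 0;
}

/* Simulated synchronize_rcu(): block until a grace period ends. */
static void sim_synchronize_rcu(struct rcu_free_limiter *l)
{
	l->sync_waits++;
	sim_grace_period_end(l);
}

/*
 * Free one already-unlinked shadow page. Within the budget, queue it
 * asynchronously; otherwise wait for a grace period and free it
 * directly, which throttles the updater.
 */
static enum free_path limited_free(struct rcu_free_limiter *l)
{
	if (l->queued < RCU_FREE_BUDGET) {
		l->queued++;
		return FREED_ASYNC;
	}
	sim_synchronize_rcu(l);
	assert(l->queued == 0);
	return FREED_AFTER_SYNC;
}
```

Part of the complexity in the real MMU is that synchronize_rcu() may sleep, so the fallback could not run while holding the mmu_lock spinlock and would have to happen after dropping it.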

> - Some better alternative.

Gleb has an idea that uses SLAB_DESTROY_BY_RCU to protect the shadow page
table and encodes the page level into the spte (since we need to check
whether the spte is the last-level spte). How about this?
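A hedged sketch of the "encode the page level into the spte" part, assuming the level is stored in a few high bits that the hardware leaves to software: a lockless walker can then decide whether an spte is the last one from the spte value alone, presumably because with SLAB_DESTROY_BY_RCU the kvm_mmu_page may be reused while it is being read, so its role.level cannot be trusted. The bit position (SPTE_LEVEL_SHIFT) and helper names are hypothetical choices, not KVM's actual layout; bit 7 is the x86 page-size (PS) bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: 3 software-available high bits hold the level (1..4). */
#define SPTE_LEVEL_SHIFT	54
#define SPTE_LEVEL_MASK		(UINT64_C(0x7) << SPTE_LEVEL_SHIFT)
#define PT_PAGE_SIZE_MASK	(UINT64_C(1) << 7)	/* x86 PS bit */
#define PT_PAGE_TABLE_LEVEL	1			/* 4K leaf level */

/* Store the page-table level in the spte, keeping all other bits. */
static uint64_t spte_set_level(uint64_t spte, int level)
{
	assert(level >= 1 && level <= 4);
	return (spte & ~SPTE_LEVEL_MASK) | ((uint64_t)level << SPTE_LEVEL_SHIFT);
}

/* Read the level back without touching struct kvm_mmu_page. */
static int spte_level(uint64_t spte)
{
	return (int)((spte & SPTE_LEVEL_MASK) >> SPTE_LEVEL_SHIFT);
}

/* Last spte: a 4K mapping, or a large-page mapping at a higher level. */
static bool spte_is_last(uint64_t spte)
{
	return spte_level(spte) == PT_PAGE_TABLE_LEVEL ||
	       (spte & PT_PAGE_SIZE_MASK) != 0;
}
```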

I had planned to do it after this patchset is merged. If you like it, and if
you think that "using kvm->arch.rcu_free_shadow_page in a small window"
cannot avoid the issue, I am happy to do it in the next version. :)

Thanks, Marcelo!

