linux-kernel - Re: [PATCH] KVM: x86: fix vcpu->mmio


Message-ID: <508531E1.2030307@siemens.com>
Date:	Mon, 22 Oct 2012 13:45:37 +0200
From:	Jan Kiszka <jan.kiszka@...mens.com>
To:	Gleb Natapov <gleb@...hat.com>
CC:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>,
	Avi Kivity <avi@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>, KVM <kvm@...r.kernel.org>
Subject: Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

On 2012-10-22 13:43, Gleb Natapov wrote:
> On Mon, Oct 22, 2012 at 01:35:56PM +0200, Jan Kiszka wrote:
>> On 2012-10-22 13:23, Gleb Natapov wrote:
>>> On Mon, Oct 22, 2012 at 07:09:38PM +0800, Xiao Guangrong wrote:
>>>> On 10/22/2012 05:16 PM, Gleb Natapov wrote:
>>>>> On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
>>>>>> After commit b3356bf0dbb349 (KVM: emulator: optimize "rep ins" handling),
>>>>>> the pieces of io data can be collected and written to the guest memory
>>>>>> or MMIO together.
>>>>>>
>>>>>> Unfortunately, kvm splits the mmio access into 8-byte pieces and stores
>>>>>> them in vcpu->mmio_fragments. If the guest uses "rep ins" to move large
>>>>>> data, it will overflow vcpu->mmio_fragments.
>>>>>>
>>>>>> The bug can be exposed by isapc (-M isapc):
>>>>>>
>>>>>> [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>>>>>> [ ......]
>>>>>> [23154.858083] Call Trace:
>>>>>> [23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
>>>>>> [23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
>>>>>> [23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
>>>>>>
>>>>>>
>>>>>> Actually, we can use one mmio_fragment to store a large mmio access,
>>>>>> because the mmio access is always contiguous, and split it only when we
>>>>>> pass the mmio-exit-info to userspace. After that, we only need two
>>>>>> entries to store mmio info for an access that crosses mmio pages.
>>>>>>
>>>>> I wonder can we put the data into coalesced mmio buffer instead of
>>>>
>>>> If we put all mmio data into coalesced buffer, we should:
>>>> - ensure the userspace program uses KVM_REGISTER_COALESCED_MMIO to register
>>>>   all mmio regions.
>>>>
>>> It appears to not be so.
>>> Userspace calls kvm_flush_coalesced_mmio_buffer() after returning from
>>> KVM_RUN which looks like this:
>>
>> Nope, no longer, only on accesses to devices that actually use such
>> regions (and there are only two ATM). The current design of a global
>> coalesced mmio ring is horrible wrt latency.
>>
> Indeed. After a git pull and recheck, the call to
> kvm_flush_coalesced_mmio_buffer() is gone. So this will break new
> userspace, not old. By global you mean shared between devices (or memory
> regions)?

Yes. We only have a single ring per VM, so we cannot flush a multi-second
VGA access separately from other devices. In theory this is solvable by
introducing per-region rings that can be driven separately.
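
To illustrate the difference, here is a minimal userspace sketch, not the
kernel's actual coalesced mmio code. All names (struct ring, RING_SIZE,
flush_shared, flush_per_region) and the ring size are hypothetical and chosen
only for this example. With one shared ring, a flush on behalf of any device
has to drain everything queued, including pending VGA writes; with per-region
rings, only the region in question is drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE   8   /* hypothetical size, not the kernel's value */
#define MAX_REGIONS 2   /* e.g. region 0 = VGA, region 1 = another device */

struct mmio_entry {
    uint64_t addr;
    uint32_t len;
    int region;         /* region that queued this write */
};

struct ring {
    struct mmio_entry e[RING_SIZE];
    size_t head;        /* next entry to drain */
    size_t tail;        /* next free slot */
};

/* Queue one write; returns -1 if the ring is full. */
static int ring_push(struct ring *r, struct mmio_entry ent)
{
    if (r->tail - r->head == RING_SIZE)
        return -1;
    r->e[r->tail++ % RING_SIZE] = ent;
    return 0;
}

/* Drain every queued entry; returns how many had to be processed. */
static size_t ring_flush(struct ring *r)
{
    size_t n = r->tail - r->head;

    r->head = r->tail;
    return n;
}

/* Shared ring: the requesting region is irrelevant, everything is drained. */
size_t flush_shared(struct ring *shared, int region)
{
    (void)region;
    return ring_flush(shared);
}

/* Per-region rings: only the requesting region's ring is drained. */
size_t flush_per_region(struct ring rings[MAX_REGIONS], int region)
{
    assert(region >= 0 && region < MAX_REGIONS);
    return ring_flush(&rings[region]);
}
```

In this sketch the return value of a flush stands in for its latency cost:
with five VGA writes and one write from another device queued, flushing for
the other device processes six entries on the shared ring but only one with
per-region rings.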

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux