Date:	Tue, 16 Mar 2010 11:17:49 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Chris Webb <chris@...chsys.com>
CC:	balbir@...ux.vnet.ibm.com,
	KVM development list <kvm@...r.kernel.org>,
	Rik van Riel <riel@...riel.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Christoph Hellwig <hch@....de>, Kevin Wolf <kwolf@...hat.com>
Subject: Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter

On 03/15/2010 10:23 PM, Chris Webb wrote:
> Avi Kivity <avi@...hat.com> writes:
>
>    
>> On 03/15/2010 10:07 AM, Balbir Singh wrote:
>>
>>      
>>> Yes, it is a virtio call away, but is the cost of paying twice in
>>> terms of memory acceptable?
>>>        
>> Usually, it isn't, which is why I recommend cache=off.
>>      
> Hi Avi. One observation about your recommendation for cache=none:
>
> We run hosts of VMs accessing drives backed by logical volumes carved out
> from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
> twenty virtual machines, which pretty much fill the available memory on the
> host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
> caching turned on get advertised to the guest as having a write-cache, and
> FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
> isn't acting as cache=neverflush like it would have done a year ago. I know
> that comparing performance for cache=none against that unsafe behaviour
> would be somewhat unfair!)
>
> Wasteful duplication of page cache between guest and host notwithstanding,
> turning on cache=writeback is a spectacular performance win for our guests.
> For example, even IDE with cache=writeback easily beats virtio with
> cache=none in most of the guest filesystem performance tests I've tried. The
> anecdotal feedback from clients is also very strongly in favour of
> cache=writeback.
>    

Is this with qcow2, raw file, or direct volume access?

I can understand it for qcow2, but for direct volume access this 
shouldn't happen.  The guest schedules as many writes as it can, 
followed by a sync.  The host (and disk) can then reschedule them, 
whether they sit in the writeback cache or in the block layer, and it 
has to honour the sync in the same way in either case.
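
To make that ordering contract concrete, here is a minimal userspace 
sketch (not qemu code; the file name, sizes and offsets are invented 
for illustration): the individual writes are free to be reordered by 
the host page cache, the block layer elevator, or the disk itself; 
only the final fdatasync() is a barrier that must hit stable storage 
before it returns.

#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    memset(buf, 0xab, sizeof(buf));

    int fd = open("/tmp/disk-image", O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return 1;

    /* Schedule as many writes as possible; these may complete in
     * any order. */
    for (int i = 0; i < 128; i++)
        if (pwrite(fd, buf, sizeof(buf), (off_t)i * 65536) < 0)
            return 1;

    /* ...then sync: everything above must be durable when this
     * returns. */
    if (fdatasync(fd) < 0)
        return 1;

    close(fd);
    return 0;
}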

Perhaps what we need is a bdrv_aio_submit() that takes a batch of 
requests.  For direct volume access, this allows easier reordering 
(io_submit() should plug the queues before it starts processing and 
unplug them when done, though I don't see the code for this?).  For 
qcow2, we can coalesce the metadata updates for multiple requests into 
one read-modify-write (RMW), for example for a sequential write that 
was split into multiple 64K-256K write requests.
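
A rough sketch of the interface I have in mind (this is not existing 
qemu code; the struct layout and the names are only illustrative):

#include <stdint.h>

typedef struct BlockDriverState BlockDriverState;   /* opaque here */
typedef struct QEMUIOVector QEMUIOVector;           /* opaque here */
typedef void BlockDriverCompletionFunc(void *opaque, int ret);

typedef struct BlockRequest {
    int64_t sector;                 /* starting sector */
    int nb_sectors;                 /* request length */
    QEMUIOVector *qiov;             /* scatter/gather buffers */
    BlockDriverCompletionFunc *cb;  /* per-request completion */
    void *opaque;
} BlockRequest;

/*
 * Submit num_reqs requests as one batch.  A raw/volume backend could
 * plug the queue, issue the whole batch (e.g. one io_submit()), then
 * unplug; qcow2 could scan the batch and fold the metadata updates
 * for adjacent requests into a single RMW.
 */
int bdrv_aio_submit(BlockDriverState *bs, BlockRequest *reqs,
                    int num_reqs);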

Christoph/Kevin?

> With a host full of cache=none guests, IO contention between guests is
> hugely problematic with non-stop seek from the disks to service tiny
> O_DIRECT writes (especially without virtio), many of which needn't have been
> synchronous if only there had been some way for the guest OS to tell qemu
> that. Running with cache=writeback seems to reduce the frequency of disk
> flush per guest to a much more manageable level, and to allow the host's
> elevator to optimise writing out across the guests in between these flushes.
>    

The host eventually has to turn the writes into synchronous writes; 
there is no way around that.
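
Put differently, whatever the cache mode, a guest flush has to end up 
as a blocking sync on the backing file or volume.  A sketch (not 
actual qemu code):

#include <unistd.h>

static int handle_guest_flush(int image_fd)
{
    /* Blocks until the data previously written to image_fd is on
     * stable storage; writeback caching only defers this cost, it
     * cannot eliminate it. */
    return fdatasync(image_fd);
}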

-- 
error compiling committee.c: too many arguments to function

