Message-ID: <4B9F5556.7060103@redhat.com>
Date: Tue, 16 Mar 2010 10:54:30 +0100
From: Kevin Wolf <kwolf@...hat.com>
To: Avi Kivity <avi@...hat.com>
CC: Chris Webb <chris@...chsys.com>, balbir@...ux.vnet.ibm.com,
KVM development list <kvm@...r.kernel.org>,
Rik van Riel <riel@...riel.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH][RF C/T/D] Unmapped page cache control - via boot parameter
Am 16.03.2010 10:17, schrieb Avi Kivity:
> On 03/15/2010 10:23 PM, Chris Webb wrote:
>> Avi Kivity <avi@...hat.com> writes:
>>
>>
>>> On 03/15/2010 10:07 AM, Balbir Singh wrote:
>>>
>>>
>>>> Yes, it is a virtio call away, but is the cost of paying twice in
>>>> terms of memory acceptable?
>>>>
>>> Usually, it isn't, which is why I recommend cache=off.
>>>
>> Hi Avi. One observation about your recommendation for cache=none:
>>
>> We run hosts of VMs accessing drives backed by logical volumes carved out
>> from md RAID1. Each host has 32GB RAM and eight cores, divided between (say)
>> twenty virtual machines, which pretty much fill the available memory on the
>> host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback
>> caching turned on get advertised to the guest as having a write-cache, and
>> FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback
>> isn't acting as cache=neverflush like it would have done a year ago. I know
>> that comparing performance for cache=none against that unsafe behaviour
>> would be somewhat unfair!)
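
Just to make that flush path concrete: with a write cache advertised, a
guest FLUSH boils down to an fsync()/fdatasync() on the host-side file
descriptor. A minimal sketch of the idea, not qemu's actual code, and the
handler name is made up:

/* Guest FLUSH (SYNCHRONIZE CACHE) ends up forcing the backing file or
 * device to stable storage on the host.  Sketch only. */
#include <unistd.h>

static int handle_guest_flush(int backing_fd)
{
    /* fdatasync() is what turns the guest's FLUSH into a real
     * durability guarantee on the host side. */
    return fdatasync(backing_fd);
}
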
>>
>> Wasteful duplication of page cache between guest and host notwithstanding,
>> turning on cache=writeback is a spectacular performance win for our guests.
>> For example, even IDE with cache=writeback easily beats virtio with
>> cache=none in most of the guest filesystem performance tests I've tried. The
>> anecdotal feedback from clients is also very strongly in favour of
>> cache=writeback.
>>
>
> Is this with qcow2, raw file, or direct volume access?
>
> I can understand it for qcow2, but for direct volume access this
> shouldn't happen. The guest schedules as many writes as it can,
> followed by a sync. The host (and disk) can then reschedule them,
> whether they sit in the writeback cache or in the block layer, and must
> complete the sync in the same way in either case.
>
> Perhaps what we need is bdrv_aio_submit(), which can take a number of
> requests. For direct volume access, this allows easier reordering
> (io_submit() should plug the queues before it starts processing and
> unplug them when done, though I don't see the code for this?). For
> qcow2, we can coalesce metadata updates for multiple requests into one
> RMW (for example, a sequential write split into multiple 64K-256K write
> requests).
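
For reference, io_submit() does already take a whole batch of requests in
one call. Roughly like this with libaio (a minimal sketch, error handling
and completion reaping omitted, and the device path is made up):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>

#define NUM_REQS 4
#define REQ_SIZE 65536          /* 64K per request, as in the example above */

int main(void)
{
    io_context_t ctx = 0;
    struct iocb iocbs[NUM_REQS];
    struct iocb *iocbps[NUM_REQS];
    int fd, i;

    /* the device path is made up */
    fd = open("/dev/vg0/guest-disk", O_WRONLY | O_DIRECT);
    io_setup(NUM_REQS, &ctx);

    for (i = 0; i < NUM_REQS; i++) {
        void *buf;
        posix_memalign(&buf, 4096, REQ_SIZE);
        /* four sequential 64K writes, prepared individually... */
        io_prep_pwrite(&iocbs[i], fd, buf, REQ_SIZE, (long long)i * REQ_SIZE);
        iocbps[i] = &iocbs[i];
    }

    /* ...but handed to the kernel as a single batch.  Completion
     * reaping with io_getevents() is omitted here. */
    io_submit(ctx, NUM_REQS, iocbps);
    return 0;
}
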
We already do merge sequential writes back into one larger request, so
this is in fact a case that wouldn't benefit from such changes. It may
help for other cases. But even if it did, coalescing metadata writes in
qcow2 sounds like a good way to mess things up, so I'd stick to doing it
only for the data itself.
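
For the data itself, the merge is conceptually just a check whether one
request ends where the next one begins. A simplified sketch, not the
literal qemu code; the structure and field names are invented:

#include <stdint.h>
#include <stdbool.h>

struct write_req {
    int64_t sector;       /* first sector */
    int     nb_sectors;   /* length in sectors */
};

/* Two writes can be merged if the second one starts exactly where the
 * first one ends. */
static bool can_merge(const struct write_req *a, const struct write_req *b)
{
    return a->sector + a->nb_sectors == b->sector;
}

/* Extend request 'a' to cover 'b' as well (the data buffers would be
 * chained into one iovec in the same step). */
static void merge(struct write_req *a, const struct write_req *b)
{
    a->nb_sectors += b->nb_sectors;
}
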
Apart from that, wouldn't your points apply to writeback as well?
Kevin