Message-ID: <b427cf12-2ff6-e5cd-fe6a-3874d8622a29@redhat.com>
Date: Tue, 10 Aug 2021 11:02:14 +0800
From: Jason Wang <jasowang@...hat.com>
To: Yongji Xie <xieyongji@...edance.com>,
Robin Murphy <robin.murphy@....com>
Cc: kvm <kvm@...r.kernel.org>, "Michael S. Tsirkin" <mst@...hat.com>,
virtualization <virtualization@...ts.linux-foundation.org>,
Christian Brauner <christian.brauner@...onical.com>,
Jonathan Corbet <corbet@....net>,
Matthew Wilcox <willy@...radead.org>,
Christoph Hellwig <hch@...radead.org>,
Dan Carpenter <dan.carpenter@...cle.com>,
Stefano Garzarella <sgarzare@...hat.com>,
Liu Xiaodong <xiaodong.liu@...el.com>,
Joe Perches <joe@...ches.com>,
Al Viro <viro@...iv.linux.org.uk>,
Stefan Hajnoczi <stefanha@...hat.com>,
songmuchun@...edance.com, Jens Axboe <axboe@...nel.dk>,
He Zhe <zhe.he@...driver.com>,
Greg KH <gregkh@...uxfoundation.org>,
Randy Dunlap <rdunlap@...radead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
iommu@...ts.linux-foundation.org, bcrl@...ck.org,
netdev@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Mika Penttilä <mika.penttila@...tfour.com>
Subject: Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and
free_iova_fast()
On 2021/8/9 1:56 PM, Yongji Xie wrote:
> On Thu, Aug 5, 2021 at 9:31 PM Jason Wang <jasowang@...hat.com> wrote:
>>
>> On 2021/8/5 8:34 PM, Yongji Xie wrote:
>>>> My main point, though, is that if you've already got something else
>>>> keeping track of the actual addresses, then the way you're using an
>>>> iova_domain appears to be something you could do with a trivial bitmap
>>>> allocator. That's why I don't buy the efficiency argument. The main
>>>> design points of the IOVA allocator are to manage large address spaces
>>>> while trying to maximise spatial locality to minimise the underlying
>>>> pagetable usage, and allocating with a flexible limit to support
>>>> multiple devices with different addressing capabilities in the same
>>>> address space. If none of those aspects are relevant to the use-case -
>>>> which AFAICS appears to be true here - then as a general-purpose
>>>> resource allocator it's rubbish and has an unreasonably massive memory
>>>> overhead and there are many, many better choices.
>>>>
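
(For reference, the kind of trivial bitmap allocator Robin is describing
could look roughly like the sketch below. It is illustrative only: the
structure and function names are made up for this sketch, and the locking
is the simplest possible choice.)

#include <linux/bitmap.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/*
 * Illustrative only: one bit per bounce page, managed with the generic
 * bitmap helpers.  Nothing here is taken from in-tree code.
 */
struct bounce_bitmap_pool {
        unsigned long *map;             /* one bit per page in the range */
        unsigned long npages;
        spinlock_t lock;
};

/* Returns the first page index of a free run, or -1UL on failure. */
static unsigned long bounce_bitmap_alloc(struct bounce_bitmap_pool *pool,
                                         unsigned long npages)
{
        unsigned long start;

        spin_lock(&pool->lock);
        start = bitmap_find_next_zero_area(pool->map, pool->npages, 0,
                                           npages, 0);
        if (start < pool->npages)
                bitmap_set(pool->map, start, npages);
        spin_unlock(&pool->lock);

        return start < pool->npages ? start : -1UL;
}

static void bounce_bitmap_free(struct bounce_bitmap_pool *pool,
                               unsigned long start, unsigned long npages)
{
        spin_lock(&pool->lock);
        bitmap_clear(pool->map, start, npages);
        spin_unlock(&pool->lock);
}
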
>>> OK, I get your point. Actually, we used the genpool allocator in an
>>> early version. Maybe we can fall back to using it.
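
(A rough sketch of what falling back to genalloc could look like; the
function names, the base/size parameters and the page-sized granule are
assumptions for illustration, not code from that earlier version.)

#include <linux/genalloc.h>
#include <linux/mm.h>
#include <linux/numa.h>

/*
 * Sketch of a genpool-backed bounce-IOVA allocator.  base/size would be
 * the bounce range of the VDUSE domain; the names are illustrative.
 */
static struct gen_pool *bounce_pool_create(unsigned long base, size_t size)
{
        struct gen_pool *pool;

        pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
        if (!pool)
                return NULL;

        /* Hand the whole bounce IOVA range to the pool. */
        if (gen_pool_add(pool, base, size, NUMA_NO_NODE)) {
                gen_pool_destroy(pool);
                return NULL;
        }

        return pool;
}

static unsigned long bounce_pool_map(struct gen_pool *pool, size_t len)
{
        /* gen_pool_alloc() returns 0 when the range is exhausted. */
        return gen_pool_alloc(pool, PAGE_ALIGN(len));
}

static void bounce_pool_unmap(struct gen_pool *pool, unsigned long iova,
                              size_t len)
{
        gen_pool_free(pool, iova, PAGE_ALIGN(len));
}
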
>>
>> I think maybe you can share some perf numbers to see how much
>> alloc_iova_fast() can help.
>>
> I did some fio tests[1] with a ram-backed vduse block device[2].
>
> Following are some performance data:
>
>                   numjobs=1    numjobs=2    numjobs=4    numjobs=8
> iova_alloc_fast   145k iops    265k iops    514k iops    758k iops
> iova_alloc        137k iops    170k iops    128k iops    113k iops
> gen_pool_alloc    143k iops    270k iops    458k iops    521k iops
>
> The iova_alloc_fast() case has the best performance since we always hit
> the per-cpu cache. Even leaving the per-cpu cache aside, the genpool
> allocator should still be better than the plain iova allocator.
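
For reference, the alloc_iova_fast() path that these numbers measure
boils down to roughly the following sketch, built on the helpers this
patch exports; the granule, the start pfn and the bounce_pfns limit are
assumptions here, and error handling is trimmed:

#include <linux/dma-mapping.h>
#include <linux/iova.h>
#include <linux/mm.h>

/*
 * Sketch of the iova_domain-based path.  bounce_pfns (the upper bound
 * of the bounce range, in pages) is an assumption for illustration.
 */
static int bounce_iovad_init(struct iova_domain *iovad)
{
        int ret;

        ret = iova_cache_get();         /* needed before the first domain */
        if (ret)
                return ret;

        /* PAGE_SIZE granule; start allocating from pfn 1 so 0 stays free. */
        init_iova_domain(iovad, PAGE_SIZE, 1);
        return 0;
}

static dma_addr_t bounce_iova_alloc(struct iova_domain *iovad,
                                    unsigned long bounce_pfns, size_t size)
{
        unsigned long pfn;

        /*
         * alloc_iova_fast() serves most requests from a per-CPU rcache,
         * which is where the gain over gen_pool_alloc() at higher
         * numjobs comes from.
         */
        pfn = alloc_iova_fast(iovad, PAGE_ALIGN(size) >> PAGE_SHIFT,
                              bounce_pfns - 1, true);

        return pfn ? (dma_addr_t)pfn << PAGE_SHIFT : DMA_MAPPING_ERROR;
}

static void bounce_iova_free(struct iova_domain *iovad, dma_addr_t iova,
                             size_t size)
{
        free_iova_fast(iovad, iova >> PAGE_SHIFT,
                       PAGE_ALIGN(size) >> PAGE_SHIFT);
}
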
I think we see convincing numbers for using iova_alloc_fast() over
gen_pool_alloc() (758k vs 521k iops, roughly a 45% improvement at numjobs=8).
Thanks
>
> [1] fio jobfile:
>
> [global]
> rw=randread
> direct=1
> ioengine=libaio
> iodepth=16
> time_based=1
> runtime=60s
> group_reporting
> bs=4k
> filename=/dev/vda
> [job]
> numjobs=..
>
> [2] $ qemu-storage-daemon \
> --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
> --monitor chardev=charmonitor \
> --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
> --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
>
> The qemu-storage-daemon can be built from the repo:
> https://github.com/bytedance/qemu/tree/vduse-test.
>
> Thanks,
> Yongji
>