Message-ID: <99d905fb-0732-bb7b-f631-f08c897f1d8c@oracle.com>
Date: Tue, 15 Aug 2023 16:15:42 +0100
From: John Garry <john.g.garry@...cle.com>
To: "zhangzekun (A)" <zhangzekun11@...wei.com>
Cc: iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
baolu.lu@...ux.intel.com, robh@...nel.org, nicolinc@...dia.com,
kevin.tian@...el.com, Robin Murphy <robin.murphy@....com>,
joro@...tes.org, will@...nel.org
Subject: Re: [RESEND PATCH 2/2] iommu/iova: allocate iova_rcache->depot
dynamicly
On 12/08/2023 08:18, zhangzekun (A) wrote:
>
>
> On 2023/8/11 22:14, John Garry wrote:
>> On 11/08/2023 14:02, Zhang Zekun wrote:
>>> In an fio test with 4k reads and allowed cpus set to 0-255, we observe a
>>> decrease in IOPS.
> The normal IOPS
>>
>> What do you mean by normal IOPS? Describe this "normal" scenario.
>>
> Hi, John
>
> The reason I think 1980K is normal is that I have tested the iova_rcache
> hit rate with all iova sizes; the average iova cache hit rate can reach
> up to
Sorry to say, but I still don't know what you mean by the terms "normal"
and "abnormal" here. Is "normal" prior to the drop in IOPS which you
mention above? If so, at what time does this occur?
> around 99% (varies with different workloads and iova sizes), and I think
> iova_rcache behaves well in our test workloads. Besides, the IOPS is
> behaving as expected, which is acked by our test group.
>
>> ? can reach up to 1980k, but we can only
>>> get about 1600k.
>>>
>>> abnormal IOPS:
>>> Jobs: 12 (f=12): [R(12)][99.3%][r=6220MiB/s][r=1592k IOPS][eta 00m:12s]
>>> Jobs: 12 (f=12): [R(12)][99.4%][r=6215MiB/s][r=1591k IOPS][eta 00m:11s]
>>> Jobs: 12 (f=12): [R(12)][99.4%][r=6335MiB/s][r=1622k IOPS][eta 00m:10s]
>>> Jobs: 12 (f=12): [R(12)][99.5%][r=6194MiB/s][r=1586k IOPS][eta 00m:09s]
>>> Jobs: 12 (f=12): [R(12)][99.6%][r=6173MiB/s][r=1580k IOPS][eta 00m:08s]
>>> Jobs: 12 (f=12): [R(12)][99.6%][r=5984MiB/s][r=1532k IOPS][eta 00m:07s]
>>> Jobs: 12 (f=12): [R(12)][99.7%][r=6374MiB/s][r=1632k IOPS][eta 00m:06s]
>>> Jobs: 12 (f=12): [R(12)][99.7%][r=6343MiB/s][r=1624k IOPS][eta 00m:05s]
>>>
>>> normal IOPS:
>>> Jobs: 12 (f=12): [R(12)][99.3%][r=7736MiB/s][r=1980k IOPS][eta 00m:12s]
>>> Jobs: 12 (f=12): [R(12)][99.4%][r=7744MiB/s][r=1982k IOPS][eta 00m:11s]
>>> Jobs: 12 (f=12): [R(12)][99.4%][r=7737MiB/s][r=1981k IOPS][eta 00m:10s]
>>> Jobs: 12 (f=12): [R(12)][99.5%][r=7735MiB/s][r=1980k IOPS][eta 00m:09s]
>>> Jobs: 12 (f=12): [R(12)][99.6%][r=7741MiB/s][r=1982k IOPS][eta 00m:08s]
>>> Jobs: 12 (f=12): [R(12)][99.6%][r=7740MiB/s][r=1982k IOPS][eta 00m:07s]
>>> Jobs: 12 (f=12): [R(12)][99.7%][r=7736MiB/s][r=1981k IOPS][eta 00m:06s]
>>> Jobs: 12 (f=12): [R(12)][99.7%][r=7736MiB/s][r=1980k IOPS][eta 00m:05s]
>>>
>>> The current iova_rcache structure has an iova_cpu_rcache for every
>>> cpu, and these iova_cpu_rcaches use a common buffer, iova_rcache->depot,
>>> to exchange iovas among themselves. A machine with 256 cpus will have
>>> 256 iova_cpu_rcaches and 1 iova_rcache->depot per iova_domain. However,
>>> the max size of iova_rcache->depot is fixed at MAX_GLOBAL_MAGS, which
>>> equals 32, and it can't grow with the number of cpus. This can cause
>>> problems under some conditions.
>>>
>>> Some drivers only free iovas in their irq callback functions. The
>>> driver in this case has 16 threaded irqs to free iovas, but these
>>> irq callback functions only free iovas on 16 specific cpus (cpu{0,16,
>>> 32...,240}). A threaded irq whose smp affinity is 0-15 will only free
>>> iovas on cpu 0. However, the driver allocates iovas on all cpus
>>> (cpu{0-255}), so cpus without free iovas in their local cpu_rcache need
>>> to get free iovas from iova_rcache->depot. The current max size of
>>> iova_rcache->depot is 32, and it seems to be too small for 256 users
>>> (16 cpus will put iovas into iova_rcache->depot and 240 cpus will try
>>> to get iovas from it). Make iova_rcache->depot grow with the number of
>>> possible cpus, and the decrease in IOPS disappears.
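
To make the free-side behaviour described above concrete, here is a
simplified userspace model in C. It is only a sketch, not the kernel code:
all model_* names and MODEL_MAG_SIZE are made up for illustration, locking
is omitted, and only the depot limit of 32 is taken from the description.
It loosely follows the loaded/prev magazine scheme, where a cpu that only
frees iovas keeps pushing full magazines into the shared depot until the
depot is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative sizes; only the depot limit of 32 comes from the thread. */
#define MODEL_MAG_SIZE  4   /* hypothetical, kept small for readability */
#define MODEL_DEPOT_MAX 32  /* mirrors MAX_GLOBAL_MAGS as described above */

struct model_mag {
    size_t size;                        /* number of cached pfns */
    unsigned long pfns[MODEL_MAG_SIZE];
};

struct model_cpu_cache {                /* one per cpu */
    struct model_mag *loaded;
    struct model_mag *prev;
};

struct model_depot {                    /* shared by all cpus */
    size_t size;
    struct model_mag *mags[MODEL_DEPOT_MAX];
    unsigned long dropped;              /* full magazines that did not fit */
};

static struct model_mag *model_mag_alloc(void)
{
    return calloc(1, sizeof(struct model_mag));
}

static void model_cpu_cache_init(struct model_cpu_cache *c)
{
    c->loaded = model_mag_alloc();
    c->prev = model_mag_alloc();
    assert(c->loaded && c->prev);
}

/* Free one iova on the cpu that owns cache c. */
static void model_free_iova(struct model_cpu_cache *c, struct model_depot *d,
                            unsigned long pfn)
{
    if (c->loaded->size == MODEL_MAG_SIZE) {
        if (c->prev->size < MODEL_MAG_SIZE) {
            /* prev still has room: swap the two magazines */
            struct model_mag *tmp = c->loaded;

            c->loaded = c->prev;
            c->prev = tmp;
        } else {
            /* both magazines are full: hand one to the depot */
            struct model_mag *fresh = model_mag_alloc();

            assert(fresh);
            if (d->size < MODEL_DEPOT_MAX) {
                d->mags[d->size++] = c->loaded;
            } else {
                /* depot full: this model just counts and discards */
                d->dropped++;
                free(c->loaded);
            }
            c->loaded = fresh;
        }
    }
    c->loaded->pfns[c->loaded->size++] = pfn;
}
```

In this model, once the depot holds 32 magazines every further full
magazine from a freeing cpu is discarded, so those iovas are no longer
available to the cpus that only allocate. That is the effect the fixed
depot size is suspected of having in the scenario above.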
>>
>> Doesn't it take a long time for all the depots to fill for you? From
>> the description, this sounds like the hisilicon SAS controller which
>> you are experimenting with and I found there that it took a long time
>> for the depots to fill for high throughput testing.
>>
>> Thanks,
>> John
>
> Yes, it will take more time to fill rcache->depot, but we don't need to fill
> all depots before using the iovas in them. We can use iovas in rcache->depot
> when the local cpu_rcache is empty and there are iova magazines in
> rcache->depot, which means "rcache->depot_size > 0". A larger depot
> size will help cache more iova magazines freed in __iova_rcache_insert(),
> which means more potential memory cost for iova_rcache, and should not
> introduce performance issues.
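
The allocation-side condition described above ("depot_size > 0" when the
local cache is empty) can be sketched in the same spirit. Again this is a
hedged userspace model rather than the kernel implementation: the model_*
names and MODEL_MAG_SIZE are hypothetical, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAG_SIZE  4   /* hypothetical, kept small for readability */
#define MODEL_DEPOT_MAX 32  /* depot limit as described in the thread */

struct model_mag {
    size_t size;
    unsigned long pfns[MODEL_MAG_SIZE];
};

struct model_cpu_cache {
    struct model_mag *loaded;
    struct model_mag *prev;
};

struct model_depot {
    size_t size;
    struct model_mag *mags[MODEL_DEPOT_MAX];
};

/* Helper: magazine pre-filled with pfns base, base+1, ... (filled entries). */
static struct model_mag *model_mag_new(size_t filled, unsigned long base)
{
    struct model_mag *m = calloc(1, sizeof(*m));

    assert(m && filled <= MODEL_MAG_SIZE);
    for (size_t i = 0; i < filled; i++)
        m->pfns[i] = base + i;
    m->size = filled;
    return m;
}

/*
 * Try to satisfy an allocation from the caches.
 * Returns true and stores the pfn on a hit; false means a cache miss,
 * where a real allocator would fall back to its slow path.
 */
static bool model_cache_get(struct model_cpu_cache *c, struct model_depot *d,
                            unsigned long *pfn)
{
    if (c->loaded->size == 0) {
        if (c->prev->size > 0) {
            /* local cache still has iovas in prev: swap */
            struct model_mag *tmp = c->loaded;

            c->loaded = c->prev;
            c->prev = tmp;
        } else if (d->size > 0) {
            /* local cache empty, depot not empty: take one magazine */
            free(c->loaded);
            c->loaded = d->mags[--d->size];
        } else {
            return false;
        }
    }
    *pfn = c->loaded->pfns[--c->loaded->size];
    return true;
}
```

The point of the sketch is that the depot does not have to be full to be
useful: a single magazine in it is enough for an allocating cpu with an
empty local cache to get a hit.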
Thanks,
John