lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <86eaa286-d884-d599-4109-bbf31b5d5d97@huawei.com>
Date:   Fri, 23 Nov 2018 16:40:58 +0000
From:   John Garry <john.garry@...wei.com>
To:     Will Deacon <will.deacon@....com>
CC:     <joro@...tes.org>, <robin.murphy@....com>,
        <linux-kernel@...r.kernel.org>, <iommu@...ts.linux-foundation.org>,
        <ganapatrao.kulkarni@...ium.com>, <hch@....de>,
        <m.szyprowski@...sung.com>, <linuxarm@...wei.com>
Subject: Re: [PATCH v3] iommu/dma: Use NUMA aware memory allocations in
 __iommu_dma_alloc_pages()

On 21/11/2018 16:57, Will Deacon wrote:
> On Wed, Nov 21, 2018 at 04:47:48PM +0000, John Garry wrote:
>> On 21/11/2018 16:07, Will Deacon wrote:
>>> On Wed, Nov 21, 2018 at 10:54:10PM +0800, John Garry wrote:
>>>> From: Ganapatrao Kulkarni <ganapatrao.kulkarni@...ium.com>
>>>>
>>>> Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
>>>> respective device NUMA node. The ternary operator which would be for
>>>> alloc_pages_node() is tidied along with this.
>>>>
>>>> We also include a change to use kvzalloc() for kzalloc()/vzalloc()
>>>> combination.
>>>>
>>>> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@...ium.com>
>>>> [JPG: Added kvzalloc(), drop pages ** being device local, tidied ternary operator]
>>>> Signed-off-by: John Garry <john.garry@...wei.com>
>>>
>>> Weird, you're missing a diffstat here.
>>>
>>> Anyway, the patch looks fine to me, but it would be nice if you could
>>> justify the change with some numbers. Do you actually see an improvement
>> >from this change?
>>>
>>
>> Hi Will,
>>
>> Ah, I missed adding my comments explaining the motivation. It would be
>> better in the commit log. Anyway, here's the snippet:
>>
>> " ... as mentioned in [3], dma_alloc_coherent() uses the locality
>> information from the device - as in direct DMA - so this patch is just
>> applying this same policy.
>>
>> [3]
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html"
>
> Yes, please add to this to the commit log.
>

Sure,

>> I did have some numbers to show improvement in some scenarios when I tested
>> this a while back which I'll dig out.
>>
>> However I would say that some scenarios will improve and the opposite for
>> others with this change, considering different conditions in which DMA
>> memory may be used.
>
> Well, if you can show that it's useful in some cases and not catastrophic in
> others, then I think shooting for parity with direct DMA is a reasonable
> justification for the change.
>

So I have done some more testing with our SoC crypto engine, using 
tcrypt module. The reason I used this device was because we can utilise 
a local device per socket in the system, unlike other DMAing devices, 
which generally only exist on a single socket (for us, anyway).

Overall the results aren't brilliant - as expected - but show a general 
marginal improvement. Here's some figures:

Summary:
Average diff		+0.9%
Max diff		+1.5%
Min diff		+0.2%

Test			     Ops/second	before  after   diff %	

async ecb(aes) encrypt
test 0 (128 bit key, 16 byte blocks)	68717	69057	0.5
test 1 (128 bit key, 64 byte blocks):	72633	73163	0.7
test 2 (128 bit key, 256 byte blocks):	71475	72076	0.8
test 3 (128 bit key, 1024 byte blocks):	66819	67467	1.0
test 4 (128 bit key, 8192 byte blocks):	38237	38495	0.7
test 5 (192 bit key, 16 byte blocks): 	70273	71079	1.2
test 6 (192 bit key, 64 byte blocks):	72455	73292	1.2
test 7 (192 bit key, 256 byte blocks):	71085	71876	1.1
test 8 (192 bit key, 1024 byte blocks):	65891	66576	1.0
test 9 (192 bit key, 8192 byte blocks): 34846	35061	0.6
test 10 (256 bit key, 16 byte blocks):	72927	73762	1.2
test 11 (256 bit key, 64 byte blocks):	72361	73207	1.2
test 12 (256 bit key, 256 byte blocks):	70907	71602	1.0
test 13 (256 bit key, 1024 byte blocks):65035	65653	1.0
test 14 (256 bit key, 8192 byte blocks):32835	32998	0.5
async ecb(aes) decrypt
test 0 (128 bit key, 16 byte blocks)	68384	69130	1.1
test 1 (128 bit key, 64 byte blocks):	72645	73133	0.7
test 2 (128 bit key, 256 byte blocks):	71523	71912	0.5
test 3 (128 bit key, 1024 byte blocks):	66902	67258	0.5
test 4 (128 bit key, 8192 byte blocks):	38260	38434	0.5
test 5 (192 bit key, 16 byte blocks): 	70301	70816	0.7
test 6 (192 bit key, 64 byte blocks):	72473	73064	0.8
test 7 (192 bit key, 256 byte blocks):	71106	71663	0.8
test 8 (192 bit key, 1024 byte blocks):	65915	66363	0.7
test 9 (192 bit key, 8192 byte blocks): 34876	35006	0.4
test 10 (256 bit key, 16 byte blocks):	72969	73519	0.8
test 11 (256 bit key, 64 byte blocks):	72404	72925	0.7
test 12 (256 bit key, 256 byte blocks):	70861	71350	0.7
test 13 (256 bit key, 1024 byte blocks):65074	65485	0.6
test 14 (256 bit key, 8192 byte blocks):32861	32974	0.3
async cbc(aes) encrypt
test 0 (128 bit key, 16 byte blocks)	58306	59131	1.4
test 1 (128 bit key, 64 byte blocks):	61647	62565	1.5
test 2 (128 bit key, 256 byte blocks):	60841	61666	1.4
test 3 (128 bit key, 1024 byte blocks):	57503	58204	1.2
test 4 (128 bit key, 8192 byte blocks):	34760	35055	0.9
test 5 (192 bit key, 16 byte blocks): 	59684	60452	1.3
test 6 (192 bit key, 64 byte blocks):	61705	62516	1.3
test 7 (192 bit key, 256 byte blocks):	60678	61426	1.2
test 8 (192 bit key, 1024 byte blocks):	56805	57487	1.2
test 9 (192 bit key, 8192 byte blocks): 31836	32093	0.8
test 10 (256 bit key, 16 byte blocks):	61961	62785	1.3
test 11 (256 bit key, 64 byte blocks):	61584	62427	1.4
test 12 (256 bit key, 256 byte blocks):	60407	61246	1.4
test 13 (256 bit key, 1024 byte blocks):56135	56868	1.3
test 14 (256 bit key, 8192 byte blocks):30128	30380	0.8
async cbc(aes) decrypt
test 0 (128 bit key, 16 byte blocks)	58555	59044	0.8
test 1 (128 bit key, 64 byte blocks):	61853	62589	1.2
test 2 (128 bit key, 256 byte blocks):	60992	61728	1.2
test 3 (128 bit key, 1024 byte blocks):	57591	58250	1.1
test 4 (128 bit key, 8192 byte blocks):	34796	35064	0.8
test 5 (192 bit key, 16 byte blocks): 	59843	60506	1.1
test 6 (192 bit key, 64 byte blocks):	61808	62521	1.2
test 7 (192 bit key, 256 byte blocks):	60800	61445	1.1
test 8 (192 bit key, 1024 byte blocks):	56949	57513	1.0
test 9 (192 bit key, 8192 byte blocks): 31890	32107	0.7
test 10 (256 bit key, 16 byte blocks):	62109	62778	1.1
test 11 (256 bit key, 64 byte blocks):	61748	62418	1.1
test 12 (256 bit key, 256 byte blocks):	60604	61226	1.0
test 13 (256 bit key, 1024 byte blocks):56277	56845	1.0
test 14 (256 bit key, 8192 byte blocks):30165	30340	0.6
sync xts(aes) encrypt
test 0 (256 bit key, 16 byte blocks):	59501	59742	0.4
test 1 (256 bit key, 64 byte blocks):	62894	63249	0.6
test 2 (256 bit key, 256 byte blocks):	62074	62476	0.7
test 3 (256 bit key, 1024 byte blocks):	58569	58894	0.6
test 4 (256 bit key, 8192 byte blocks):	35271	35462	0.5
test 5 (512 bit key, 16 byte blocks):	60749	61174	0.7
test 6 (512 bit key, 64 byte blocks):	62760	63148	0.6
test 7 (512 bit key, 256 byte blocks):	61625	61928	0.5
test 8 (512 bit key, 1024 byte blocks):	57162	57452	0.5
test 9 (512 bit key, 8192 byte blocks):	30582	30718	0.4
async xts(aes) decrypt
test 0 (256 bit key, 16 byte blocks):	59423	59587	0.3
test 1 (256 bit key, 64 byte blocks):	62849	62974	0.2
test 2 (256 bit key, 256 byte blocks):	61884	62231	0.6
test 3 (256 bit key, 1024 byte blocks):	58402	58699	0.5
test 4 (256 bit key, 8192 byte blocks):	35244	35353	0.3
test 5 (512 bit key, 16 byte blocks):	60693	60882	0.3
test 6 (512 bit key, 64 byte blocks):	62671	62896	0.4
test 7 (512 bit key, 256 byte blocks):	61488	61722	0.4
test 8 (512 bit key, 1024 byte blocks):	57092	57261	0.3
test 9 (512 bit key, 8192 byte blocks):	30572	30655	0.3
sync ctr(aes) encrypt	
test 0 (128 bit key, 16 byte blocks):	59673	9926	0.4
test 1 (128 bit key, 64 byte blocks):	62738	63024	0.5
test 2 (128 bit key, 256 byte blocks):	61841	62171	0.5
test 3 (128 bit key, 1024 byte blocks):	58374	58633	0.4
test 4 (128 bit key, 8192 byte blocks):	35137	35273	0.4
test 5 (192 bit key, 16 byte blocks):	60736	61049	0.5
test 6 (192 bit key, 64 byte blocks):	62718	63081	0.6
test 7 (192 bit key, 256 byte blocks):	61652	61973	0.5
test 8 (192 bit key, 1024 byte blocks):	57709	58002	0.5
test 9 (192 bit key, 8192 byte blocks):	32171	32306	0.4
test 10 (256 bit key, 16 byte blocks):	63000	63428	0.7
test 11 (256 bit key, 64 byte blocks):	62594	63027	0.7
test 12 (256 bit key, 256 byte blocks):	61452	61845	0.6
test 13 (256 bit key, 1024 byte blocks):57019	57346	0.6
test 14 (256 bit key, 8192 byte blocks):30422	30544	0.4
async ctr(aes) decrypt
test 0 (128 bit key, 16 byte blocks):	59488	59994	0.9
test 1 (128 bit key, 64 byte blocks):	62651	63173	0.8
test 2 (128 bit key, 256 byte blocks):	61778	62464	1.1
test 3 (128 bit key, 1024 byte blocks):	58290	58909	1.1
test 4 (128 bit key, 8192 byte blocks):	35115	35358	0.7
test 5 (192 bit key, 16 byte blocks):	60590	61244	1.1
test 6 (192 bit key, 64 byte blocks):	62604	63296	1.1
test 7 (192 bit key, 256 byte blocks):	61540	62199	1.1
test 8 (192 bit key, 1024 byte blocks):	57577	58149	1.0
test 9 (192 bit key, 8192 byte blocks):	32146	32368	0.7
test 10 (256 bit key, 16 byte blocks):	62852	63610	1.2
test 11 (256 bit key, 64 byte blocks):	62523	63276	1.2
test 12 (256 bit key, 256 byte blocks):	61356	62049	1.1
test 13 (256 bit key, 1024 byte blocks):56943	57546	1.1
test 14 (256 bit key, 8192 byte blocks):30399	30593	0.6

Let me know if any more questions or comments.

Thanks,
John

> Will
>
> .
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ