Message-ID: <20120816172606.26743ozunoe6mbs4@www.81.fi>
Date: Thu, 16 Aug 2012 17:26:06 +0300
From: Jussi Kivilinna <jussi.kivilinna@...et.fi>
To: Borislav Petkov <bp@...en8.de>
Cc: Johannes Goetzfried
<Johannes.Goetzfried@...ormatik.stud.uni-erlangen.de>,
linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org,
Tilo Müller
<tilo.mueller@...ormatik.uni-erlangen.de>,
Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: [PATCH] crypto: twofish - add x86_64/avx assembler
implementation
Quoting Borislav Petkov <bp@...en8.de>:
> On Wed, Aug 15, 2012 at 08:34:25PM +0300, Jussi Kivilinna wrote:
>> About ~5% slower, probably because I was tuning for sandy-bridge and
>> introduced more FPU<=>CPU register moves.
>>
>> Here's new version of patch, with FPU<=>CPU moves from original
>> implementation.
>>
>> (Note: also changes encryption function to inline all code in to main
>> function, decryption still places common code to separate function to
>> reduce object size. This is to measure the difference.)
>
> Yep, looks better than the previous run and also a bit better or on par
> with the initial run I did.
Thanks again. The speed gain with the patch is ~8%, which is enough for
twofish-avx to pass twofish-3way.
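For reference, a percentage like this is just the relative difference in
operations per second between two runs of the same test. The runs behind
the ~8% figure are not part of this message, so the sketch below applies
the same arithmetic to two figures from the log quoted further down (ecb
vs. cbc encryption, 128 bit key, 8192 byte blocks). The function name is
made up for illustration and is not part of tcrypt.

```python
def relative_gain_percent(baseline_ops, new_ops):
    """Return how much faster new_ops is than baseline_ops, in percent."""
    if baseline_ops <= 0:
        raise ValueError("baseline_ops must be positive")
    return (new_ops - baseline_ops) / baseline_ops * 100.0


# Figures taken from the quoted log (128 bit key, 8192 byte blocks).
cbc_enc_ops = 17755  # cbc(twofish) encryption, operations in 1 second
ecb_enc_ops = 21777  # ecb(twofish) encryption, operations in 1 second

gain = relative_gain_percent(cbc_enc_ops, ecb_enc_ops)
print(f"ecb vs cbc encryption: {gain:.1f}% more operations per second")
```

Comparing two implementations works the same way: pass the operation
counts of the same test (same mode, key size and block size) from each run.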
>
> The thing is, I'm not sure whether optimizing the thing for each uarch
> is a workable solution software-wise or maybe having a single version
> which performs sufficiently ok on all uarches is easier/better to
> maintain without causing code bloat. Hmmm...
Agreed, testing on multiple CPUs to get a single well-working version is
what I have done in the past. But purchasing all the latest CPUs on
the market isn't an option for me, and for testing AVX I'm stuck with
sandy-bridge :)
-Jussi
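The quoted log below is line-wrapped by the mail client, which makes it
awkward to feed into a script. A minimal sketch of one way to re-join the
wrapped entries and turn them into throughput figures follows. It assumes
the exact line format seen in this message; the regular expression,
function name and dictionary keys are my own choices, not anything
provided by tcrypt.

```python
import re

# One tcrypt speed-test result, after wrapped lines have been re-joined.
RESULT_RE = re.compile(
    r"test (\d+) \((\d+) bit key, (\d+) byte blocks\): (\d+) "
    r"operations in (\d+) seconds \((\d+) bytes\)"
)


def parse_results(text):
    """Parse quoted, line-wrapped tcrypt output into a list of dicts."""
    # Strip the "> " quote prefix, then join lines so wrapped entries are whole.
    lines = [re.sub(r"^(?:>\s?)+", "", line).strip() for line in text.splitlines()]
    joined = " ".join(line for line in lines if line)
    results = []
    for m in RESULT_RE.finditer(joined):
        test, key_bits, block, ops, secs, nbytes = map(int, m.groups())
        results.append({
            "test": test,
            "key_bits": key_bits,
            "block": block,
            "ops": ops,
            "seconds": secs,
            "bytes": nbytes,
            "mib_per_s": nbytes / secs / 2**20,
        })
    return results


# Two entries copied from the quoted log.
sample = """\
> [ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055
> operations in 1 seconds (77920880 bytes)
> [ 1018.112517] test 4 (128 bit key, 8192 byte blocks): 21777
> operations in 1 seconds (178397184 bytes)
"""

results = parse_results(sample)
for r in results:
    print(f"test {r['test']}: {r['block']} byte blocks, {r['mib_per_s']:.1f} MiB/s")
```

The byte count in each entry equals operations times block size, which
gives a cheap sanity check on the parsed values.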
> 4th:
> ====
> ran like 1st.
>
> [ 1014.074150]
> [ 1014.074150] testing speed of async ecb(twofish) encryption
> [ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055
> operations in 1 seconds (77920880 bytes)
> [ 1015.092757] test 1 (128 bit key, 64 byte blocks): 2043828
> operations in 1 seconds (130804992 bytes)
> [ 1016.099441] test 2 (128 bit key, 256 byte blocks): 606400
> operations in 1 seconds (155238400 bytes)
> [ 1017.105939] test 3 (128 bit key, 1024 byte blocks): 168939
> operations in 1 seconds (172993536 bytes)
> [ 1018.112517] test 4 (128 bit key, 8192 byte blocks): 21777
> operations in 1 seconds (178397184 bytes)
> [ 1019.119035] test 5 (192 bit key, 16 byte blocks): 4882254
> operations in 1 seconds (78116064 bytes)
> [ 1020.125716] test 6 (192 bit key, 64 byte blocks): 2043230
> operations in 1 seconds (130766720 bytes)
> [ 1021.132391] test 7 (192 bit key, 256 byte blocks): 607477
> operations in 1 seconds (155514112 bytes)
> [ 1022.138889] test 8 (192 bit key, 1024 byte blocks): 168743
> operations in 1 seconds (172792832 bytes)
> [ 1023.145476] test 9 (192 bit key, 8192 byte blocks): 21442
> operations in 1 seconds (175652864 bytes)
> [ 1024.152012] test 10 (256 bit key, 16 byte blocks): 4891863
> operations in 1 seconds (78269808 bytes)
> [ 1025.158684] test 11 (256 bit key, 64 byte blocks): 2049390
> operations in 1 seconds (131160960 bytes)
> [ 1026.165366] test 12 (256 bit key, 256 byte blocks): 606847
> operations in 1 seconds (155352832 bytes)
> [ 1027.171841] test 13 (256 bit key, 1024 byte blocks): 169228
> operations in 1 seconds (173289472 bytes)
> [ 1028.178436] test 14 (256 bit key, 8192 byte blocks): 21773
> operations in 1 seconds (178364416 bytes)
> [ 1029.184981]
> [ 1029.184981] testing speed of async ecb(twofish) decryption
> [ 1029.194508] test 0 (128 bit key, 16 byte blocks): 4931065
> operations in 1 seconds (78897040 bytes)
> [ 1030.199640] test 1 (128 bit key, 64 byte blocks): 2056931
> operations in 1 seconds (131643584 bytes)
> [ 1031.206303] test 2 (128 bit key, 256 byte blocks): 589409
> operations in 1 seconds (150888704 bytes)
> [ 1032.212832] test 3 (128 bit key, 1024 byte blocks): 163681
> operations in 1 seconds (167609344 bytes)
> [ 1033.219443] test 4 (128 bit key, 8192 byte blocks): 21062
> operations in 1 seconds (172539904 bytes)
> [ 1034.225979] test 5 (192 bit key, 16 byte blocks): 4931537
> operations in 1 seconds (78904592 bytes)
> [ 1035.232608] test 6 (192 bit key, 64 byte blocks): 2053989
> operations in 1 seconds (131455296 bytes)
> [ 1036.239289] test 7 (192 bit key, 256 byte blocks): 589591
> operations in 1 seconds (150935296 bytes)
> [ 1037.241784] test 8 (192 bit key, 1024 byte blocks): 163565
> operations in 1 seconds (167490560 bytes)
> [ 1038.244387] test 9 (192 bit key, 8192 byte blocks): 20899
> operations in 1 seconds (171204608 bytes)
> [ 1039.250923] test 10 (256 bit key, 16 byte blocks): 4937343
> operations in 1 seconds (78997488 bytes)
> [ 1040.257589] test 11 (256 bit key, 64 byte blocks): 2050678
> operations in 1 seconds (131243392 bytes)
> [ 1041.264262] test 12 (256 bit key, 256 byte blocks): 586869
> operations in 1 seconds (150238464 bytes)
> [ 1042.270753] test 13 (256 bit key, 1024 byte blocks): 163548
> operations in 1 seconds (167473152 bytes)
> [ 1043.277365] test 14 (256 bit key, 8192 byte blocks): 21053
> operations in 1 seconds (172466176 bytes)
> [ 1044.283892]
> [ 1044.283892] testing speed of async cbc(twofish) encryption
> [ 1044.293349] test 0 (128 bit key, 16 byte blocks): 5186240
> operations in 1 seconds (82979840 bytes)
> [ 1045.298534] test 1 (128 bit key, 64 byte blocks): 1921034
> operations in 1 seconds (122946176 bytes)
> [ 1046.305207] test 2 (128 bit key, 256 byte blocks): 542787
> operations in 1 seconds (138953472 bytes)
> [ 1047.311699] test 3 (128 bit key, 1024 byte blocks): 141399
> operations in 1 seconds (144792576 bytes)
> [ 1048.318312] test 4 (128 bit key, 8192 byte blocks): 17755
> operations in 1 seconds (145448960 bytes)
> [ 1049.324829] test 5 (192 bit key, 16 byte blocks): 5196441
> operations in 1 seconds (83143056 bytes)
> [ 1050.331485] test 6 (192 bit key, 64 byte blocks): 1921456
> operations in 1 seconds (122973184 bytes)
> [ 1051.338157] test 7 (192 bit key, 256 byte blocks): 543581
> operations in 1 seconds (139156736 bytes)
> [ 1052.344658] test 8 (192 bit key, 1024 byte blocks): 141473
> operations in 1 seconds (144868352 bytes)
> [ 1053.351270] test 9 (192 bit key, 8192 byte blocks): 17601
> operations in 1 seconds (144187392 bytes)
> [ 1054.357823] test 10 (256 bit key, 16 byte blocks): 5190283
> operations in 1 seconds (83044528 bytes)
> [ 1055.364462] test 11 (256 bit key, 64 byte blocks): 1912796
> operations in 1 seconds (122418944 bytes)
> [ 1056.371134] test 12 (256 bit key, 256 byte blocks): 542719
> operations in 1 seconds (138936064 bytes)
> [ 1057.377643] test 13 (256 bit key, 1024 byte blocks): 141377
> operations in 1 seconds (144770048 bytes)
> [ 1058.384229] test 14 (256 bit key, 8192 byte blocks): 17752
> operations in 1 seconds (145424384 bytes)
> [ 1059.390799]
> [ 1059.390799] testing speed of async cbc(twofish) decryption
> [ 1059.400187] test 0 (128 bit key, 16 byte blocks): 4889197
> operations in 1 seconds (78227152 bytes)
> [ 1060.405460] test 1 (128 bit key, 64 byte blocks): 1980831
> operations in 1 seconds (126773184 bytes)
> [ 1061.408145] test 2 (128 bit key, 256 byte blocks): 568695
> operations in 1 seconds (145585920 bytes)
> [ 1062.410647] test 3 (128 bit key, 1024 byte blocks): 158294
> operations in 1 seconds (162093056 bytes)
> [ 1063.417258] test 4 (128 bit key, 8192 byte blocks): 20312
> operations in 1 seconds (166395904 bytes)
> [ 1064.423758] test 5 (192 bit key, 16 byte blocks): 4904906
> operations in 1 seconds (78478496 bytes)
> [ 1065.430440] test 6 (192 bit key, 64 byte blocks): 1983636
> operations in 1 seconds (126952704 bytes)
> [ 1066.437104] test 7 (192 bit key, 256 byte blocks): 564340
> operations in 1 seconds (144471040 bytes)
> [ 1067.443613] test 8 (192 bit key, 1024 byte blocks): 157404
> operations in 1 seconds (161181696 bytes)
> [ 1068.450216] test 9 (192 bit key, 8192 byte blocks): 20055
> operations in 1 seconds (164290560 bytes)
> [ 1069.456753] test 10 (256 bit key, 16 byte blocks): 4901215
> operations in 1 seconds (78419440 bytes)
> [ 1070.463417] test 11 (256 bit key, 64 byte blocks): 1978968
> operations in 1 seconds (126653952 bytes)
> [ 1071.470073] test 12 (256 bit key, 256 byte blocks): 568440
> operations in 1 seconds (145520640 bytes)
> [ 1072.476580] test 13 (256 bit key, 1024 byte blocks): 158329
> operations in 1 seconds (162128896 bytes)
> [ 1073.483177] test 14 (256 bit key, 8192 byte blocks): 20311
> operations in 1 seconds (166387712 bytes)
> [ 1074.489739]
> [ 1074.489739] testing speed of async ctr(twofish) encryption
> [ 1074.499266] test 0 (128 bit key, 16 byte blocks): 4565109
> operations in 1 seconds (73041744 bytes)
> [ 1075.504391] test 1 (128 bit key, 64 byte blocks): 1955085
> operations in 1 seconds (125125440 bytes)
> [ 1076.511055] test 2 (128 bit key, 256 byte blocks): 573971
> operations in 1 seconds (146936576 bytes)
> [ 1077.517563] test 3 (128 bit key, 1024 byte blocks): 158489
> operations in 1 seconds (162292736 bytes)
> [ 1078.524175] test 4 (128 bit key, 8192 byte blocks): 20330
> operations in 1 seconds (166543360 bytes)
> [ 1079.530702] test 5 (192 bit key, 16 byte blocks): 4550468
> operations in 1 seconds (72807488 bytes)
> [ 1080.537358] test 6 (192 bit key, 64 byte blocks): 1943897
> operations in 1 seconds (124409408 bytes)
> [ 1081.544030] test 7 (192 bit key, 256 byte blocks): 564033
> operations in 1 seconds (144392448 bytes)
> [ 1082.550531] test 8 (192 bit key, 1024 byte blocks): 157126
> operations in 1 seconds (160897024 bytes)
> [ 1083.557170] test 9 (192 bit key, 8192 byte blocks): 20121
> operations in 1 seconds (164831232 bytes)
> [ 1084.563713] test 10 (256 bit key, 16 byte blocks): 4403637
> operations in 1 seconds (70458192 bytes)
> [ 1085.570360] test 11 (256 bit key, 64 byte blocks): 1961264
> operations in 1 seconds (125520896 bytes)
> [ 1086.577008] test 12 (256 bit key, 256 byte blocks): 571514
> operations in 1 seconds (146307584 bytes)
> [ 1087.583517] test 13 (256 bit key, 1024 byte blocks): 158342
> operations in 1 seconds (162142208 bytes)
> [ 1088.590121] test 14 (256 bit key, 8192 byte blocks): 20392
> operations in 1 seconds (167051264 bytes)
> [ 1089.596648]
> [ 1089.596648] testing speed of async ctr(twofish) decryption
> [ 1089.606061] test 0 (128 bit key, 16 byte blocks): 4517104
> operations in 1 seconds (72273664 bytes)
> [ 1090.611326] test 1 (128 bit key, 64 byte blocks): 1953102
> operations in 1 seconds (124998528 bytes)
> [ 1091.617989] test 2 (128 bit key, 256 byte blocks): 574354
> operations in 1 seconds (147034624 bytes)
> [ 1092.624497] test 3 (128 bit key, 1024 byte blocks): 158402
> operations in 1 seconds (162203648 bytes)
> [ 1093.631110] test 4 (128 bit key, 8192 byte blocks): 20369
> operations in 1 seconds (166862848 bytes)
> [ 1094.637618] test 5 (192 bit key, 16 byte blocks): 4524710
> operations in 1 seconds (72395360 bytes)
> [ 1095.644293] test 6 (192 bit key, 64 byte blocks): 1940148
> operations in 1 seconds (124169472 bytes)
> [ 1096.650957] test 7 (192 bit key, 256 byte blocks): 567684
> operations in 1 seconds (145327104 bytes)
> [ 1097.657466] test 8 (192 bit key, 1024 byte blocks): 158922
> operations in 1 seconds (162736128 bytes)
> [ 1098.664088] test 9 (192 bit key, 8192 byte blocks): 20087
> operations in 1 seconds (164552704 bytes)
> [ 1099.670596] test 10 (256 bit key, 16 byte blocks): 4397085
> operations in 1 seconds (70353360 bytes)
> [ 1100.677278] test 11 (256 bit key, 64 byte blocks): 1961007
> operations in 1 seconds (125504448 bytes)
> [ 1101.683933] test 12 (256 bit key, 256 byte blocks): 577961
> operations in 1 seconds (147958016 bytes)
> [ 1102.690452] test 13 (256 bit key, 1024 byte blocks): 158836
> operations in 1 seconds (162648064 bytes)
> [ 1103.697038] test 14 (256 bit key, 8192 byte blocks): 20427
> operations in 1 seconds (167337984 bytes)
> [ 1104.703575]
> [ 1104.703575] testing speed of async lrw(twofish) encryption
> [ 1104.713108] test 0 (256 bit key, 16 byte blocks): 3555452
> operations in 1 seconds (56887232 bytes)
> [ 1105.718261] test 1 (256 bit key, 64 byte blocks): 1617632
> operations in 1 seconds (103528448 bytes)
> [ 1106.724924] test 2 (256 bit key, 256 byte blocks): 495199
> operations in 1 seconds (126770944 bytes)
> [ 1107.731442] test 3 (256 bit key, 1024 byte blocks): 137358
> operations in 1 seconds (140654592 bytes)
> [ 1108.738065] test 4 (256 bit key, 8192 byte blocks): 17637
> operations in 1 seconds (144482304 bytes)
> [ 1109.740593] test 5 (320 bit key, 16 byte blocks): 3478175
> operations in 1 seconds (55650800 bytes)
> [ 1110.743248] test 6 (320 bit key, 64 byte blocks): 1591957
> operations in 1 seconds (101885248 bytes)
> [ 1111.749911] test 7 (320 bit key, 256 byte blocks): 493803
> operations in 1 seconds (126413568 bytes)
> [ 1112.756430] test 8 (320 bit key, 1024 byte blocks): 137066
> operations in 1 seconds (140355584 bytes)
> [ 1113.763034] test 9 (320 bit key, 8192 byte blocks): 17288
> operations in 1 seconds (141623296 bytes)
> [ 1114.769587] test 10 (384 bit key, 16 byte blocks): 3576437
> operations in 1 seconds (57222992 bytes)
> [ 1115.776232] test 11 (384 bit key, 64 byte blocks): 1587771
> operations in 1 seconds (101617344 bytes)
> [ 1116.782890] test 12 (384 bit key, 256 byte blocks): 493841
> operations in 1 seconds (126423296 bytes)
> [ 1117.789396] test 13 (384 bit key, 1024 byte blocks): 137324
> operations in 1 seconds (140619776 bytes)
> [ 1118.795993] test 14 (384 bit key, 8192 byte blocks): 17625
> operations in 1 seconds (144384000 bytes)
> [ 1119.802548]
> [ 1119.802548] testing speed of async lrw(twofish) decryption
> [ 1119.811940] test 0 (256 bit key, 16 byte blocks): 3590161
> operations in 1 seconds (57442576 bytes)
> [ 1120.817198] test 1 (256 bit key, 64 byte blocks): 1623745
> operations in 1 seconds (103919680 bytes)
> [ 1121.823872] test 2 (256 bit key, 256 byte blocks): 482001
> operations in 1 seconds (123392256 bytes)
> [ 1122.830398] test 3 (256 bit key, 1024 byte blocks): 133842
> operations in 1 seconds (137054208 bytes)
> [ 1123.836992] test 4 (256 bit key, 8192 byte blocks): 17195
> operations in 1 seconds (140861440 bytes)
> [ 1124.843536] test 5 (320 bit key, 16 byte blocks): 3536998
> operations in 1 seconds (56591968 bytes)
> [ 1125.850156] test 6 (320 bit key, 64 byte blocks): 1625698
> operations in 1 seconds (104044672 bytes)
> [ 1126.856830] test 7 (320 bit key, 256 byte blocks): 482518
> operations in 1 seconds (123524608 bytes)
> [ 1127.863348] test 8 (320 bit key, 1024 byte blocks): 133672
> operations in 1 seconds (136880128 bytes)
> [ 1128.869959] test 9 (320 bit key, 8192 byte blocks): 16860
> operations in 1 seconds (138117120 bytes)
> [ 1129.876469] test 10 (384 bit key, 16 byte blocks): 3637750
> operations in 1 seconds (58204000 bytes)
> [ 1130.883151] test 11 (384 bit key, 64 byte blocks): 1626131
> operations in 1 seconds (104072384 bytes)
> [ 1131.889814] test 12 (384 bit key, 256 byte blocks): 483999
> operations in 1 seconds (123903744 bytes)
> [ 1132.896324] test 13 (384 bit key, 1024 byte blocks): 133598
> operations in 1 seconds (136804352 bytes)
> [ 1133.902920] test 14 (384 bit key, 8192 byte blocks): 17206
> operations in 1 seconds (140951552 bytes)
> [ 1134.905485]
> [ 1134.905485] testing speed of async xts(twofish) encryption
> [ 1134.905501] test 0 (256 bit key, 16 byte blocks): 2908165
> operations in 1 seconds (46530640 bytes)
> [ 1135.908137] test 1 (256 bit key, 64 byte blocks): 1462715
> operations in 1 seconds (93613760 bytes)
> [ 1136.914715] test 2 (256 bit key, 256 byte blocks): 506478
> operations in 1 seconds (129658368 bytes)
> [ 1137.921320] test 3 (256 bit key, 1024 byte blocks): 148018
> operations in 1 seconds (151570432 bytes)
> [ 1138.927924] test 4 (256 bit key, 8192 byte blocks): 19435
> operations in 1 seconds (159211520 bytes)
> [ 1139.934451] test 5 (384 bit key, 16 byte blocks): 2905195
> operations in 1 seconds (46483120 bytes)
> [ 1140.941116] test 6 (384 bit key, 64 byte blocks): 1454656
> operations in 1 seconds (93097984 bytes)
> [ 1141.947683] test 7 (384 bit key, 256 byte blocks): 504479
> operations in 1 seconds (129146624 bytes)
> [ 1142.954280] test 8 (384 bit key, 1024 byte blocks): 148172
> operations in 1 seconds (151728128 bytes)
> [ 1143.960892] test 9 (384 bit key, 8192 byte blocks): 19433
> operations in 1 seconds (159195136 bytes)
> [ 1144.967410] test 10 (512 bit key, 16 byte blocks): 2904583
> operations in 1 seconds (46473328 bytes)
> [ 1145.974091] test 11 (512 bit key, 64 byte blocks): 1501387
> operations in 1 seconds (96088768 bytes)
> [ 1146.980652] test 12 (512 bit key, 256 byte blocks): 504501
> operations in 1 seconds (129152256 bytes)
> [ 1147.987254] test 13 (512 bit key, 1024 byte blocks): 148180
> operations in 1 seconds (151736320 bytes)
> [ 1148.993842] test 14 (512 bit key, 8192 byte blocks): 19439
> operations in 1 seconds (159244288 bytes)
> [ 1150.000380]
> [ 1150.000380] testing speed of async xts(twofish) decryption
> [ 1150.009770] test 0 (256 bit key, 16 byte blocks): 3007004
> operations in 1 seconds (48112064 bytes)
> [ 1151.015056] test 1 (256 bit key, 64 byte blocks): 1534733
> operations in 1 seconds (98222912 bytes)
> [ 1152.021642] test 2 (256 bit key, 256 byte blocks): 508129
> operations in 1 seconds (130081024 bytes)
> [ 1153.028246] test 3 (256 bit key, 1024 byte blocks): 144920
> operations in 1 seconds (148398080 bytes)
> [ 1154.034859] test 4 (256 bit key, 8192 byte blocks): 18870
> operations in 1 seconds (154583040 bytes)
> [ 1155.041367] test 5 (384 bit key, 16 byte blocks): 3009083
> operations in 1 seconds (48145328 bytes)
> [ 1156.048040] test 6 (384 bit key, 64 byte blocks): 1535084
> operations in 1 seconds (98245376 bytes)
> [ 1157.054609] test 7 (384 bit key, 256 byte blocks): 508112
> operations in 1 seconds (130076672 bytes)
> [ 1158.061215] test 8 (384 bit key, 1024 byte blocks): 145035
> operations in 1 seconds (148515840 bytes)
> [ 1159.067830] test 9 (384 bit key, 8192 byte blocks): 18890
> operations in 1 seconds (154746880 bytes)
> [ 1160.070368] test 10 (512 bit key, 16 byte blocks): 3076988
> operations in 1 seconds (49231808 bytes)
> [ 1161.073040] test 11 (512 bit key, 64 byte blocks): 1540659
> operations in 1 seconds (98602176 bytes)
> [ 1162.079610] test 12 (512 bit key, 256 byte blocks): 508316
> operations in 1 seconds (130128896 bytes)
> [ 1163.086195] test 13 (512 bit key, 1024 byte blocks): 144951
> operations in 1 seconds (148429824 bytes)
> [ 1164.092792] test 14 (512 bit key, 8192 byte blocks): 18865
> operations in 1 seconds (154542080 bytes)
>
> --
> Regards/Gruss,
> Boris.
>
>