Message-ID: <20036.35795.314019.270841@gargle.gargle.HOWL>
Date: Fri, 12 Aug 2011 10:11:31 +0800
From: Jason Wang <jasowang@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: mst@...hat.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, davem@...emloft.net,
krkumar2@...ibm.com, rusty@...tcorp.com.au, qemu-devel@...gnu.org,
kvm@...r.kernel.org, mirq-linux@...e.qmqm.pl
Subject: [net-next RFC PATCH 0/7] multiqueue support for tun/tap
Jason Wang writes:
> As multi-queue NICs are commonly used in high-end servers,
> the current single-queue tap cannot satisfy the
> requirement of scaling guest network performance as the
> number of vcpus increases. So the following series
> implements multiple queue support in tun/tap.
>
> To take advantage of this, a multi-queue capable
> driver and qemu are also needed. I just rebased the latest
> version of Krishna's multi-queue virtio-net driver into this
> series to simplify testing. For a multiqueue-capable
> qemu, you can refer to the patches I posted at
> http://www.spinics.net/lists/kvm/msg52808.html. Vhost is
> also a must to achieve high performance, and its code can
> be used for multi-queue without modification. Alternatively,
> this series can also be used for Krishna's M:N
> implementation of multiqueue, but I didn't test it.
>
> The idea is simple: each socket is abstracted as a queue
> for tun/tap, and userspace may open as many files as
> required and then attach them to the device. To
> keep ABI compatibility, device creation is still
> done in TUNSETIFF, and two new ioctls, TUNATTACHQUEUE and
> TUNDETACHQUEUE, are added for the user to manipulate the
> number of queues for the tun/tap.
>
> I've done some basic performance testing of multi queue
> tap. For tun, I just tested it through vpnc.
>
> Notes:
> - Tests show improvement when receiving packets from
> local/external host to guest, and when sending big packets
> from guest to local/external host.
> - The current multiqueue-based virtio-net/tap introduces a
> regression when sending small packets (512 byte) from guest
> to local/external host. I suspect it's an issue of queue
> selection in both the guest driver and tap. I will continue
> to investigate.
> - I will post the performance numbers as a reply to this
> mail.
>
> TODO:
> - solve the regression in transmission of small packets
> - address the comments on the virtio-net driver
> - performance tuning
>
> Please review and comment, thanks.
>
> ---
>
> Jason Wang (5):
> tuntap: move socket/sock related structures to tun_file
> tuntap: categorize ioctl
> tuntap: introduce multiqueue related flags
> tuntap: multiqueue support
> tuntap: add ioctls to attach or detach a file form tap device
>
> Krishna Kumar (2):
> Change virtqueue structure
> virtio-net changes
>
>
> drivers/net/tun.c | 738 ++++++++++++++++++++++++++-----------------
> drivers/net/virtio_net.c | 578 ++++++++++++++++++++++++----------
> drivers/virtio/virtio_pci.c | 10 -
> include/linux/if_tun.h | 5
> include/linux/virtio.h | 1
> include/linux/virtio_net.h | 3
> 6 files changed, 867 insertions(+), 468 deletions(-)
>
> --
> Jason Wang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Here are some performance results for multiqueue tap.
For multiqueue, the test uses qemu-kvm + mq patches, net-next-2.6 +
tap mq patches + mq driver.
For single queue, the test uses qemu-kvm and net-next-2.6; rfs
was also enabled in the guest during the test.
All tests were done with netperf between two i7 (Intel(R) Xeon(R) CPU
E5620 2.40GHz) hosts with directly connected 82599 cards.
Quick notes on the results:
- Regression for guest to external/local host with 512 byte packets.
- For external host to guest, multiqueue scales, or at least
performs the same as the single queue implementation.
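A note for reading the tables that follow: the "normalized" column looks like throughput divided by cpu, truncated to an integer. That is inferred from the rows themselves, not stated in the post. A minimal Python sketch that checks it against the first table of test 1 (the function name is illustrative):

```python
import math

def normalized(throughput: float, cpu: float) -> int:
    """Throughput per unit of cpu, truncated to an integer (assumed definition)."""
    return math.floor(throughput / cpu)

# (sessions, throughput, cpu, reported normalized) rows copied from
# test 1, multiqueue result, smp=1 queue=1
rows = [
    (1, 2054.11, 23.43, 87),
    (2, 2037.32, 22.64, 89),
    (4, 2007.53, 22.87, 87),
    (8, 1993.41, 23.82, 83),
]

for sessions, tput, cpu, reported in rows:
    # Print the recomputed value next to the one reported in the table.
    print(sessions, normalized(tput, cpu), reported)
```

A higher normalized value means more throughput for the same cpu cost, so it is the column to look at when raw throughput is already capped by the link.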
1 Guest to External Host TCP with 512 byte packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 2054.11 23.43 87
2 2037.32 22.64 89
4 2007.53 22.87 87
8 1993.41 23.82 83
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 1960.58 24.30 80
2 9250.41 32.19 287
4 3897.49 49.31 79
8 4088.44 46.85 87
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 1986.87 23.17 85
2 4431.79 44.64 99
4 8705.83 51.89 167
8 9420.63 45.96 204
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 1820.38 20.17 90
2 3707.64 42.19 87
4 8930.71 63.65 140
8 9391.13 51.90 180
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 2032.64 22.96 88
2 2058.76 23.22 88
4 2028.97 22.84 88
8 1989.41 23.89 83
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 2444.50 25.00 97
2 9298.64 30.76 302
4 8788.58 30.82 285
8 9158.28 30.45 300
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 2359.50 25.10 94
2 9325.88 29.83 312
4 9198.29 32.96 279
8 8980.73 32.25 278
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 2170.15 23.77 91
2 8329.73 28.79 289
4 8152.25 36.11 225
8 9121.11 40.08 227
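The 512 byte regression noted earlier can be put in numbers as the relative change in throughput between the multiqueue and single queue rows of test 1. A small sketch using the smp=4 rows (the helper name is illustrative, and the values are copied from the two tables above):

```python
def pct_change(multiqueue: float, single_queue: float) -> float:
    """Relative throughput change of multiqueue vs. single queue, in percent."""
    return (multiqueue - single_queue) / single_queue * 100.0

# throughput by session count, test 1 (guest to external host, 512 byte), smp=4
multiqueue = {1: 1986.87, 2: 4431.79, 4: 8705.83, 8: 9420.63}    # queue=4
single_queue = {1: 2359.50, 2: 9325.88, 4: 9198.29, 8: 8980.73}  # queue=1

for sessions in sorted(multiqueue):
    delta = pct_change(multiqueue[sessions], single_queue[sessions])
    print(f"sessions={sessions}: {delta:+.1f}%")
```

For these rows the gap is largest at 2 sessions, where multiqueue reaches roughly half of the single queue throughput, and it closes by 8 sessions, where multiqueue is slightly ahead.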
2 Guest to External Host TCP with default packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 7767.87 18.43 421
2 9399.18 21.48 437
4 8373.23 21.37 391
8 9310.84 21.91 424
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 9358.75 20.27 461
2 9405.25 30.67 306
4 9407.63 26.24 358
8 9412.77 28.75 327
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 9358.39 22.11 423
2 9401.27 27.29 344
4 9414.98 28.75 327
8 9420.93 31.09 303
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 9057.52 20.09 450
2 8486.72 28.18 301
4 9330.96 40.13 232
8 9377.99 59.41 157
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 8192.58 19.30 424
2 9400.31 22.55 416
4 8771.94 21.75 403
8 8922.61 22.50 396
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 9387.28 23.13 405
2 8322.94 24.58 338
4 9404.86 26.22 358
8 9145.79 26.57 344
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 2377.83 9.86 241
2 9403.32 26.96 348
4 8822.57 27.23 324
8 9380.85 26.90 348
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 7275.95 21.47 338
2 9407.34 27.39 343
4 8365.05 25.99 321
8 9150.65 27.78 329
3 External Host to guest TCP with default packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 8944.69 25.59 349
2 8503.67 24.95 340
4 7910.54 25.88 305
8 7455.13 26.35 282
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 9370.11 23.70 395
2 9365.97 31.91 293
4 9389.83 34.99 268
8 9405.52 34.83 270
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 9061.71 23.45 386
2 9373.92 22.38 418
4 9399.83 40.89 229
8 9412.92 48.99 192
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 8203.61 24.64 332
2 9286.28 32.68 284
4 9403.61 49.33 190
8 9411.42 64.38 146
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 8999.39 26.24 342
2 8921.23 25.00 356
4 7918.52 26.60 297
8 6901.77 25.92 266
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 9016.77 25.82 349
2 8572.92 33.19 258
4 7962.34 28.88 275
8 6959.10 32.77 212
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 8951.43 25.76 347
2 8411.78 35.51 236
4 7874.05 35.99 218
8 6869.55 36.80 186
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 9332.84 25.95 359
2 9103.57 30.37 299
4 7907.03 33.94 232
8 6919.99 38.82 178
4 External Host to guest TCP with 512 byte packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 3354.22 15.75 212
2 6419.73 22.59 284
4 7545.04 25.06 301
8 7550.39 26.32 286
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 3146.17 14.08 223
2 6414.55 21.01 305
4 9389.08 37.86 247
8 9402.39 40.24 233
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 3247.65 14.91 217
2 6528.78 29.89 218
4 9402.89 37.79 248
8 9404.06 47.87 196
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 4367.90 14.16 308
2 6962.76 27.99 248
4 9404.83 41.26 227
8 9412.09 57.74 163
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 3253.88 14.53 223
2 6385.90 20.83 306
4 7581.40 26.07 290
8 7025.62 26.54 264
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 3257.61 13.85 235
2 6385.06 20.66 309
4 7465.50 32.27 231
8 7021.31 31.42 223
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 3186.60 15.88 200
2 6298.92 27.40 229
4 7474.69 32.53 229
8 6985.72 33.36 209
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 3279.81 17.63 186
2 6513.77 29.78 218
4 7413.30 35.44 209
8 6936.96 32.68 212
5 Guest to Local host TCP with 512 byte packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 1961.31 35.43 55
2 1974.04 34.76 56
4 1906.74 34.04 56
8 1907.94 34.75 54
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 1971.22 31.95 61
2 2484.96 58.75 42
4 3290.77 53.18 61
8 3031.99 54.11 56
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 1107.56 31.22 35
2 2811.83 59.57 47
4 10276.05 79.79 128
8 12760.93 96.93 131
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 1888.28 32.15 58
2 2335.03 56.72 41
4 9785.72 82.22 119
8 11274.42 95.60 117
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 1981.08 31.89 62
2 1970.74 32.57 60
4 1944.63 32.02 60
8 1943.50 31.45 61
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 2118.23 34.80 60
2 7221.95 45.63 158
4 7924.92 47.06 168
8 8651.28 47.40 182
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 2110.70 33.18 63
2 6602.25 42.86 154
4 9715.38 47.38 205
8 20131.98 61.94 325
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 1881.33 40.69 46
2 7631.25 48.56 157
4 13366.28 59.47 224
8 19949.45 68.85 289
6 Guest to Local host with default packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 8674.81 34.86 248
2 8576.14 34.72 247
4 8503.87 34.62 245
8 8247.43 33.77 244
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 7785.02 32.25 241
2 14696.71 58.14 252
4 12339.64 51.43 239
8 12997.55 52.53 247
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 8557.25 32.38 264
2 12164.88 58.56 207
4 18144.19 73.69 246
8 29756.33 96.15 309
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 6808.67 36.55 186
2 11590.04 61.14 189
4 23667.67 81.50 290
8 25501.89 92.44 275
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 8053.49 36.35 221
2 8493.95 35.21 241
4 8367.26 34.61 241
8 8435.64 35.45 237
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 9259.56 35.24 262
2 17153.83 44.07 389
4 16901.67 45.88 368
8 18180.81 42.34 429
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 8928.11 31.22 285
2 16835.27 47.79 352
4 16923.83 47.78 354
8 18050.62 45.86 393
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 2978.88 25.75 115
2 15422.18 41.97 367
4 16137.10 45.90 351
8 16628.30 48.99 339
7 Local host to Guest with 512 byte packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 3665.90 31.88 114
2 5709.15 38.16 149
4 8803.25 42.92 205
8 10530.33 45.21 232
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 3390.07 31.28 108
2 7502.21 62.42 120
4 14247.63 67.23 211
8 16766.93 69.66 240
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 3580.96 31.90 112
2 4353.46 62.85 69
4 8264.18 77.94 106
8 16014.00 80.11 199
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 1745.36 41.84 41
2 4472.03 73.50 60
4 12646.92 79.86 158
8 18212.21 89.79 202
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 4220.96 31.88 132
2 5732.38 37.12 154
4 7006.81 41.60 168
8 10529.09 45.92 229
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 2665.41 40.53 65
2 9864.49 59.44 165
4 11678.42 60.20 193
8 16042.60 57.85 277
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 2609.10 42.67 61
2 5496.83 68.52 80
4 16848.24 60.49 278
8 14829.66 60.54 244
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 2567.15 44.54 57
2 5902.02 59.32 99
4 13265.99 68.48 193
8 15301.16 63.95 239
8 Local host to Guest with default packet size
Multiqueue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 12531.65 29.95 418
2 12495.93 30.05 415
4 12487.40 31.28 399
8 11501.68 33.51 343
== smp=2 queue=2 ==
sessions | throughput | cpu | normalized
1 12566.08 28.86 435
2 21756.15 54.33 400
4 19899.84 56.37 353
8 19326.62 61.57 313
== smp=4 queue=4 ==
sessions | throughput | cpu | normalized
1 12383.42 28.69 431
2 19714.34 57.62 342
4 20609.45 64.13 321
8 18935.57 95.05 199
== smp=8 queue=8 ==
sessions | throughput | cpu | normalized
1 13736.90 31.95 429
2 26157.13 71.77 364
4 22874.41 78.54 291
8 19960.91 96.08 207
Single Queue Result:
== smp=1 queue=1 ==
sessions | throughput | cpu | normalized
1 12501.11 30.01 416
2 12497.01 28.51 438
4 12429.25 31.09 399
8 12152.53 28.20 430
== smp=2 queue=1 ==
sessions | throughput | cpu | normalized
1 13632.87 35.32 385
2 19900.82 46.28 430
4 17510.87 42.21 414
8 14443.78 35.48 407
== smp=4 queue=1 ==
sessions | throughput | cpu | normalized
1 14584.61 37.70 386
2 12646.50 31.39 402
4 16248.16 49.22 330
8 14131.34 47.48 297
== smp=8 queue=1 ==
sessions | throughput | cpu | normalized
1 16279.89 39.51 412
2 16958.02 53.87 314
4 16906.03 50.35 335
8 14686.25 47.30 310
--
Jason Wang