netdev - Re: [PATCH RFC v8 02/11] vhost: use batched get_vq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200720051410-mutt-send-email-mst@kernel.org>
Date:   Mon, 20 Jul 2020 05:27:39 -0400
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Eugenio Perez Martin <eperezma@...hat.com>
Cc:     Jason Wang <jasowang@...hat.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        linux-kernel@...r.kernel.org, kvm list <kvm@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

On Thu, Jul 16, 2020 at 07:16:27PM +0200, Eugenio Perez Martin wrote:
> On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:
> > > > > How about playing with the batch size? Make it a mod parameter instead
> > > > > of the hard coded 64, and measure for all values 1 to 64 ...
> > > >
> > > >
> > > > Right, according to the test result, 64 seems to be too aggressive in
> > > > the case of TX.
> > > >
> > >
> > > Got it, thanks both!
> >
> > In particular I wonder whether with batch size 1
> > we get same performance as without batching
> > (would indicate 64 is too aggressive)
> > or not (would indicate one of the code changes
> > affects performance in an unexpected way).
> >
> > --
> > MST
> >
> 
> Hi!
> 
> Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,

sorry this is not what I meant.

I mean something like this:


diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0b509be8d7b1..b94680e5721d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1279,6 +1279,10 @@ static void handle_rx_net(struct vhost_work *work)
 	handle_rx(net);
 }
 
+MODULE_PARM_DESC(batch_num, "Number of batched descriptors. (offset from 64)");
+module_param(batch_num, int, 0644);
+static int batch_num = 0;
+
 static int vhost_net_open(struct inode *inode, struct file *f)
 {
 	struct vhost_net *n;
@@ -1333,7 +1337,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
-		       UIO_MAXIOV + VHOST_NET_BATCH,
+		       UIO_MAXIOV + VHOST_NET_BATCH + batch_num,
 		       VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
 		       NULL);
 

then you can try tweaking batching and playing with mod parameter without
recompiling.


VHOST_NET_BATCH affects lots of other things.


> and testing
> the pps as previous mail says. This means that we have either only
> vhost_net batching (in base testing, like previously to apply this
> patch) or both batching sizes the same.
> 
> I've checked that vhost process (and pktgen) goes 100% cpu also.
> 
> For tx: Batching decrements always the performance, in all cases. Not
> sure why bufapi made things better the last time.
> 
> Batching makes improvements until 64 bufs, I see increments of pps but like 1%.
> 
> For rx: Batching always improves performance. It seems that if we
> batch little, bufapi decreases performance, but beyond 64, bufapi is
> much better. The bufapi version keeps improving until I set a batching
> of 1024. So I guess it is super good to have a bunch of buffers to
> receive.
> 
> Since with this test I cannot disable event_idx or things like that,
> what would be the next step for testing?
> 
> Thanks!
> 
> --
> Results:
> # Buf size: 1,16,32,64,128,256,512
> 
> # Tx
> # ===
> # Base
> 2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820
> # Batch
> 2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
> # Batch + Bufapi
> 2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538
> 
> # Rx
> # ==
> # pktgen results (pps)
> 1223275,1668868,1728794,1769261,1808574,1837252,1846436
> 1456924,1797901,1831234,1868746,1877508,1931598,1936402
> 1368923,1719716,1794373,1865170,1884803,1916021,1975160
> 
> # Testpmd pps results
> 1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
> 1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
> 1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316
> 
> pktgen was run again for rx with 1024 and 2048 buf size, giving
> 1988760.75 and 1978316 pps. Testpmd goes the same way.

Don't really understand what does this data mean.
Which number of descs is batched for each run?

-- 
MST