netdev - Re: [RFC PATCH] Regression in linux 2.6.32 virtio

Date:	Thu, 17 Dec 2009 19:40:32 +0530
From:	Krishna Kumar2 <krkumar2@...ibm.com>
To:	Jarek Poplawski <jarkao2@...il.com>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>, mst@...hat.com,
	netdev@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>,
	Sridhar Samudrala <sri@...ibm.com>
Subject: Re: [RFC PATCH] Regression in linux 2.6.32 virtio_net seen with vhost-net

Jarek Poplawski <jarkao2@...il.com> wrote on 12/17/2009 06:47:09 PM:

> On Thu, Dec 17, 2009 at 05:26:37PM +0530, Krishna Kumar2 wrote:
> > Sridhar is seeing 280K requeues, and that probably implies the device
> > was stopped and wrongly restarted immediately. So the next xmit in
> > the kernel found the txq not stopped and called the xmit handler,
> > got a BUSY, requeued, and so on. That would also explain why his BW
> > drops so much - all false starts (besides 19% of all skbs being
> > requeued). I assume that each time we check:
> >
> >       if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
> >             ret = dev_hard_start_xmit(skb, dev, txq);
> > it passes the check and dev_hard_start_xmit is called wrongly.
> >
> > #Requeues: 283575
> > #total skbs: 1469482
> > Percentage requeued: 19.29%
>
> I haven't followed this thread, so I'm not sure what you are looking
> for, but can't these requeues/drops mean some hardware limits were
> reached? I wonder why linux-2.6.32 and 2.6.31.6 are compared under
> different test conditions (avg. packet sizes: 16800 vs. 64400)?

Hi Jarek,

That is a good point. I am not sure why the avg packet sizes are
so different in the bstats. Did GSO change between these two versions?

I took the numbers from Sridhar's mail before the NAPI patch. I think
having 280K requeues in one minute means that the driver is waking the
queue when it should not. The NAPI patch fixes that, but he still
reported seeing requeues.
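
To illustrate the reasoning, here is a minimal toy model of a tx queue,
not virtio_net code. Every name in it (simulate_txq, sim_result,
RING_SIZE, ATTEMPTS_PER_TICK) is hypothetical, and the ring size and
rates are arbitrary assumptions. It only shows that if the queue is
woken right after being stopped while the ring is still full, the stack
passes the "queue not stopped" check, hits a full ring, and has to
requeue; if the queue is woken only after a completion frees a slot, no
requeue occurs.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, none is virtio_net code. */
enum { RING_SIZE = 4, ATTEMPTS_PER_TICK = 2 };

struct sim_result {
    unsigned long sent;      /* skbs the modelled driver accepted */
    unsigned long requeued;  /* xmit attempts that found the ring full (BUSY) */
};

/*
 * premature_wake == true models a driver whose queue is restarted
 * immediately after being stopped, although the ring is still full.
 */
struct sim_result simulate_txq(bool premature_wake, int ticks)
{
    struct sim_result r = { 0, 0 };
    int in_flight = 0;       /* ring slots currently in use */
    bool stopped = false;    /* stands in for the txq "stopped" state */

    for (int tick = 0; tick < ticks; tick++) {
        /* Device side: one buffer completes per tick; a freed slot
         * is a legitimate reason to wake the queue. */
        if (in_flight > 0) {
            in_flight--;
            stopped = false;
        }

        /* Stack side: the stack offers more skbs than the device drains. */
        for (int i = 0; i < ATTEMPTS_PER_TICK; i++) {
            if (stopped)
                break;              /* skb stays queued, no xmit call */
            if (in_flight == RING_SIZE) {
                r.requeued++;       /* xmit was called, driver says BUSY */
                break;
            }
            in_flight++;
            r.sent++;
            if (in_flight == RING_SIZE)
                stopped = !premature_wake;  /* stop, unless wrongly woken */
        }
    }
    return r;
}
```

Calling simulate_txq(false, 100) and simulate_txq(true, 100) from a
small main() and comparing the requeued fields shows the difference.
The model does not attempt to capture the throughput cost of each false
start, only why the requeue counter grows.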

thanks,

- KK
