netdev - Re: [RFC PATCH] Regression in linux 2.6.32 virtio

Message-ID: <OFA4DBC95B.29C84EEB-ON6525768F.0035A704-6525768F.003644A9@in.ibm.com>
Date:	Thu, 17 Dec 2009 15:33:57 +0530
From:	Krishna Kumar2 <krkumar2@...ibm.com>
To:	Sridhar Samudrala <sri@...ibm.com>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>, mst@...hat.com,
	netdev@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [RFC PATCH] Regression in linux 2.6.32 virtio_net seen with vhost-net

> Sridhar Samudrala <sri@...ibm.com>
>
> Re: [RFC PATCH] Regression in linux 2.6.32 virtio_net seen with vhost-net
>
> Herbert Xu wrote:
> > On Wed, Dec 16, 2009 at 09:05:32PM -0800, Sridhar Samudrala wrote:
> >
> >> I think sch_direct_xmit() is not even calling dev_hard_start_xmit() as
> >> the tx queue is stopped
> >> and does a dev_requeue_skb() and returns NETDEV_TX_BUSY.
> >>
> >
> > Yes but if the queue was stopped then we shouldn't even get into
> > sch_direct_xmit.
> I don't see any checks for txq_stopped in the callers of
> sch_direct_xmit(): __dev_xmit_skb() and qdisc_restart().  Both these
> routines get the txq and call sch_direct_xmit() which checks if tx
> queue is stopped or frozen.
>
> Am i missing something?

Yes - dequeue_skb.

The final skb before the queue was stopped is transmitted by the
driver. The next time sch_direct_xmit is called, it gets an skb,
finds the device is stopped, and requeues the skb. For all
subsequent xmits, dequeue_skb returns NULL (and the other caller,
__dev_xmit_skb, cannot reach sch_direct_xmit directly since
qdisc_qlen is non-zero), so no further requeues happen. This also
means that the number of requeues you see (e.g. 283K in one run) is
the number of times the queue was stopped and restarted. So it
looks like the driver either:

1. didn't stop the queue after xmitting a packet successfully (the
      condition being that it would not be possible to xmit the
      next skb). But this doesn't seem to be the case.
2. wrongly restarted the queue. Possible - since a few places
      use both the start & wake queue APIs.
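
To make the "one requeue per stop/restart cycle" argument concrete, here is a rough user-space model of it. This is only a sketch and not the kernel code: the struct, its fields and all model_* names are made up for illustration, the driver is assumed to stop the queue exactly when its last free TX slot is used, and the requeued skb is modelled as a single parked slot (similar in spirit to q->gso_skb).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; names are illustrative, not kernel fields. */
struct model_qdisc {
    int  backlog;      /* skbs still queued in the qdisc */
    bool parked;       /* one requeued skb is being held back */
    bool txq_stopped;  /* driver has stopped the TX queue */
    long requeues;     /* number of requeues counted so far */
};

/* Models dequeue_skb: returns true if an skb is handed out. */
static bool model_dequeue(struct model_qdisc *q)
{
    if (q->parked) {
        if (q->txq_stopped)
            return false;   /* parked skb stays put, i.e. "returns NULL" */
        q->parked = false;
        return true;
    }
    if (q->backlog == 0)
        return false;
    q->backlog--;
    return true;
}

/* Models sch_direct_xmit for one skb; ring_free = free driver TX slots. */
static void model_direct_xmit(struct model_qdisc *q, int *ring_free)
{
    if (q->txq_stopped) {   /* queue stopped: requeue instead of xmit */
        q->parked = true;
        q->requeues++;
        return;
    }
    if (--(*ring_free) == 0)
        q->txq_stopped = true;  /* assumed: driver stops on last slot */
}

/* Models one qdisc run: keep going while dequeue hands out skbs. */
static void model_qdisc_run(struct model_qdisc *q, int *ring_free)
{
    while (model_dequeue(q))
        model_direct_xmit(q, ring_free);
}

/* Runs `cycles` stop/restart cycles and returns the requeue count. */
static long model_requeues(int cycles, int ring_size, int runs_while_stopped)
{
    assert(cycles >= 0 && ring_size > 0 && runs_while_stopped >= 0);

    /* Enough backlog that every cycle fills the ring and parks one skb. */
    struct model_qdisc q = { .backlog = cycles * (ring_size + 1) };
    int ring_free = ring_size;

    for (int c = 0; c < cycles; c++) {
        /* First run fills the ring and requeues once; the rest are no-ops. */
        for (int i = 0; i <= runs_while_stopped; i++)
            model_qdisc_run(&q, &ring_free);
        /* Driver completes its TX work and restarts the queue. */
        ring_free = ring_size;
        q.txq_stopped = false;
    }
    return q.requeues;
}
```

In this model, model_requeues(cycles, ring_size, n) comes out equal to cycles no matter how many extra runs happen while the queue is stopped, which is the same reasoning as above: a large requeue count points at how often the queue is stopped and restarted, not at how many xmit attempts were made.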

Thanks,

- KK
