netdev - Re: [RFC PATCH] Regression in linux 2.6.32 virtio


Message-ID: <OF32B8811E.870B9515-ON6525768F.003F57BD-6525768F.0040952B@in.ibm.com>
Date:	Thu, 17 Dec 2009 17:26:37 +0530
From:	Krishna Kumar2 <krkumar2@...ibm.com>
To:	Jarek Poplawski <jarkao2@...il.com>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>, mst@...hat.com,
	netdev@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>,
	Sridhar Samudrala <sri@...ibm.com>
Subject: Re: [RFC PATCH] Regression in linux 2.6.32 virtio_net seen with vhost-net

> Jarek Poplawski <jarkao2@...il.com>
>
> >>> On Wed, Dec 16, 2009 at 09:05:32PM -0800, Sridhar Samudrala wrote:
> >>>
> >>>> I think sch_direct_xmit() is not even calling dev_hard_start_xmit() as
> >>>> the tx queue is stopped
> >>>> and does a dev_requeue_skb() and returns NETDEV_TX_BUSY.
> >>>>
> >>> Yes but if the queue was stopped then we shouldn't even get into
> >>> sch_direct_xmit.
> >> I don't see any checks for txq_stopped in the callers of
> >> sch_direct_xmit() :
> >> __dev_xmit_skb() and qdisc_restart().  Both these routines get the txq
> >> and call
> >> sch_direct_xmit() which checks if tx queue is stopped or frozen.
> >>
> >> Am i missing something?
> >
> > Yes - dequeue_skb.
> >
> > The final skb, before the queue was stopped, is transmitted by
> > the driver. The next time sch_direct_xmit is called, it gets a
> > skb and finds the device is stopped and requeue's the skb.
>
> So we _should_ get into sch_direct_xmit when the queue was stopped...
> I guess Herbert might forget the multiqueue change, and Sridhar isn't
> missing much. ;-)

I was referring to his question about who checks that the tx queue is
stopped before calling the driver's xmit. In the stopped-queue case,
qdisc_restart makes sure sch_direct_xmit is not called for any
subsequent skbs.

Sridhar is seeing 280K requeues, which probably implies the device
was stopped and then wrongly restarted immediately. So the next xmit
in the kernel found that the txq was not stopped, called the xmit
handler, got a BUSY, requeued the skb, and so on. That would also
explain why his bandwidth drops so much: all false starts (besides
19% of all skbs being requeued). I assume that each time we check:

      if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
            ret = dev_hard_start_xmit(skb, dev, txq);

it passes the check and dev_hard_start_xmit is wrongly called.

#Requeues: 283575
#total skbs: 1469482
Percentage requeued: 19.29%
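
To make the suspected false-start pattern concrete, here is a minimal
userspace sketch. It is not kernel code: every name in it (model_txq,
model_driver_xmit, model_run, model_requeue_pct) is hypothetical, and
the driver behaviour is an assumption, namely that the queue is stopped
only when the ring is found full and that the suspected bug re-wakes it
right away. It only illustrates how a check of the same shape as the one
above keeps passing and turns every attempt into a requeue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

struct model_txq {
    bool stopped;      /* models the "tx queue stopped" state */
    size_t ring_free;  /* free slots left in the device ring */
};

struct model_stats {
    unsigned long xmit_calls;  /* times the driver xmit was invoked */
    unsigned long requeues;    /* xmit calls that came back BUSY */
    unsigned long sent;        /* xmit calls that succeeded */
};

/* Driver xmit: takes a ring slot, or stops the queue and reports BUSY. */
static enum model_tx model_driver_xmit(struct model_txq *q, bool wake_too_early)
{
    if (q->ring_free == 0) {
        q->stopped = true;
        if (wake_too_early)
            q->stopped = false;  /* assumed bug: restarted with no room */
        return MODEL_TX_BUSY;
    }
    q->ring_free--;
    return MODEL_TX_OK;
}

/* One xmit attempt per step; a TX completion every complete_every steps. */
static struct model_stats model_run(size_t ring_size, unsigned attempts,
                                    unsigned complete_every, bool wake_too_early)
{
    struct model_txq q = { .stopped = false, .ring_free = ring_size };
    struct model_stats st = { 0, 0, 0 };

    assert(complete_every > 0);
    for (unsigned i = 1; i <= attempts; i++) {
        /* Same shape as the quoted check: call the driver only if not stopped. */
        if (!q.stopped) {
            st.xmit_calls++;
            if (model_driver_xmit(&q, wake_too_early) == MODEL_TX_OK)
                st.sent++;
            else
                st.requeues++;
        }
        /* A completion frees one slot and legitimately wakes the queue. */
        if (i % complete_every == 0) {
            if (q.ring_free < ring_size)
                q.ring_free++;
            q.stopped = false;
        }
    }
    return st;
}

/* Requeues as a percentage of all skbs, as in the figures above. */
static double model_requeue_pct(unsigned long requeues, unsigned long total)
{
    return total ? 100.0 * (double)requeues / (double)total : 0.0;
}
```

With a 2-slot ring, 12 attempts and a completion every 4 steps, the
model sends 4 skbs either way, but the early wake raises the requeue
count from 3 to 8, because every attempt that should have been skipped
reaches the driver and comes back BUSY. Calling
model_requeue_pct(283575, 1469482) reproduces the roughly 19.3% figure
above.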

Thanks,

- KK
