netdev - Re: bad networking related lag in v2.6.22-rc2

Message-ID: <46541DC4.4090501@trash.net>
Date:	Wed, 23 May 2007 12:56:04 +0200
From:	Patrick McHardy <kaber@...sh.net>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Anant Nitya <kernel@...chanda.info>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"David S. Miller" <davem@...emloft.net>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: bad networking related lag in v2.6.22-rc2

Ingo Molnar wrote:
> if you feel inclined to try the git-bisection then by all means please 
> do it (it will certainly be helpful and educative), but it's optional: i 
> don't think you should 'need' to go through extra debugging chores, my 
> analysis based on the excellent trace you provided still holds and 
> whoever modified htb_dequeue()'s logic recently ought to be able to 
> figure that out (or send you a debug patch to further narrow the problem 
> down).
>
> The trace shows a _clearly_ anomalous loop: for example there's 56396 
> (!) calls to rb_first() in htb_dequeue() [without the kernel ever 
> exiting that function]:
> 
>   earth4:~/s> grep rb_first trace-to-ingo.txt  | wc -l
>   56396


How is this trace to be understood? Is it simply a call trace in
execution order? If that's the case, then we are exiting htb_dequeue:
each call to qdisc_watchdog_schedule happens at the very end of
that function, which would imply a bug in __qdisc_run.

Looking at the recent changes to __qdisc_run, this does indeed seem
to be the case: when the qdisc is throttled and has packets queued,
we return a value != 0, causing __qdisc_run to loop until all
packets have been sent, which may take a long time.

Anant, can you please verify by testing the attached patch? Thanks.

View attachment "x" of type "text/plain" (319 bytes)