Date:	Fri, 26 Oct 2012 09:51:36 +0200
From:	Paolo Valente <paolo.valente@...more.it>
To:	Cong Wang <amwang@...hat.com>
Cc:	netdev@...r.kernel.org, Stephen Hemminger <shemminger@...tta.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: [RFC PATCH net-next] qfq: handle the case that front slot is empty


On 23 Oct 2012, at 10:53, Cong Wang wrote:

> On Tue, 2012-10-23 at 09:09 +0200, Paolo Valente wrote:
>> The crash you reported is one of the problems I tried to solve with my latest fixes.
>> After those fixes I could no longer reproduce this crash (or the other crashes), but evidently I am still missing something.
> 
> I am using the latest net-next, so if your patches are in net-next,
> then the problem of course still exists.
> 
>> 
>> On 23 Oct 2012, at 06:15, Cong Wang wrote:
>> 
>>> I am not sure whether this patch fixes the real problem or just
>>> works around it. At least, after this patch I no longer see the crash I reported.
>> 
>> It is actually a workaround: if the condition that triggers your workaround holds true, then the group data structure is already inconsistent, and QFQ is unlikely to schedule classes correctly.
>> I will try to reproduce the crash with the steps you suggest, and to understand what is still wrong, as soon as I can.
>> 
> 
> OK, I don't pretend to understand qfq.
The problem is that I should :)
> And I can help you to test
> patches.
> 
I think I will ask for your help soon, thanks.

The cause of the failure is TCP segmentation offload, which lets QFQ receive packets much larger than the MTU of the device.
Under QFQ, the default maximum packet size lmax for each class (2KB) is only slightly higher than the MTU, so offloaded packets violate the lmax constraint, and violating that constraint corrupts the data structure that implements the bucket lists of the groups. The failure you found is only one consequence of this corruption. I am sorry I did not discover it before, but, foolishly, I had run only UDP tests.
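
To make the corruption concrete, here is a sketch of the insertion step, modeled on qfq_slot_insert() in net/sched/sch_qfq.c; the standalone struct, the renamed identifiers and the comments are mine, so treat it as illustrative rather than as the exact net-next source:

#include <linux/types.h>
#include <linux/list.h>
#include <linux/bitops.h>

#define QFQ_MAX_SLOTS 32

/* Sketch of a group's circular bucket list (fields mirror qfq_group). */
struct qfq_group_sketch {
	u64 S;				/* group start time */
	unsigned int slot_shift;	/* sized assuming packets <= lmax */
	unsigned int front;		/* index of the front slot */
	unsigned long full_slots;	/* one bit per occupied slot */
	struct hlist_head slots[QFQ_MAX_SLOTS];	/* the bucket lists */
};

/* Insert a class with rounded start time roundedS into its bucket.
 * slot_shift guarantees slot < QFQ_MAX_SLOTS only while every packet
 * respects lmax.  A TSO packet larger than lmax pushes roundedS too
 * far ahead of grp->S: the modulo silently wraps the class into the
 * wrong bucket, and __set_bit() marks a bit that no longer matches
 * any bucket content, which is consistent with the "full" front slot
 * that later turns out to be empty. */
static void slot_insert(struct qfq_group_sketch *grp,
			struct hlist_node *cl_node, u64 roundedS)
{
	u64 slot = (roundedS - grp->S) >> grp->slot_shift;
	unsigned int i = (grp->front + slot) % QFQ_MAX_SLOTS;

	hlist_add_head(cl_node, &grp->slots[i]);
	__set_bit(slot, &grp->full_slots);
}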

I am thinking about the best way to address this issue.

BTW, I think that the behavior of all the other schedulers should be checked as well. For example, with segmentation offload, DRR must increment the deficit of a class up to (64K/quantum) times, i.e., over that many rounds, before it can serve the next packet of the class. The number of instructions per packet dequeue therefore becomes up to (64K/quantum) times higher than without segmentation offload.
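
As a quick sanity check of that factor, here is a back-of-the-envelope sketch; the 1500-byte quantum is my assumption (drr defaults the quantum to the device MTU when none is configured):

#include <stdio.h>

int main(void)
{
	unsigned int quantum = 1500;		/* assumption: MTU-sized quantum */
	unsigned int gso_len = 64 * 1024;	/* maximum offloaded packet */

	/* DRR adds one quantum of deficit per round until the packet
	 * at the head of the queue fits, so serving one max-size
	 * offloaded packet takes this many rounds: */
	unsigned int rounds = (gso_len + quantum - 1) / quantum;

	printf("%u rounds\n", rounds);	/* 44, vs. 1 for an MTU-sized packet */
	return 0;
}

So even with an MTU-sized quantum, the per-dequeue work grows by well over an order of magnitude.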

> Thanks!
> 
