lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 6 Jun 2016 08:43:13 +0200
From:	John Crispin <john@...ozen.org>
To:	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org, linux-mediatek@...ts.infradead.org,
	keyhaede@...il.com, linux-kernel@...r.kernel.org, nbd@....name
Subject: Re: [PATCH 01/12] net: mediatek: fix DQL support



On 05/06/2016 09:32, David Miller wrote:
> From: John Crispin <john@...ozen.org>
> Date: Sun,  5 Jun 2016 08:32:54 +0200
> 
>> @@ -625,7 +625,16 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
>>  	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
>>  				(!nr_frags * TX_DMA_LS0)));
>>  
>> -	netdev_sent_queue(dev, skb->len);
>> +	/* we have a single DMA ring so BQL needs to be updated for all devices
>> +	 * sitting on this ring
>> +	 */
>> +	for (i = 0; i < MTK_MAC_COUNT; i++) {
>> +		if (!eth->netdev[i])
>> +			continue;
>> +
>> +		netdev_sent_queue(eth->netdev[i], skb->len);
>> +	}
>> +
>>  	skb_tx_timestamp(skb);
> 
> Sorry, this is very far from working.
> 
> You cannot asynchronously touch the DQL state of another netdevice.
> 
> You have to hold the TX lock of a queue while changing it's DQL state,
> otherwise you'll corrupt the state.
> 
> This "loop over all possible devices on this DMA ring" is pretty
> expensive for the problem you're trying to solve.
> 
> You'll have to find another way to fix this bug, which BTW I'm not too
> clear about.  The commit message doesn't explain sufficiently what the
> actual problem is.  "not deterministic" doesn't give enough details.
> 

Hi David,

DQL is supposed to measure how much data is enqueued on a netdev. the
problem here is that two devices share the same hardware queue. fq_codel
for example uses the values from dql to base its QoS judgement on. if we
track the dql of the 2 devices separately then the values will not take
the actual amount of data enqueued into account but only parts of it.
this will make the queue length used as a basis for fq_codel
calculations non deterministic thus breaking qos. dql needs to track the
amount of data in the physical queue underlying the netdev to be useful.
hope that explanation is better to understand.

i think one solution would be to add some code to have 2 devices share
the same dql instance. would that be an acceptable solution ?

anyhow, i will resend the series without the dql patch today and then
worry about it afterwards.

	John

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ