Date:	Thu, 21 May 2009 11:07:19 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	David Miller <davem@...emloft.net>
CC:	khc@...waw.pl, netdev@...r.kernel.org, satoru.satoh@...il.com
Subject: Re: [PATCH] net: reduce number of reference taken on sk_refcnt

David Miller wrote:
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Sun, 10 May 2009 12:45:56 +0200
> 
>> Patch follows for RFC only (not Signed-of...), and based on net-next-2.6 
> 
> Thanks for the analysis.
> 
>> @@ -922,10 +922,13 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
>>  	} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
>>  		wake_up_interruptible_poll(sk->sk_sleep,
>>  					   POLLIN | POLLRDNORM | POLLRDBAND);
>> -		if (!inet_csk_ack_scheduled(sk))
>> +		if (!inet_csk_ack_scheduled(sk)) {
>> +			unsigned int delay = (3 * tcp_rto_min(sk)) / 4;
>> +
>> +			delay = min(inet_csk(sk)->icsk_ack.ato, delay);
>>  			inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
>> -						  (3 * tcp_rto_min(sk)) / 4,
>> -						  TCP_RTO_MAX);
>> +						  delay, TCP_RTO_MAX);
>> +		}
>>  	}
>>  	return 1;
> 
> I think this code is trying to aggressively stretch the ACK when
> prequeueing, in order to make sure there is enough time to get
> the process onto the CPU and send a response, and thus piggyback
> the ACK.
> 
> If that turns out not to really matter, or to matter less than your
> problem, then we can make your change and I'm all for it.

This change gave me about a 15% increase in bandwidth in a multiflow
TCP benchmark. But this optimization only worked because tasks could be
woken up and send their answer within the same jiffy, 'rearming'
the xmit timer with exactly the same value...

(135,000 messages received/sent per second in my benchmark, with 60 flows)

mod_timer() has a special heuristic to avoid calling __mod_timer():

int mod_timer(struct timer_list *timer, unsigned long expires)
{
        /*
         * This is a common optimization triggered by the
         * networking code - if the timer is re-modified
         * to be the same thing then just return:
         */
        if (timer->expires == expires && timer_pending(timer))
                return 1;

        return __mod_timer(timer, expires, false);
}
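
To illustrate why this fast path only triggers when the wakeup and the
answer happen within the same jiffy, here is a quick userspace model
(illustration only; fake_timer, fake_mod_timer() and the constants are
stand-ins, not kernel code):

/*
 * Userspace model: fake_mod_timer() mimics the mod_timer() fast path
 * above.  Rearming with an identical expires value is a no-op; once
 * jiffies has advanced, the slow path runs every time.
 */
#include <stdio.h>

struct fake_timer {
	unsigned long expires;
	int pending;
};

static int slow_path_calls;	/* models calls to __mod_timer() */

static int fake_mod_timer(struct fake_timer *timer, unsigned long expires)
{
	if (timer->expires == expires && timer->pending)
		return 1;	/* fast path: nothing to do */
	timer->expires = expires;
	timer->pending = 1;
	slow_path_calls++;
	return 0;
}

int main(void)
{
	struct fake_timer dack = { 0, 0 };
	unsigned long jiffies = 1000;
	unsigned long delay = 150;	/* e.g. 3 * tcp_rto_min / 4 at HZ=1000 */
	int i;

	/* task answers within the same jiffy: only the first rearm is real */
	for (i = 0; i < 4; i++)
		fake_mod_timer(&dack, jiffies + delay);
	printf("same jiffy   : %d slow path call(s)\n", slow_path_calls);

	/* each request takes 3 msec: every rearm changes expires */
	for (i = 0; i < 4; i++) {
		jiffies += 3;
		fake_mod_timer(&dack, jiffies + delay);
	}
	printf("3 msec apart : %d slow path call(s)\n", slow_path_calls);
	return 0;
}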

With HZ=1000 and real applications (which need more than 1 msec to process a request),
I suppose this kind of optimization is unlikely to trigger,
so we might extend the mod_timer() heuristic to avoid changing timer->expires
when the new value is almost the same as the previous one, rather than only when it
is "exactly the same value":

int mod_timer_unexact(struct timer_list *timer, unsigned long expires, long maxerror)
{
	/*
	 * This is a common optimization triggered by the
	 * networking code - if the timer is re-modified
	 * to be about the same thing then just return:
	 */
	if (timer_pending(timer)) {
		long delta = expires - timer->expires;

		if (delta <= maxerror && delta >= -maxerror)
			return 1;
	}
	return __mod_timer(timer, expires, false);
}
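
A hypothetical call site could then look like this (the 4 jiffies
tolerance and this exact spot are only for illustration, not a proposed
patch):

	/*
	 * Hypothetical use in the delayed-ACK rearm path: accept up to
	 * 4 jiffies (4 msec at HZ=1000) of error so that rearms issued
	 * a few msec apart still hit the fast path instead of touching
	 * the timer wheel.
	 */
	mod_timer_unexact(&inet_csk(sk)->icsk_delack_timer,
			  jiffies + delay, 4);

The trade-off is that the delayed ACK may then fire up to maxerror
jiffies away from the requested expiry, which seems acceptable for a
timer whose value is already heuristic.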



But to be effective, prequeue needs a blocked task for each flow, while
modern daemons prefer to use poll/epoll, so prequeue ends up not being
used at all.

Another possibility would be to use a separate timer dedicated to
prequeue, instead of sharing the xmit timer.
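
Something like this very rough, untested sketch (prequeue_timer and
tcp_prequeue_timer() are made-up names, and locking is simplified
compared to the real delack handler) would keep the prequeue rearm away
from icsk_delack_timer entirely:

/*
 * Rough sketch, untested: a timer owned by the prequeue path only, so
 * rearming it can never conflict with the shared delack/xmit timers.
 */
static void tcp_prequeue_timer(unsigned long data)
{
	struct sock *sk = (struct sock *)data;

	bh_lock_sock(sk);
	if (!sock_owned_by_user(sk) &&
	    skb_queue_len(&tcp_sk(sk)->ucopy.prequeue))
		tcp_send_ack(sk);	/* nobody piggybacked the ACK in time */
	bh_unlock_sock(sk);
	sock_put(sk);
}

/* init, e.g. from tcp_v4_init_sock(): */
	setup_timer(&tp->prequeue_timer, tcp_prequeue_timer,
		    (unsigned long)sk);

/* in tcp_prequeue(), instead of inet_csk_reset_xmit_timer(): */
	if (!timer_pending(&tp->prequeue_timer)) {
		sock_hold(sk);
		mod_timer(&tp->prequeue_timer, jiffies + delay);
	}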


