Message-ID: <1272605104.2209.658.camel@edumazet-laptop>
Date: Fri, 30 Apr 2010 07:25:04 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Andi Kleen <ak@...goyle.fritz.box>
Cc: Andi Kleen <andi@...stfloor.org>, hadi@...erus.ca,
Changli Gao <xiaosuo@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Tom Herbert <therbert@...gle.com>,
Stephen Hemminger <shemminger@...tta.com>,
netdev@...r.kernel.org, lenb@...nel.org, arjan@...radead.org
Subject: Re: [PATCH v6] net: batch skb dequeueing from softnet
input_pkt_queue
On Thu, Apr 29, 2010 at 23:41 +0200, Andi Kleen wrote:
> On Thu, Apr 29, 2010 at 09:12:27PM +0200, Eric Dumazet wrote:
> > Yes, mostly, but about 200,000 wakeups per second I would say...
> >
> > If a cpu in deep state receives an IPI, process a softirq, should it
> > come back to deep state immediately, or should it wait for some
> > milliseconds ?
>
> In principle the cpuidle governor should detect this and not put the target into
> the slow, deep C states. One change that was done recently to fix a similar
> problem for disk IO was to take processes that are waiting for IO into account
> (see 69d25870). But it doesn't work for networking.
>
> Here's an untested patch that might help: tell the cpuidle governor that
> networking is waiting for IO. This will tell it not to go down into the deep states.
>
> I might have missed some schedule() paths, feel free to add more.
>
> Actually it's probably too aggressive, because it will avoid C states even for
> a closed window on the other side, which might last for hours. Better would
> be some heuristic to only do this when you're really expecting IO shortly.
>
> Also does your workload even sleep at all? If not we would need to increase
> the iowait counters in recvmsg() itself.
>
My workload does, yes; it uses blocking recvmsg() calls, but Jamal's uses
epoll(), so I guess the problem is more generic than that. Should we have an
estimate of the number of wakeups (IO or not...) per second (or
sub-second) so that cpuidle can avoid these deep states?
> Anyway, it might still be worth a try.
>
> For routing we probably need some other solution though, there are no
> schedules there.
>
> >
> > > Perhaps we need to feed some information to cpuidle's governor to prevent this problem.
> > >
> > > idle=poll is very drastic, better to limit to C1
> > >
> >
> > How can I do this ?
>
> processor.max_cstate=1 or using /dev/network_latency
> (see Documentation/power/pm_qos_interface.txt)
>
> -Andi
>
Thanks, I'll play with this today!
>
>
> commit 810227a7c24ecae2bb4aac320490a7115ac33be8
> Author: Andi Kleen <ak@...ux.intel.com>
> Date: Thu Apr 29 23:33:18 2010 +0200
>
> Use io_schedule() in network stack to tell cpuidle governor to guarantee lower latencies
>
> XXX: probably too aggressive, some of these sleeps are not under high load.
>
> Based on a bug report from Eric Dumazet.
>
> Signed-off-by: Andi Kleen <ak@...ux.intel.com>
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index c5812bb..c246d6c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1402,7 +1402,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
> break;
> if (sk->sk_err)
> break;
> - timeo = schedule_timeout(timeo);
> + timeo = io_schedule_timeout(timeo);
> }
> finish_wait(sk->sk_sleep, &wait);
> return timeo;
> @@ -1512,7 +1512,7 @@ static void __lock_sock(struct sock *sk)
> prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
> TASK_UNINTERRUPTIBLE);
> spin_unlock_bh(&sk->sk_lock.slock);
> - schedule();
> + io_schedule();
> spin_lock_bh(&sk->sk_lock.slock);
> if (!sock_owned_by_user(sk))
> break;
>
> >
> > Thanks !
> >
> >
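For context on why the patch above might help: io_schedule() differs from a plain schedule() mainly in that it advertises the sleeping task as waiting on IO, which the menu cpuidle governor (since commit 69d25870) folds into its idle-residency prediction. Roughly, and simplified (this is a sketch based on the kernel/sched.c of this era, not literal kernel source; exact fields and accounting hooks vary by version):

```c
/* Simplified sketch of what io_schedule() adds over schedule():
 * it bumps the per-runqueue iowait counter around the sleep, so
 * the governor sees a CPU with IO pending and avoids deep C states. */
void io_schedule(void)
{
	struct rq *rq = raw_rq();

	atomic_inc(&rq->nr_iowait);	/* advertise pending IO */
	schedule();			/* ordinary sleep */
	atomic_dec(&rq->nr_iowait);
}
```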
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html