Message-ID: <1272605104.2209.658.camel@edumazet-laptop>
Date: Fri, 30 Apr 2010 07:25:04 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Andi Kleen <ak@...goyle.fritz.box>
Cc: Andi Kleen <andi@...stfloor.org>, hadi@...erus.ca,
Changli Gao <xiaosuo@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Tom Herbert <therbert@...gle.com>,
Stephen Hemminger <shemminger@...tta.com>,
netdev@...r.kernel.org, lenb@...nel.org, arjan@...radead.org
Subject: Re: [PATCH v6] net: batch skb dequeueing from softnet
input_pkt_queue
On Thu, Apr 29, 2010 at 23:41 +0200, Andi Kleen wrote:
> On Thu, Apr 29, 2010 at 09:12:27PM +0200, Eric Dumazet wrote:
> > Yes, mostly, but about 200,000 wakeups per second I would say...
> >
> > If a cpu in deep state receives an IPI, process a softirq, should it
> > come back to deep state immediately, or should it wait for some
> > milliseconds ?
>
> In principle the cpuidle governor should detect this and not put the target into
> the slow, deep C states. One change that was done recently to fix a similar
> problem for disk IO was to take processes that are waiting for IO into account
> (see 69d25870). But it doesn't work for networking.
>
> Here's an untested patch that might help: tell the cpuidle governor that
> networking is waiting for IO. This will tell it not to go down into the deep states.
>
> I might have missed some schedule() paths, feel free to add more.
>
> Actually it's probably too aggressive, because it will avoid C states even for
> a closed window on the other side, which might last for hours. Better would
> be some heuristic to only do this when you're really expecting IO shortly.
>
> Also does your workload even sleep at all? If not we would need to increase
> the iowait counters in recvmsg() itself.
>
My workload does, yes; it uses blocking recvmsg() calls, but Jamal's uses
epoll(), so I guess the problem is more generic than that. Should we have an
estimate of the number of wakeups (IO or not...) per second (or
sub-second) so that cpuidle can avoid these deep states?
> Anyway, it might still be worth a try.
>
> For routing we probably need some other solution though, there are no
> schedules there.
>
> >
> > > Perhaps we need to feed some information to cpuidle's governor to prevent this problem.
> > >
> > > idle=poll is very drastic, better to limit to C1
> > >
> >
> > How can I do this ?
>
> processor.max_cstate=1 or using /dev/network_latency
> (see Documentation/power/pm_qos_interface.txt)
>
> -Andi
>
Thanks, I'll play with this today!
>
>
> commit 810227a7c24ecae2bb4aac320490a7115ac33be8
> Author: Andi Kleen <ak@...ux.intel.com>
> Date: Thu Apr 29 23:33:18 2010 +0200
>
> Use io_schedule() in network stack to tell cpuidle governor to guarantee lower latencies
>
> XXX: probably too aggressive, some of these sleeps are not under high load.
>
> Based on a bug report from Eric Dumazet.
>
> Signed-off-by: Andi Kleen <ak@...ux.intel.com>
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index c5812bb..c246d6c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1402,7 +1402,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
> break;
> if (sk->sk_err)
> break;
> - timeo = schedule_timeout(timeo);
> + timeo = io_schedule_timeout(timeo);
> }
> finish_wait(sk->sk_sleep, &wait);
> return timeo;
> @@ -1512,7 +1512,7 @@ static void __lock_sock(struct sock *sk)
> prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
> TASK_UNINTERRUPTIBLE);
> spin_unlock_bh(&sk->sk_lock.slock);
> - schedule();
> + io_schedule();
> spin_lock_bh(&sk->sk_lock.slock);
> if (!sock_owned_by_user(sk))
> break;
>
> >
> > Thanks !
> >
> >
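For context on why the patch above might help: io_schedule() differs from a plain schedule() mainly in that it advertises the sleeping task as waiting on IO, which the menu cpuidle governor (since commit 69d25870) folds into its idle-residency prediction. Roughly, and simplified (this is a sketch based on the kernel/sched.c of this era, not literal kernel source; exact fields and accounting hooks vary by version):

```c
/* Simplified sketch of what io_schedule() adds over schedule():
 * it bumps the per-runqueue iowait counter around the sleep, so
 * the governor sees a CPU with IO pending and avoids deep C states. */
void io_schedule(void)
{
	struct rq *rq = raw_rq();

	atomic_inc(&rq->nr_iowait);	/* advertise pending IO */
	schedule();			/* ordinary sleep */
	atomic_dec(&rq->nr_iowait);
}
```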
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html