Message-ID: <1267529273.2964.111.camel@edumazet-laptop>
Date: Tue, 02 Mar 2010 12:27:53 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Mike Galbraith <efault@....de>
Cc: netdev <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [rfc/rft][patch] should use scheduler sync hint in tcp_prequeue()?
On Tuesday, 02 March 2010 at 10:41 +0100, Mike Galbraith wrote:
> Greetings network land.
>
> The reason for this query is that wake_affine() fails if there is one
> and only one task on a runqueue to encourage tasks spreading out, which
> increases cpu utilization. However, for tasks which are communicating
> at high frequency, the cost of the resulting cache misses, should
> partners land in non-shared caches, is horrible to behold. My Q6600 has
> shared caches, which may or may not be hit IFF something perturbs the
> system, and bounces partner to the right core. That won't happen on a
> box with no shared caches of course, and even with shared caches
> available, the pain is highly visible in the TCP numbers below.
>
> The sync hint tells wake_affine() that the waker is likely going to
> sleep soonish, so it subtracts the waker from the load imbalance
> calculation, allowing the partner task to be awakened affine. In the
> shared cache available case, that is also an enabler that the task be
> placed in an idle shared cache, which can increase throughput quite a
> bit (see .31 vs .33 AF UNIX), or may cost a bit if there is little to no
> execution overlap (see pipe).
>
> Now, I _could_ change wake_affine() to globally succeed in the one task
> case, but am loath to do so because that very well may upset delicate
> load balancing apple cart. I think it's much safer to target the spot
> that I know hurts like hell. Thoughts?
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 34f5cc2..ba3fc64 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -939,7 +939,7 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
>
>  			tp->ucopy.memory = 0;
>  		} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
> -			wake_up_interruptible_poll(sk->sk_sleep,
> +			wake_up_interruptible_sync_poll(sk->sk_sleep,
>  					   POLLIN | POLLRDNORM | POLLRDBAND);
>  			if (!inet_csk_ack_scheduled(sk))
>  				inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
>

I suspect this discussion is more of an lkml topic, but anyway...

This wake_up_interruptible_sync_poll() change might be good for loopback
communications (and pleases tbench), but is it desirable for regular
multi-flow NIC traffic?

Ingo can probably answer this question, since he changed
sock_def_readable() (and others) in commit 6f3d09291b498299.
I suspect he missed the tcp_prequeue() case, maybe not...

    sched, net: socket wakeups are sync

    'sync' wakeups are a hint towards the scheduler that (certain)
    networking related wakeups likely create coupling between tasks.
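
For readers following the thread, here is a rough userspace sketch of the
effect Mike describes: with the sync hint, the waker's own load is taken
out of the comparison, so an affine wakeup can succeed even when the waker
is the only task on its runqueue. This is a toy model and not the kernel's
wake_affine(); struct toy_rq, toy_wake_affine() and the load value 1024 are
names and numbers made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical toy runqueue: NOT a kernel structure. */
struct toy_rq {
	unsigned long load;	/* summed load of runnable tasks on this CPU */
};

/*
 * Toy decision: may the wakee be woken on the waker's CPU?
 *
 * Assumption for this sketch: the wakeup is affine when the waker's CPU
 * is no more loaded than the wakee's previous CPU. With sync set, the
 * waker is expected to sleep soon, so its load is subtracted first.
 */
static bool toy_wake_affine(const struct toy_rq *this_rq,
			    const struct toy_rq *prev_rq,
			    unsigned long waker_load,
			    bool sync)
{
	unsigned long this_load = this_rq->load;

	if (sync)
		this_load = this_load > waker_load ? this_load - waker_load : 0;

	return this_load <= prev_rq->load;
}
```

With a single waker of load 1024 on its CPU and an idle previous CPU, this
model rejects the affine wakeup when sync is false and accepts it when sync
is true, which is the one-task case the patch targets.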