Message-ID: <93bd35dc-59d7-7922-becd-fb77c4a1a0e6@intel.com>
Date: Wed, 23 May 2018 12:19:26 -0700
From: "Nambiar, Amritha" <amritha.nambiar@...el.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Tom Herbert <tom@...bertland.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Sridhar Samudrala <sridhar.samudrala@...el.com>,
Eric Dumazet <edumazet@...gle.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: Re: [net-next PATCH v2 2/4] net: Enable Tx queue selection based on Rx queues

On 5/19/2018 1:13 PM, Willem de Bruijn wrote:
> On Fri, May 18, 2018 at 12:03 AM, Tom Herbert <tom@...bertland.com> wrote:
>> On Tue, May 15, 2018 at 6:26 PM, Amritha Nambiar
>> <amritha.nambiar@...el.com> wrote:
>>> This patch adds support to pick Tx queue based on the Rx queue map
>>> configuration set by the admin through the sysfs attribute
>>> for each Tx queue. If the user configuration for receive
>>> queue map does not apply, then the Tx queue selection falls back
>>> to CPU map based selection and finally to hashing.
>>>
>>> Signed-off-by: Amritha Nambiar <amritha.nambiar@...el.com>
>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@...el.com>
>>> ---
>>> include/net/sock.h | 18 ++++++++++++++++++
>>> net/core/dev.c | 36 +++++++++++++++++++++++++++++-------
>>> net/core/sock.c | 5 +++++
>>> net/ipv4/tcp_input.c | 7 +++++++
>>> net/ipv4/tcp_ipv4.c | 1 +
>>> net/ipv4/tcp_minisocks.c | 1 +
>>> 6 files changed, 61 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index 4f7c584..0613f63 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -139,6 +139,8 @@ typedef __u64 __bitwise __addrpair;
>>> * @skc_node: main hash linkage for various protocol lookup tables
>>> * @skc_nulls_node: main hash linkage for TCP/UDP/UDP-Lite protocol
>>> * @skc_tx_queue_mapping: tx queue number for this connection
>>> + * @skc_rx_queue_mapping: rx queue number for this connection
>>> + * @skc_rx_ifindex: rx ifindex for this connection
>>> * @skc_flags: place holder for sk_flags
>>> * %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
>>> * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
>>> @@ -215,6 +217,10 @@ struct sock_common {
>>> struct hlist_nulls_node skc_nulls_node;
>>> };
>>> int skc_tx_queue_mapping;
>>> +#ifdef CONFIG_XPS
>>> + int skc_rx_queue_mapping;
>>> + int skc_rx_ifindex;
>>
>> Isn't this increasing size of sock_common for a narrow use case functionality?
>
> You can get the device from the already recorded sk_napi_id.
> Sadly, not the queue number as far as I can see.
>
I plan to stop caching the ifindex in sock_common and retain only the
rx_queue. This way, it will look similar to skb_tx_hash, which uses the
recorded rx_queue when there is one and otherwise falls through to the
flow hash calculation. Likewise, we would use the mapped rx_queue and
fall through to the CPU map on failure.
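
For illustration only, the fallback order described above (Rx queue map,
then CPU map, then hashing) could be sketched in plain userspace C as
below. This is not the patch code: demo_pick_tx_queue, DEMO_NO_QUEUE and
the parameters are hypothetical names, the two map lookups are assumed to
have been done already, and the modulo reduction is a simplification of
how the kernel scales a flow hash.

```c
/* Hypothetical sentinel meaning "this map produced no Tx queue". */
#define DEMO_NO_QUEUE (-1)

/*
 * Pick a Tx queue from three candidates, in priority order:
 *   1. rxq_choice: result of the Rx-queue-based map lookup
 *   2. cpu_choice: result of the CPU-based map lookup
 *   3. flow_hash reduced to the Tx queue range
 * A candidate is used only if it is a valid Tx queue index.
 * num_txqs must be non-zero.
 */
static int demo_pick_tx_queue(int rxq_choice, int cpu_choice,
			      unsigned int flow_hash, unsigned int num_txqs)
{
	if (rxq_choice >= 0 && (unsigned int)rxq_choice < num_txqs)
		return rxq_choice;

	if (cpu_choice >= 0 && (unsigned int)cpu_choice < num_txqs)
		return cpu_choice;

	/* Simplified stand-in for the hash-based selection. */
	return (int)(flow_hash % num_txqs);
}
```

With four Tx queues, a call such as demo_pick_tx_queue(DEMO_NO_QUEUE, 1, 7, 4)
skips the Rx queue stage and returns the CPU map result.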
>
>>> +static inline void sk_mark_rx_queue(struct sock *sk, struct sk_buff *skb)
>>> +{
>>> +#ifdef CONFIG_XPS
>>> + sk->sk_rx_ifindex = skb->skb_iif;
>>> + sk->sk_rx_queue_mapping = skb_get_rx_queue(skb);
>>> +#endif
>>> +}
>>> +
>
> Instead of adding this function and calls to it in many locations in
> the stack, you can expand sk_mark_napi_id.
>
> Also, it is not clear why this should be called in locations where
> sk_mark_napi_id is not.
>
Makes sense, I will add this as part of sk_mark_napi_id.
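
A rough sketch of what folding the Rx queue recording into the NAPI ID
marking might look like, using userspace stand-ins rather than the real
struct sock and struct sk_buff. The demo_* types and fields are
hypothetical. The "queue + 1, 0 means not recorded" encoding mirrors how
skb_get_rx_queue and skb_rx_queue_recorded treat queue_mapping; leaving
the cached queue untouched when nothing is recorded is an assumption, not
something decided in this thread.

```c
/* Hypothetical stand-in for the skb fields of interest. */
struct demo_skb {
	unsigned int napi_id;
	unsigned short queue_mapping;	/* Rx queue + 1; 0 if not recorded */
};

/* Hypothetical stand-in for the socket fields of interest. */
struct demo_sock {
	unsigned int napi_id;
	int rx_queue;			/* -1 if no Rx queue is known */
};

/*
 * Record both the NAPI ID and, when the driver recorded one, the Rx
 * queue in a single helper, so callers need only one marking call.
 */
static void demo_mark_napi_id(struct demo_sock *sk, const struct demo_skb *skb)
{
	sk->napi_id = skb->napi_id;

	if (skb->queue_mapping != 0)
		sk->rx_queue = skb->queue_mapping - 1;
}
```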
>
>>> +static int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
>>> +{
>>> +#ifdef CONFIG_XPS
>>> + enum xps_map_type i = XPS_MAP_RXQS;
>>> + struct xps_dev_maps *dev_maps;
>>> + struct sock *sk = skb->sk;
>>> + int queue_index = -1;
>>> + unsigned int tci = 0;
>>> +
>>> + if (sk && sk->sk_rx_queue_mapping <= dev->real_num_rx_queues &&
>>> + dev->ifindex == sk->sk_rx_ifindex)
>>> + tci = sk->sk_rx_queue_mapping;
>>> +
>>> + rcu_read_lock();
>>> + while (queue_index < 0 && i < __XPS_MAP_MAX) {
>>> + if (i == XPS_MAP_CPUS)
>>
>> This while loop typifies exactly why I don't think the XPS maps should
>> be an array.
>
> +1
>
Okay, I will change this to two maps with separate pointers.
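
As a sketch of the shape this could take, the userspace example below
keeps the two maps behind separately named pointers, so the lookup is two
explicit steps instead of a loop over an array indexed by map type. All
demo_* names and the flat id-to-queue layout are hypothetical and much
simpler than the real XPS map structures; RCU handling is omitted.

```c
#include <stddef.h>

/* Hypothetical flat map: index (CPU or Rx queue) to Tx queue, -1 if unset. */
struct demo_xps_dev_maps {
	size_t nr_ids;
	const int *id_to_txq;
};

/* Two separately named pointers instead of an array of maps. */
struct demo_net_device {
	const struct demo_xps_dev_maps *xps_cpus_map;
	const struct demo_xps_dev_maps *xps_rxqs_map;
};

/* Return the mapped Tx queue, or -1 if the map is absent or has no entry. */
static int demo_map_lookup(const struct demo_xps_dev_maps *maps, int id)
{
	if (!maps || id < 0 || (size_t)id >= maps->nr_ids)
		return -1;
	return maps->id_to_txq[id];
}

/* Try the Rx queue map first, then the CPU map; -1 means "use hashing". */
static int demo_get_xps_queue(const struct demo_net_device *dev,
			      int rx_queue, int cpu)
{
	int queue_index = demo_map_lookup(dev->xps_rxqs_map, rx_queue);

	if (queue_index < 0)
		queue_index = demo_map_lookup(dev->xps_cpus_map, cpu);

	return queue_index;
}
```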