netdev - Re: RFS seems to have incompatiblities with bridged vlans

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AANLkTilIzRt94SiNlcAbI6tFx2-V3XcnhWtlIG8pXIGG@mail.gmail.com>
Date:	Tue, 8 Jun 2010 18:52:37 -0700
From:	Tom Herbert <therbert@...gle.com>
To:	John Fastabend <john.r.fastabend@...el.com>
Cc:	"tim.gardner@...onical.com" <tim.gardner@...onical.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Stephen Hemminger <shemminger@...tta.com>,
	Peter Lieven <pl@....net>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: RFS seems to have incompatiblities with bridged vlans

>
> Problem with this is it doesn't address mis-aligned num_rx_queues.  For
> example with the bonding driver defaulting to 16 queues now. We could end up
> with a base driver with 16+ queues and a bond with 16.  Then we have the
> same issue again.
>
>  eth0  -------> bond / bridge ---------> vlan.id
> (nbrxq=64)      (nbrxq=16)              (nbrxq=X)
>

I don't think the intent is to solve this alignment problem here.  If
a driver allocates multiple queues, it should set the queue mapping
accordingly.  If this isn't the case, the meaning of queue mapping is
ambiguous (which driver is it relative to?), and using the value for
rps_cpus is probably not going to work well-- hacking the
queue-mapping to be a legal value in get_rps_cpus doesn't seem like
the right answer.

>>>
>>> On Tue, Jun 8, 2010 at 7:18 AM, Eric Dumazet<eric.dumazet@...il.com>
>>>  wrote:
>>>>
>>>> Le lundi 07 juin 2010 à 15:30 -0700, John Fastabend a écrit :
>>>>
>>>>> There is always a possibility that the underlying device sets the
>>>>> queue_mapping to be greater then num_cpus.  Also I suspect the same
>>>>> issue exists with bonding devices.  Maybe something like the following
>>>>> is worth while? compile tested only,
>>>>>
>>>>> [PATCH] 8021q: vlan reassigns dev without check queue_mapping
>>>>>
>>>>> recv path reassigns skb->dev without sanity checking the
>>>>> queue_mapping field.  This can result in the queue_mapping
>>>>> field being set incorrectly if the new dev supports less
>>>>> queues then the underlying device.
>>>>>
>>>>> This patch just resets the queue_mapping to 0 which should
>>>>> resolve this issue?  Any thoughts?
>>>>>
>>>>> The same issue could happen on bonding devices as well.
>>>>>
>>>>> Signed-off-by: John Fastabend<john.r.fastabend@...el.com>
>>>>> ---
>>>>>
>>>>>   net/8021q/vlan_core.c |    6 ++++++
>>>>>   1 files changed, 6 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>>>>> index bd537fc..ad309f8 100644
>>>>> --- a/net/8021q/vlan_core.c
>>>>> +++ b/net/8021q/vlan_core.c
>>>>> @@ -21,6 +21,9 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct
>>>>> vlan_group *grp,
>>>>>       if (!skb->dev)
>>>>>               goto drop;
>>>>>
>>>>> +     if (unlikely(skb->queue_mapping>= skb->dev->real_num_tx_queues))
>>>>> +             skb_set_queue_mapping(skb, 0);
>>>>> +
>>>>>       return (polling ? netif_receive_skb(skb) : netif_rx(skb));
>>>>>
>>>>>   drop:
>>>>> @@ -93,6 +96,9 @@ vlan_gro_common(struct napi_struct *napi, struct
>>>>> vlan_group *grp,
>>>>>       if (!skb->dev)
>>>>>               goto drop;
>>>>>
>>>>> +     if (unlikely(skb->queue_mapping>= skb->dev->real_num_tx_queues))
>>>>> +             skb_set_queue_mapping(skb, 0);
>>>>> +
>>>>>       for (p = napi->gro_list; p; p = p->next) {
>>>>>               NAPI_GRO_CB(p)->same_flow =
>>>>>                       p->dev == skb->dev&&  !compare_ether_header(
>>>>> --
>>>>
>>>> Only a workaround, added in hot path in a otherwise 'good' driver
>>>> (multiqueue enabled and ready)
>
> Agreed thanks!
>
>>>>
>>>> eth0  ------->  bond / bridge --------->  vlan.id
>>>> (nbtxq=8)      (ntxbq=1)        (nbtxq=X)
>>>>
>>>> X is capped to 1 because of bond/bridge, while bond has no "queue"
>>>> (LLTX driver)
>>>>
>>>> Solutions :
>>>>
>>>> 1) queue_mapping could be silently tested in get_rps_cpu()...
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 6f330ce..3a3f7f6 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -2272,14 +2272,11 @@ static int get_rps_cpu(struct net_device *dev,
>>>> struct sk_buff *skb,
>>>>
>>>>        if (skb_rx_queue_recorded(skb)) {
>>>>                u16 index = skb_get_rx_queue(skb);
>>>> -               if (unlikely(index>= dev->num_rx_queues)) {
>>>> -                       if (net_ratelimit()) {
>>>> -                               pr_warning("%s received packet on queue
>>>> "
>>>> -                                       "%u, but number of RX queues is
>>>> %u\n",
>>>> -                                       dev->name, index,
>>>> dev->num_rx_queues);
>>>> -                       }
>>>> -                       goto done;
>>>> -               }
>>>> +               if (WARN_ONCE(index>= dev->num_rx_queues,
>>>> +                               KERN_WARNING "%s received packet on
>>>> queue %u, "
>>>> +                               "but number of RX queues is %u\n",
>>>> +                               dev->name, index, dev->num_rx_queues))
>>>> +                       index %= dev->num_rx_queues;
>>>>                rxqueue = dev->_rx + index;
>>>>        } else
>>>>                rxqueue = dev->_rx;
>>>>
>>>>
>
> Looks good to me.
>
>>>>
>>>> 2) bond/bridge should setup more queues, just in case.
>>>>   We probably need to be able to make things more dynamic,
>>>>   (propagate nbtxq between layers) but not for 2.6.35
>>>>
>>>>
>
> The bonding driver is already multiq per Andy Gospodarek's patch commit
> bb1d912, but unless the bond and bridge devices use the max num_rx_queues of
> there underlying devices this could still go wrong.
>
> The bonding driver would possibly need to increase num_rx_queues and
> num_tx_queues when a device is enslaved or be set to some maximum at init
> for this to work right.
>
>>>>
>>>> diff --git a/drivers/net/bonding/bond_main.c
>>>> b/drivers/net/bonding/bond_main.c
>>>> index 5e12462..ce813dd 100644
>>>> --- a/drivers/net/bonding/bond_main.c
>>>> +++ b/drivers/net/bonding/bond_main.c
>>>> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>>>>
>>>>        rtnl_lock();
>>>>
>>>> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name :
>>>> "",
>>>> -                               bond_setup);
>>>> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name :
>>>> "",
>>>> +                                  bond_setup, max(64, nr_cpu_ids));
>>>>        if (!bond_dev) {
>>>>                pr_err("%s: eek! can't alloc netdev!\n", name);
>>>>                rtnl_unlock();
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@...r.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@...r.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> Huh, so you guys are looking at the same issue (only my issue is RPS). See
>> http://marc.info/?l=linux-netdev&m=127603240621028&w=2. I'm in favor of
>> dropping the warning when no queues have been allocated.
>>
>> How about this (see attached).
>
> Prefer Eric's patch see first comment.
>
> Thanks,
> John
>
>>
>> rtg
>>
>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html