Message-ID: <65634d661001052356m494475ck5f6c47ab860fda35@mail.gmail.com>
Date: Tue, 5 Jan 2010 23:56:41 -0800
From: Tom Herbert <therbert@...gle.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: David Miller <davem@...emloft.net>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: [PATCH v4 1/1] rps: core implementation
Eric, thanks for the comments.
> Hmm, I can't see a race condition here, could you elaborate on this?
> mask is local to this cpu, and we cannot re-enter a function that could
> change some bits under us (we are called from net_rx_action()).
> If you believe there is a race condition, I suspect the race is still there.
>
We're allowing bits in the map to be set from netif_rx in interrupt
context, and __smp_call_function_single needs to be called with
interrupts disabled. I guess an alternative would be to copy the mask to
a local variable, clear the mask, and then scan over the local variable.
Would there be complaints about the stack space of a local cpumask_t
variable?
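
Roughly what I have in mind (a sketch only; the per-cpu mask name and the
csd living in the per-cpu softnet_data are my assumptions here, not
necessarily what the patch ends up with):

static void net_rps_action(void)
{
	cpumask_t mask;		/* hence the stack-space question */
	unsigned long flags;
	int cpu;

	/* snapshot and clear atomically wrt netif_rx in irq context */
	local_irq_save(flags);
	cpumask_copy(&mask, &__get_cpu_var(rps_remote_softirq_cpus));
	cpumask_clear(&__get_cpu_var(rps_remote_softirq_cpus));
	local_irq_restore(flags);

	/* the private copy can now be walked without racing netif_rx */
	for_each_cpu(cpu, &mask)
		__smp_call_function_single(cpu,
				&per_cpu(softnet_data, cpu).csd, 0);
}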
>
> Did you test with VLANs too? (with/without hardware support)
>
No. Looks like vlan functions eventually call netif_receive_skb
though... I suppose I could test that.
>>
>> Tom
>
> Excellent, but I suspect the big win comes from using few NICs
> (number_of(NICs) < num_online_cpus).
>
Yes, our primary motivation for developing this was a single NIC with
a single queue on a multicore system.
> (in the reverse case, possible contention on queue->csd)
>
Actually there should never be contention on that; only one CPU at a
time will access it, namely the one that successfully schedules NAPI on
the remote CPU from enqueue_to_backlog.
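
To illustrate why (simplified, not the exact patch code, and reusing the
made-up mask name from my earlier sketch):

/*
 * napi_schedule_prep() is an atomic test_and_set of NAPI_STATE_SCHED,
 * so only the CPU that wins it goes on to fire the remote csd; every
 * other CPU just appends to the already-scheduled backlog queue.
 */
if (napi_schedule_prep(&queue->backlog)) {
	/* this CPU alone will IPI 'cpu' from net_rps_action() */
	cpumask_set_cpu(cpu, &get_cpu_var(rps_remote_softirq_cpus));
	put_cpu_var(rps_remote_softirq_cpus);
}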
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 97873e3..7107b13 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -676,6 +676,29 @@ struct net_device_ops {
>> };
>>
>> /*
>> + * Structure for Receive Packet Steering. Length of map and array of CPU ID's.
>> + */
>> +struct rps_map {
>> + int len;
>> + u16 map[0];
>> +};
>> +
>> +/*
>> + * Structure that contains the rps maps for various NAPI instances of a device.
>> + */
>> +struct dev_rps_maps {
>> + int num_maps;
>> + struct rcu_head rcu;
>> + struct rps_map maps[0];
>> +};
>
> I feel uneasy about this, because of kmalloc() max size and rounding to power of two effects.
Other than some wasted memory, what is bad about that?
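For concreteness (rough numbers, assuming a 64-bit build and that
store_rps_cpus() allocates sizeof(struct dev_rps_maps) + num_maps *
RPS_MAP_SIZE in one go, which the fixed-size define implies): with
MAX_RPS_CPUS at 256, RPS_MAP_SIZE is about 4 + 256 * 2 = 516 bytes, so a
device with 8 NAPI instances needs a bit over 4KB, which kmalloc() rounds
up to an 8KB object.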
> It also uses a single node in NUMA setups.
I suppose we should allocate from the device's NUMA node... I'd really
like to store the maps with the napi instances themselves, and this
would work well if the napi structs are allocated on the NUMA node
where the interrupt is handled (I think this is in Peter Waskiewicz's
irq patch). Unfortunately, that information is lost by the time
netif_receive_skb is called anyway.
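
For the first point, something like this is what I mean by allocating from
the device's node (sketch only; assumes the allocation really is
sizeof(*drmaps) + num_maps * RPS_MAP_SIZE, and that the device has a
parent carrying the node info, which virtual devices won't):

	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
	struct dev_rps_maps *drmaps =
		kzalloc_node(sizeof(*drmaps) + num_maps * RPS_MAP_SIZE,
			     GFP_KERNEL, node);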
>> +
>> +/* Bound number of CPUs that can be in an rps map */
>> +#define MAX_RPS_CPUS (num_possible_cpus() < 256 ? num_possible_cpus() : 256)
>> +
>> +/* Maximum size of RPS map (for allocation) */
>> +#define RPS_MAP_SIZE (sizeof(struct rps_map) + (MAX_RPS_CPUS * sizeof(u16)))
>> +
>> +/*
>> * The DEVICE structure.
>> * Actually, this whole structure is a big mistake. It mixes I/O
>> * data with strictly "high-level" data, and it has to know about
>> @@ -861,6 +884,9 @@ struct net_device {
>>
>> struct netdev_queue rx_queue;
>>
>> + struct dev_rps_maps *dev_rps_maps; /* Per-NAPI maps for
>> + receive packet steering */
>> +
>
>
> If you store rps_map pointer into napi itself, you could avoid this MAX_RPS_CPUS thing
> and really dynamically allocate the structure with the number of online cpus mentioned
> in the map.
>
> But yes, it makes store_rps_cpus() more complex :(
>
As I pointed out above, I would like to do that. It would probably
involve adding a pointer to the napi_struct in the skb...
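
Something along these lines is what I picture (purely hypothetical, none
of this is in the current patch):

	struct napi_struct {
		/* ... existing fields ... */
		struct rps_map *rps_map; /* RCU-protected, exactly len entries */
	};

	/* in store_rps_cpus(), roughly: */
	map = kzalloc(sizeof(*map) + cnt * sizeof(u16), GFP_KERNEL);
	if (map) {
		map->len = cnt;
		rcu_assign_pointer(napi->rps_map, map);
	}

and netif_receive_skb() would then need an skb->napi (or similar) back
pointer to find the right map.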
> This can probably be done later; this Version 4 of RPS looks very good, thanks!
> I am going to test it today on my dev machine before giving an Acked-by :)
>
Thanks!
> Reviewed-by: Eric Dumazet <eric.dumazet@...il.com>
>
>
>