netdev - Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc


Message-ID: <05518e10-0da0-ab83-8917-02853e34ac3e@brocade.com>
Date:   Tue, 22 Nov 2016 13:17:44 +0000
From:   Mike Manning <mmanning@...cade.com>
To:     Hannes Frederic Sowa <hannes@...essinduktion.org>,
        <netdev@...r.kernel.org>
Subject: Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc

On 11/22/2016 12:18 PM, Hannes Frederic Sowa wrote:
> On 22.11.2016 11:34, Mike Manning wrote:
>> Bursts of failures may occur when adding IPv6 routes via Netlink to the
>> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
>> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
>> extended the area map in time for the atomic allocation using percpu.c:
>> pcpu_alloc() to succeed. This results in route additions failing with
>> an -ENOMEM error.
>>
>> While the sender of the Netlink msg to add this route could check for
>> an ACK and retransmit in the case of an -ENOMEM error, the latter
>> should not occur in the first place if there is plenty of memory. The
>> solution is to use non-atomic alloc for rt6_info instead. While the
>> client may now be blocked for longer depending on the state of the
>> chunk being added to, this work has to be incurred at some point.
>>
>> The alternative solution would be to provide configurable parameters
>> e.g. via sysctl in percpu.c for default map size, low/high empty pages
>> and map margins. For this solution, the map margin sizes need to be
>> stored per chunk, as large margins cannot be used if the dynamic early
>> slots map size is in use. This is not a preferred solution though, as
>> it requires tuning of these parameters to provide sufficient margins to
>> avoid -ENOMEM errors depending on system requirements.
>>
>> Signed-off-by: Mike Manning <mmanning@...cade.com>
>> ---
>>  net/ipv6/route.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 1b57e11..0e9bb76 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>>  	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>>  
>>  	if (rt) {
>> -		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
>> +		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>>  		if (rt->rt6i_pcpu) {
>>  			int cpu;
> 
> Nak, this doesn't work, as ip6_dst_alloc must be callable from
> non-blocking code paths unfortunately.
> 
> 

Thanks for the prompt reply.

Do you consider the alternative of providing configurable parameters for per-cpu
alloc to be viable, or is there a better way of dealing with this?

I have tested such parameter changes at scale and they do avoid the -ENOMEM errors, but
it would be good to get confirmation that this approach is acceptable before coding the
sysctl handling for them.
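
For reference, the userspace workaround mentioned in the quoted patch description
(check the ACK and retransmit on -ENOMEM) could look roughly like the sketch below.
It is only an illustration under stated assumptions: route_add_fn,
add_route_with_retry() and fake_add() are hypothetical names, and the callback stands
in for whatever code sends the RTM_NEWROUTE request and reads the error code from the
Netlink ACK.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback type: sends one route-add request, waits for the
 * Netlink ACK and returns 0 on success or a negative errno taken from it.
 */
typedef int (*route_add_fn)(void *ctx);

/*
 * Retry a route addition for as long as the kernel answers -ENOMEM,
 * making at most max_attempts attempts in total. Any other result
 * (success or a different error) is returned immediately.
 */
static int add_route_with_retry(route_add_fn add, void *ctx, int max_attempts)
{
	int err = -EINVAL;	/* returned if max_attempts <= 0 */
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		err = add(ctx);
		if (err != -ENOMEM)
			break;
		/* A real sender would probably back off briefly here. */
	}
	return err;
}

/*
 * Stand-in for the real sender, for illustration only: fails with
 * -ENOMEM until the counter pointed to by ctx reaches zero.
 */
static int fake_add(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOMEM;
	}
	return 0;
}
```

With fake_add() set up to fail twice, add_route_with_retry(fake_add, &failures, 5)
succeeds on the third attempt. This only hides the burst of failures from the client;
it does not address why the atomic per-cpu allocation fails in the first place.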