lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b4f35b9-6419-49a4-b73e-5d02e3cbc69a@ovn.org>
Date: Mon, 17 Jun 2024 17:09:44 +0200
From: Ilya Maximets <i.maximets@....org>
To: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc: i.maximets@....org, netdev@...r.kernel.org, edumazet@...gle.com,
 pabeni@...hat.com, Stefano Brivio <sbrivio@...hat.com>, dsahern@...nel.org,
 donald.hunter@...il.com, Sabrina Dubroca <sdubroca@...hat.com>
Subject: Re: [PATCH net] inet: bring NLM_DONE out to a separate recv() again

On 4/11/24 21:35, Ilya Maximets wrote:
> On 4/11/24 20:02, Jakub Kicinski wrote:
>> Commit under Fixes optimized the number of recv() calls
>> needed during RTM_GETROUTE dumps, but we got multiple
>> reports of applications hanging on recv() calls.
>> Applications expect that a route dump will be terminated
>> with a recv() reading an individual NLM_DONE message.
>>
>> Coalescing NLM_DONE is perfectly legal in netlink,
>> but even tho reporters fixed the code in respective
>> projects, chances are it will take time for those
>> applications to get updated. So revert to old behavior
>> (for now)?
>>
>> Old kernel (5.19):
>>
>>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>>             --dump getroute --json '{"rtm-family": 2}'
>>  Recv: read 692 bytes, 11 messages
>>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>>  ...
>>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>>  Recv: read 20 bytes, 1 messages
>>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> Before (6.9-rc2):
>>
>>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>>             --dump getroute --json '{"rtm-family": 2}'
>>  Recv: read 712 bytes, 12 messages
>>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>>  ...
>>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> After:
>>
>>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>>             --dump getroute --json '{"rtm-family": 2}'
>>  Recv: read 692 bytes, 11 messages
>>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>>  ...
>>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>>  Recv: read 20 bytes, 1 messages
>>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> Reported-by: Stefano Brivio <sbrivio@...hat.com>
>> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
>> Reported-by: Ilya Maximets <i.maximets@....org>
>> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
>> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
>> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
>> ---
>> CC: dsahern@...nel.org
>> CC: donald.hunter@...il.com
>> ---
>>  net/ipv4/fib_frontend.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 48741352a88a..c484b1c0fc00 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>>  			e++;
>>  		}
>>  	}
>> +
>> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
>> +	 * Some user space expects NLM_DONE in a separate recv().
>> +	 */
>> +	err = skb->len;
>>  out:
>>  
>>  	cb->args[1] = e;
> 
> FWIW, on current net-next this fixes the issue with Libreswan and IPv4
> (IPv6 issue remains, obviously).  I also did a round of other OVS system
> tests and they worked fine.
> 
> Tested-by: Ilya Maximets <i.maximets@....org>

Hi, Jakub.  Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
fix for it as well?  (Sorry if I missed it.)  Libreswan is getting stuck on IPv6
route lookups with 6.10-rc4.

Note: Libreswan fixed the issue on their main branch, but it is not available in
any release yet, and I'm not sure if the fix is going to make it into stable
releases.

Best regards, Ilya Maximets.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ