Message-ID: <f41d40cc-e474-1324-be0a-7beaf580c292@suse.com>
Date: Mon, 21 Jun 2021 11:11:18 +0200
From: Varad Gautam <varad.gautam@...e.com>
To: Steffen Klassert <steffen.klassert@...unet.com>
CC: linux-kernel@...r.kernel.org,
linux-rt-users <linux-rt-users@...r.kernel.org>,
netdev@...r.kernel.org, Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Florian Westphal <fw@...len.de>,
"Ahmed S. Darwish" <a.darwish@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
stable@...r.kernel.org
Subject: Re: [PATCH] xfrm: policy: Restructure RCU-read locking in
xfrm_sk_policy_lookup
On 6/21/21 10:29 AM, Steffen Klassert wrote:
> On Fri, Jun 18, 2021 at 04:11:01PM +0200, Varad Gautam wrote:
>> Commit "xfrm: policy: Read seqcount outside of rcu-read side in
>> xfrm_policy_lookup_bytype" [Linked] resolved a locking bug in
>> xfrm_policy_lookup_bytype that causes an RCU reader-writer deadlock on
>> the mutex wrapped by xfrm_policy_hash_generation on PREEMPT_RT since
>> 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated
>> lock").
>>
>> However, xfrm_sk_policy_lookup can still reach xfrm_policy_lookup_bytype
>> while holding rcu_read_lock(), as:
>>   xfrm_sk_policy_lookup()
>>     rcu_read_lock()
>>     security_xfrm_policy_lookup()
>>       xfrm_policy_lookup()
>
> Hm, I don't see that call chain. security_xfrm_policy_lookup() calls
> a hook with the name xfrm_policy_lookup. The only LSM that has
> registered a function to that hook is selinux. It registers
> selinux_xfrm_policy_lookup() and I don't see how we can call
> xfrm_policy_lookup() from there.
>
> Did you actually trigger that bug?
>
Right, I misread the call chain: security_xfrm_policy_lookup does not reach
xfrm_policy_lookup, so this patch is unnecessary. The bug I am actually seeing is:

T1, holding hash_resize_mutex and sleeping inside synchronize_rcu:

  __schedule
  schedule
  schedule_timeout
  wait_for_completion
  __wait_rcu_gp
  synchronize_rcu
  xfrm_hash_resize

And T2, producing RCU stalls because it is blocked on the mutex:

  __schedule
  schedule
  __rt_mutex_slowlock
  rt_mutex_slowlock_locked
  rt_mutex_slowlock
  xfrm_policy_lookup_bytype.constprop.77
  __xfrm_policy_check
  udpv6_queue_rcv_one_skb
  __udp6_lib_rcv
  ip6_protocol_deliver_rcu
  ip6_input_finish
  ip6_input
  ip6_mc_input
  ipv6_rcv
  __netif_receive_skb_one_core
  process_backlog
  net_rx_action
  __softirqentry_text_start
  __local_bh_enable_ip
  ip6_finish_output2
  ip6_output
  ip6_send_skb
  udp_v6_send_skb
  udpv6_sendmsg
  sock_sendmsg
  ____sys_sendmsg
  ___sys_sendmsg
  __sys_sendmsg
  do_syscall_64

So, despite the patch here [1], there is another way to reach
xfrm_policy_lookup_bytype from within an RCU read-side critical section, which
on PREEMPT_RT will deadlock with xfrm_hash_resize. Does softirq processing on
RT happen within rcu_read_lock/unlock? That would explain the stalls.

[1] https://lore.kernel.org/r/20210528160407.32127-1-varad.gautam@suse.com/
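
To make the suspected cycle concrete, here is a minimal user-space sketch in C
that models the two traces as a wait-for graph. It is not kernel code: the
node names T1_RESIZE and T2_LOOKUP and the helper scenario_deadlocks() are
made up for illustration, and it assumes that synchronize_rcu only has to wait
for T2 if T2 entered an RCU read-side section before blocking on the mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical nodes standing in for the two tasks in the traces. */
enum node {
    T1_RESIZE,   /* xfrm_hash_resize: holds the mutex, in synchronize_rcu */
    T2_LOOKUP,   /* xfrm_policy_lookup_bytype: blocked on the mutex */
    NODE_COUNT
};

struct wait_graph {
    /* waits_on[a][b] is true when a cannot make progress until b does. */
    bool waits_on[NODE_COUNT][NODE_COUNT];
};

static void add_wait(struct wait_graph *g, enum node waiter, enum node blocker)
{
    assert((int)waiter < NODE_COUNT && (int)blocker < NODE_COUNT);
    g->waits_on[waiter][blocker] = true;
}

/* Depth-first search: is 'target' reachable from 'cur' via wait edges? */
static bool reaches(const struct wait_graph *g, int cur, int target,
                    bool seen[NODE_COUNT])
{
    for (int next = 0; next < NODE_COUNT; next++) {
        if (!g->waits_on[cur][next])
            continue;
        if (next == target)
            return true;
        if (!seen[next]) {
            seen[next] = true;
            if (reaches(g, next, target, seen))
                return true;
        }
    }
    return false;
}

/* A cycle in the wait-for graph means no task on it can ever progress. */
static bool has_deadlock(const struct wait_graph *g)
{
    for (int start = 0; start < NODE_COUNT; start++) {
        bool seen[NODE_COUNT] = { false };
        if (reaches(g, start, start, seen))
            return true;
    }
    return false;
}

/* Build the scenario from the traces and report whether it deadlocks. */
static bool scenario_deadlocks(bool lookup_in_rcu_read_side)
{
    struct wait_graph g;
    memset(&g, 0, sizeof(g));

    /* T2 sleeps on the mutex that T1 holds. */
    add_wait(&g, T2_LOOKUP, T1_RESIZE);

    /* Assumption: the grace period waits for T2 only if T2 is a reader. */
    if (lookup_in_rcu_read_side)
        add_wait(&g, T1_RESIZE, T2_LOOKUP);

    return has_deadlock(&g);
}
```

In this model scenario_deadlocks(true) finds the T1 -> T2 -> T1 cycle, while
scenario_deadlocks(false) does not, which is the reasoning behind moving the
blocking step out of the read-side section in [1].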
Regards,
Varad
--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
HRB 36809, AG Nürnberg
Geschäftsführer: Felix Imendörffer