lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <2eabf3fb-9613-1b96-3ce9-993f94ef081d@linux.alibaba.com> Date: Tue, 17 Oct 2023 10:00:28 +0800 From: "D. Wythe" <alibuda@...ux.alibaba.com> To: dust.li@...ux.alibaba.com, Wenjia Zhang <wenjia@...ux.ibm.com>, kgraul@...ux.ibm.com, jaka@...ux.ibm.com, wintera@...ux.ibm.com Cc: kuba@...nel.org, davem@...emloft.net, netdev@...r.kernel.org, linux-s390@...r.kernel.org, linux-rdma@...r.kernel.org Subject: Re: [PATCH net 1/5] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT On 10/13/23 8:27 PM, Dust Li wrote: > On Fri, Oct 13, 2023 at 01:52:09PM +0200, Wenjia Zhang wrote: >> >> On 13.10.23 07:32, Dust Li wrote: >>> On Thu, Oct 12, 2023 at 01:51:54PM +0200, Wenjia Zhang wrote: >>>> >>>> On 12.10.23 04:37, D. Wythe wrote: >>>>> >>>>> On 10/12/23 4:31 AM, Wenjia Zhang wrote: >>>>>> >>>>>> On 11.10.23 09:33, D. Wythe wrote: >>>>>>> From: "D. Wythe" <alibuda@...ux.alibaba.com> >>>>>>> >>>>>>> Considering scenario: >>>>>>> >>>>>>> smc_cdc_rx_handler_rwwi >>>>>>> __smc_release >>>>>>> sock_set_flag >>>>>>> smc_close_active() >>>>>>> sock_set_flag >>>>>>> >>>>>>> __set_bit(DEAD) __set_bit(DONE) >>>>>>> >>>>>>> Dues to __set_bit is not atomic, the DEAD or DONE might be lost. >>>>>>> if the DEAD flag lost, the state SMC_CLOSED will be never be reached >>>>>>> in smc_close_passive_work: >>>>>>> >>>>>>> if (sock_flag(sk, SOCK_DEAD) && >>>>>>> smc_close_sent_any_close(conn)) { >>>>>>> sk->sk_state = SMC_CLOSED; >>>>>>> } else { >>>>>>> /* just shutdown, but not yet closed locally */ >>>>>>> sk->sk_state = SMC_APPFINCLOSEWAIT; >>>>>>> } >>>>>>> >>>>>>> Replace sock_set_flags or __set_bit to set_bit will fix this problem. >>>>>>> Since set_bit is atomic. >>>>>>> >>>>>> I didn't really understand the scenario. What is >>>>>> smc_cdc_rx_handler_rwwi()? What does it do? Don't it get the lock >>>>>> during the runtime? >>>>>> >>>>> Hi Wenjia, >>>>> >>>>> Sorry for that, It is not smc_cdc_rx_handler_rwwi() but >>>>> smc_cdc_rx_handler(); >>>>> >>>>> Following is a more specific description of the issues >>>>> >>>>> >>>>> lock_sock() >>>>> __smc_release >>>>> >>>>> smc_cdc_rx_handler() >>>>> smc_cdc_msg_recv() >>>>> bh_lock_sock() >>>>> smc_cdc_msg_recv_action() >>>>> sock_set_flag(DONE) sock_set_flag(DEAD) >>>>> __set_bit __set_bit >>>>> bh_unlock_sock() >>>>> release_sock() >>>>> >>>>> >>>>> >>>>> Note : |bh_lock_sock|and |lock_sock|are not mutually exclusive. They are >>>>> actually used for different purposes and contexts. >>>>> >>>>> >>>> ok, that's true that |bh_lock_sock|and |lock_sock|are not really mutually >>>> exclusive. However, since bh_lock_sock() is used, this scenario you described >>>> above should not happen, because that gets the sk_lock.slock. Following this >>>> scenarios, IMO, only the following situation can happen. >>>> >>>> lock_sock() >>>> __smc_release >>>> >>>> smc_cdc_rx_handler() >>>> smc_cdc_msg_recv() >>>> bh_lock_sock() >>>> smc_cdc_msg_recv_action() >>>> sock_set_flag(DONE) >>>> bh_unlock_sock() >>>> sock_set_flag(DEAD) >>>> release_sock() >>> Hi wenjia, >>> >>> I think I know what D. Wythe means now, and I think he is right on this. >>> >>> IIUC, in process context, lock_sock() won't respect bh_lock_sock() if it >>> acquires the lock before bh_lock_sock(). This is how the sock lock works. >>> >>> PROCESS CONTEXT INTERRUPT CONTEXT >>> ------------------------------------------------------------------------ >>> lock_sock() >>> spin_lock_bh(&sk->sk_lock.slock); >>> ... >>> sk->sk_lock.owned = 1; >>> // here the spinlock is released >>> spin_unlock_bh(&sk->sk_lock.slock); >>> __smc_release() >>> bh_lock_sock(&smc->sk); >>> smc_cdc_msg_recv_action(smc, cdc); >>> sock_set_flag(&smc->sk, SOCK_DONE); >>> bh_unlock_sock(&smc->sk); >>> >>> sock_set_flag(DEAD) <-- Can be before or after sock_set_flag(DONE) >>> release_sock() >>> >>> The bh_lock_sock() only spins on sk->sk_lock.slock, which is already released >>> after lock_sock() return. Therefor, there is actually no lock between >>> the code after lock_sock() and before release_sock() with bh_lock_sock()...bh_unlock_sock(). >>> Thus, sock_set_flag(DEAD) won't respect bh_lock_sock() at all, and might be >>> before or after sock_set_flag(DONE). >>> >>> >>> Actually, in TCP, the interrupt context will check sock_owned_by_user(). >>> If it returns true, the softirq just defer the process to backlog, and process >>> that in release_sock(). Which avoid the race between softirq and process >>> when visiting the 'struct sock'. >>> >>> tcp_v4_rcv() >>> bh_lock_sock_nested(sk); >>> tcp_segs_in(tcp_sk(sk), skb); >>> ret = 0; >>> if (!sock_owned_by_user(sk)) { >>> ret = tcp_v4_do_rcv(sk, skb); >>> } else { >>> if (tcp_add_backlog(sk, skb, &drop_reason)) >>> goto discard_and_relse; >>> } >>> bh_unlock_sock(sk); >>> >>> >>> But in SMC we don't have a backlog, that means fields in 'struct sock' >>> might all have race, and this sock_set_flag() is just one of the cases. >>> >>> Best regards, >>> Dust >>> >> I agree on your description above. >> Sure, the following case 1) can also happen >> >> case 1) >> ------- >> lock_sock() >> __smc_release >> >> sock_set_flag(DEAD) >> bh_lock_sock() >> smc_cdc_msg_recv_action() >> sock_set_flag(DONE) >> bh_unlock_sock() >> release_sock() >> >> case 2) >> ------- >> lock_sock() >> __smc_release >> >> bh_lock_sock() >> smc_cdc_msg_recv_action() >> sock_set_flag(DONE) sock_set_flag(DEAD) >> __set_bit __set_bit >> bh_unlock_sock() >> release_sock() >> >> My point here is that case2) can never happen. i.e that sock_set_flag(DONE) >> and sock_set_flag(DEAD) can not happen concurrently. Thus, how could >> the atomic set help make sure that the Dead flag would not be overwritten >> with DONE? > I agree with you on this. I also don't see using atomic can > solve the problem of overwriting the DEAD flag with DONE. > > I think we need some mechanisms to ensure that sk_flags and other > struct sock related fields are not modified simultaneously. > > Best regards, > Dust It seems that everyone has agrees on that case 2 is impossible. I'm a bit confused, why that sock_set_flag(DONE) and sock_set_flag(DEAD) can not happen concurrently. What mechanism prevents their parallel execution? Best wishes, D. Wythe >
Powered by blists - more mailing lists