netdev - Re: [lvc-project] [PATCH] [RFC] net: smc: fix fasync leak in smc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6e62fdbf-a5d1-4650-b0b1-2a2698ed2040@linux.ibm.com>
Date: Thu, 7 Mar 2024 09:58:51 +0100
From: Jan Karcher <jaka@...ux.ibm.com>
To: Dmitry Antipov <dmantipov@...dex.ru>, Wen Gu <guwen@...ux.alibaba.com>,
        "wenjia@...ux.ibm.com" <wenjia@...ux.ibm.com>
Cc: "lvc-project@...uxtesting.org" <lvc-project@...uxtesting.org>,
        "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [lvc-project] [PATCH] [RFC] net: smc: fix fasync leak in
 smc_release()



On 06/03/2024 19:07, Dmitry Antipov wrote:
> 
> Well, the whole picture is somewhat more complicated. Consider the
> following diagram (an underlying kernel socket is in [], e.g. [smc->sk]):
> 
> Thread 0                        Thread 1
> 
> ioctl(sock, FIOASYNC, [1])
> ...
> sock = filp->private_data;
> lock_sock(sock [smc->sk]);
> sock_fasync(sock, ..., 1)       ; new fasync_struct linked to smc->sk
> release_sock(sock [smc->sk]);
>                                  ...
>                                  lock_sock([smc->sk]);
>                                  ...
>                                  smc_switch_to_fallback()
>                                  ...
>                                  smc->clcsock->file->private_data = 
> smc->clcsock;
>                                  ...
>                                  release_sock([smc->sk]);
> ioctl(sock, FIOASYNC, [0])
> ...
> sock = filp->private_data;
> lock_sock(sock [smc->clcsock]);
> sock_fasync(sock, ..., 0)       ; nothing to unlink from smc->clcsock
>                                  ; since fasync entry was linked to smc->sk
> release_sock(sock [smc->clcsock]);
>                                  ...
>                                  close(sock [smc->clcsock]);
>                                  __fput(...);
>                                  file->f_op->fasync(sock, [0])   ; 
> always failed -
>                                                                  ; 
> should use
>                                                                  ; 
> smc->sk instead
>                                  file->f_op->release()
>                                     ...
>                                     smc_restore_fallback_changes()
>                                     ...
>                                     file->private_data = smc->sk.sk_socket;

Thank you Dmitry for your detailed explanations.
It helped me a lot trying to understand the problem.
And I'm still in the progress of trying to understand it.
 From my naive point of view:
Would it help if we catch the ioctl and check if there is an active 
fallback and either stall or route the ioctl to the correct socket?
I've seen that there is an ioctl function handle in the proto_ops.
So on a very abstract level we could do the following:
1. Indicate an active Fallback at the start of the fallback to tcp.
2. Catch ioctls.
3. Check if there is an active fallback.
	NO: Pass the ioctl.
	YES: Wait for the fallback to complete and pass after.

If this blocks too much we can obviously do some finer checks there.
E.g.: Just check if the private data is already at attached to the socket.

Do you think this would be a suiteable solution?
I'm going to look into your proposal Dmitry to see how you solved the 
problem and to understand it better.

Thanks
- Jan

> 
> That is, smc_restore_fallback_changes() restores filp->private_data to
> smc->sk. If __fput() would have called file->f_op->release() _before_
> file->f_op->fasync(), the fix would be as simple as adding
> 
> smc->sk.sk_socket->wq.fasync_list = smc->clcsock->wq.fasync_list;
> 
> to smc_restore_fallback_changes(). But since file->f_op->fasync() is called
> before file->f_op->release(), the former always makes an attempt to 
> unlink fasync
> entry from smc->clcsock instead of smc->sk, thus introducing the memory 
> leak.
> 
> And an idea with shared wait queue was intended in attempt to eliminate
> this chicken-egg lookalike problem completely.
> 
> Dmitry
>