linux-kernel - Re: [PATCH AUTOSEL 5.2 039/249] signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAH2r5mtB=KO+9fxSYQHbjD+0K+5rGL6Q8TSU0_wsHUdqHy1rSw@mail.gmail.com>
Date:   Tue, 23 Jul 2019 21:02:18 -0500
From:   Steve French <smfrench@...il.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     ronnie sahlberg <ronniesahlberg@...il.com>,
        Sasha Levin <sashal@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Stable <stable@...r.kernel.org>,
        Namjae Jeon <namjae.jeon@...sung.com>,
        Jeff Layton <jlayton@...marydata.com>,
        linux-cifs <linux-cifs@...r.kernel.org>
Subject: Re: [PATCH AUTOSEL 5.2 039/249] signal/cifs: Fix cifs_put_tcp_session
 to call send_sig instead of force_sig

On Tue, Jul 23, 2019 at 8:32 PM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> Steve French <smfrench@...il.com> writes:
>
> > Very easy to see what caused the regression with this global change:
> >
> > mount (which launches "cifsd" thread to read the socket)
> > umount (which kills the "cifsd" thread)
> > rmmod   (rmmod now fails since "cifsd" thread is still active)
> >
> > mount launches a thread to read from the socket ("cifsd")
> > umount is supposed to kill that thread (but with the patch
> > "signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of
> > force_sig" that no longer works).  So the regression is that after
> > unmount you still see the "cifsd" thread, and the reason that cifsd
> > thread is still around is that that patch no longer force kills the
> > process (see line 2652 of fs/cifs/connect.c) which regresses module
> > removal.
> >
> > -               force_sig(SIGKILL, task);
> > +               send_sig(SIGKILL, task, 1);
> >
> > The comment in the changeset indicates "The signal SIGKILL can not be
> > ignored" but obviously it can be ignored - at least on 5.3-rc1 it is
> > being ignored.
> >
> > If send_sig(SIGKILL ...) doesn't work and if force_sig(SIGKILL, task)
> > is removed and no longer possible - how do we kill a helper process
> > ...
>
> I think I see what is happening.  It looks like as well as misuinsg
> force_sig, cifs is also violating the invariant that keeps SIGKILL out
> of the blocked signal set.
>
> For that force_sig will act differently.  I did not consider it because
> that is never supposed to happen.
>
> Can someone test this code below and confirm the issue goes away?
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 5d6d44bfe10a..2a782ebc7b65 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -347,6 +347,7 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
>          */
>
>         sigfillset(&mask);
> +       sigdelset(&mask, SIGKILL);
>         sigprocmask(SIG_BLOCK, &mask, &oldmask);
>
>         /* Generate a rfc1002 marker for SMB2+ */
>
>
> Eric

I just tried your suggestion and it didn't work.   I also tried doing
a similar thing on the thread we are trying to kills ("cifsd" - ie
which is blocked in the function cifs_demultiplex_thread waiting to
read from the socket)
# git diff -a
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a4830ced0f98..b73062520a17 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1104,6 +1104,7 @@ cifs_demultiplex_thread(void *p)
        struct task_struct *task_to_wake = NULL;
        struct mid_q_entry *mids[MAX_COMPOUND];
        char *bufs[MAX_COMPOUND];
+       sigset_t mask;

        current->flags |= PF_MEMALLOC;
        cifs_dbg(FYI, "Demultiplex PID: %d\n", task_pid_nr(current));
@@ -1113,6 +1114,8 @@ cifs_demultiplex_thread(void *p)
                mempool_resize(cifs_req_poolp, length + cifs_min_rcv);

        set_freezable();
+       sigfillset(&mask);
+       sigdelset(&mask, SIGKILL);
        while (server->tcpStatus != CifsExiting) {
                if (try_to_freeze())
                        continue;


That also didn't work.     The only thing I have been able to find
which worked was:

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a4830ced0f98..e74f04163fc9 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1113,6 +1113,7 @@ cifs_demultiplex_thread(void *p)
                mempool_resize(cifs_req_poolp, length + cifs_min_rcv);

        set_freezable();
+      allow_signal(SIGKILL);
        while (server->tcpStatus != CifsExiting) {
                if (try_to_freeze())
                        continue;


That fixes the problem ... but ... as Ronnie and others have noted it
would allow a userspace process to make the mount unusable (all you
would have to do would be to do a kill -9 of the "cifsd" process from
some userspace process like bash and the mount would be unusable - so
this sounds dangerous.

Is there an alternative that, in the process doing the unmount in
kernel, would allow us to do the equivalent of:
      "allow_signal(SIGKILL, <the id of the cifsd process>"
In otherwords, to minimize the risk of some userspace process killing
cifsd, could we delay enabling allow_signal(SIGKILL) till the unmount
begins by doing it for a different process (have the unmount process
enable signals for the cifsd process).   Otherwise is there a way to
force kill a process from the kernel as we used to do - without
running the risk of a user space process killing cifsd (which is bad).

-- 
Thanks,

Steve