linux-kernel - Re: [PATCH v4 1/2] livepatch: send a fake signal to all blocking tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <nycvar.YFH.7.76.1711302252480.11505@cbobk.fhfr.pm>
Date:   Thu, 30 Nov 2017 22:53:32 +0100 (CET)
From:   Jiri Kosina <jikos@...nel.org>
To:     Miroslav Benes <mbenes@...e.cz>
cc:     jpoimboe@...hat.com, jeyu@...nel.org, pmladek@...e.com,
        lpechacek@...e.cz, pavel@....cz, live-patching@...r.kernel.org,
        linux-kernel@...r.kernel.org, Oleg Nesterov <oleg@...hat.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        linuxppc-dev@...ts.ozlabs.org, x86@...nel.org
Subject: Re: [PATCH v4 1/2] livepatch: send a fake signal to all blocking
 tasks

On Wed, 15 Nov 2017, Miroslav Benes wrote:

> Live patching consistency model is of LEAVE_PATCHED_SET and
> SWITCH_THREAD. This means that all tasks in the system have to be marked
> one by one as safe to call a new patched function. Safe means when a
> task is not (sleeping) in a set of patched functions. That is, no
> patched function is on the task's stack. Another clearly safe place is
> the boundary between kernel and userspace. The patching waits for all
> tasks to get outside of the patched set or to cross the boundary. The
> transition is completed afterwards.
> 
> The problem is that a task can block the transition for quite a long
> time, if not forever. It could sleep in a set of patched functions, for
> example.  Luckily we can force the task to leave the set by sending it a
> fake signal, that is a signal with no data in signal pending structures
> (no handler, no sign of proper signal delivered). Suspend/freezer use
> this to freeze the tasks as well. The task gets TIF_SIGPENDING set and
> is woken up (if it has been sleeping in the kernel before) or kicked by
> rescheduling IPI (if it was running on other CPU). This causes the task
> to go to kernel/userspace boundary where the signal would be handled and
> the task would be marked as safe in terms of live patching.
> 
> There are tasks which are not affected by this technique though. The
> fake signal is not sent to kthreads. They should be handled differently.
> They can be woken up so they leave the patched set and their
> TIF_PATCH_PENDING can be cleared thanks to stack checking.
> 
> For the sake of completeness, if the task is in TASK_RUNNING state but
> not currently running on some CPU it doesn't get the IPI, but it would
> eventually handle the signal anyway. Second, if the task runs in the
> kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not
> handled on return from the interrupt. It would be handled on return to
> the userspace in the future when the fake signal is sent again. Stack
> checking deals with these cases in a better way.
> 
> If the task was sleeping in a syscall it would be woken by our fake
> signal, it would check if TIF_SIGPENDING is set (by calling
> signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with
> ERESTART* return values are restarted in case of the fake signal (see
> do_signal()). EINTR is propagated back to the userspace program. This
> could disturb the program, but...
> 
> * each process dealing with signals should react accordingly to EINTR
>   return values.
> * syscalls returning EINTR happen to be quite common situation in the
>   system even if no fake signal is sent.
> * freezer sends the fake signal and does not deal with EINTR anyhow.
>   Thus EINTR values are returned when the system is resumed.
> 
> The very safe marking is done in architectures' "entry" on syscall and
> interrupt/exception exit paths, and in a stack checking functions of
> livepatch.  TIF_PATCH_PENDING is cleared and the next
> recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
> call klp_update_patch_state() before do_signal(), so that
> recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
> immediately and thus prevent a double call of do_signal().
> 
> Note that the fake signal is not sent to stopped/traced tasks. Such task
> prevents the patching to finish till it continues again (is not traced
> anymore).
> 
> Last, sending the fake signal is not automatic. It is done only when
> admin requests it by writing 1 to signal sysfs attribute in livepatch
> sysfs directory.
> 
> Signed-off-by: Miroslav Benes <mbenes@...e.cz>
> Cc: Oleg Nesterov <oleg@...hat.com>
> Cc: Michael Ellerman <mpe@...erman.id.au>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Andy Lutomirski <luto@...nel.org>
> Cc: linuxppc-dev@...ts.ozlabs.org
> Cc: x86@...nel.org
> ---
>  Documentation/ABI/testing/sysfs-kernel-livepatch | 12 +++++++
>  Documentation/livepatch/livepatch.txt            | 11 +++++--
>  arch/powerpc/kernel/signal.c                     |  6 ++--
>  arch/x86/entry/common.c                          |  6 ++--
>  kernel/livepatch/core.c                          | 30 +++++++++++++++++
>  kernel/livepatch/transition.c                    | 41 ++++++++++++++++++++++++
>  kernel/livepatch/transition.h                    |  1 +
>  kernel/signal.c                                  |  4 ++-

I'd like to be queuing this patchset for the next merge window, so if 
there are any objections for the out-of-kernel/livepatch/* changes, please 
speak up now.

Thanks.

-- 
Jiri Kosina
SUSE Labs