lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACBanvqp52q0c6SxsSFqZwfn45Knt-CYgg-AcOJrtaQ-Lim+YQ@mail.gmail.com>
Date:	Tue, 26 Jun 2012 09:39:14 -0700
From:	Mandeep Singh Baines <msb@...omium.org>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
Cc:	Tejun Heo <tj@...nel.org>, Arnd Bergmann <arnd@...db.de>,
	Mark Brown <broonie@...nsource.wolfsonmicro.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>, linux-kernel@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	Doug Anderson <dianders@...omium.org>
Subject: Re: try_to_freeze() called with IRQs disabled on ARM

On Thu, Aug 25, 2011 at 7:55 AM, Russell King - ARM Linux
<linux@....linux.org.uk> wrote:
> On Thu, Aug 25, 2011 at 03:09:07PM +0200, Tejun Heo wrote:
>> Hey, Russell.
>>
>> If you can fix it properly without going through temporary step,
>> that's awesome.  Let's put the arguments behind, okay?
>
> Here's the patch.  As the kernel I've run this against doesn't have the
> change to try_to_freeze(), I added a might_sleep() in do_signal() during
> my testing to verify that it fixes Mark's problem (which it does.)
>
> I've tested functions returning -ERESTARTSYS, -ERESTARTNOHAND and
> -ERESTART_RESTARTBLOCK, all of which seem to behave as expected with
> signals such as SIGCONT (without handler) and SIGALRM (with handler).
> I haven't tested -ERESTARTNOINTR.
>
> I don't have a test case for the race condition I mentioned (which is
> admittedly pretty difficult to construct, requiring an explicit
> signal, schedule, signal sequence) but this should plug that too.
>
> How do we achieve this?  Effectively the steps in this patch are:
>
> 1. Undo Arnd's fixups to the syscall restart processing (but don't worry,
>   we restore it in step 3).
>
> 2. Introduce TIF_SYS_RESTART, which is set when we enter signal handling
>   and the syscall has returned one of the restart codes.  This is used
>   as a flag to indicate that we have some syscall restart processing to
>   do at some point.
>
> 3. Clear TIF_SYS_RESTART whenever ptrace is used to set the GP registers
>   (thereby restoring Arnd's fixup for his gdb testsuite problem - it
>   would be good if Arnd could reconfirm that.)
>
> 4. When we setup a user handler to run, check TIF_SYS_RESTART and clear it.
>   If it was set, we need to set things up to return -EINTR or restart the
>   syscall as appropriate.  As we've cleared it, no further restart
>   processing will occur.
>
> 5. Once we've run all work (signal delivery, and rescheduling events), and
>   we're about to return to userspace, make a final check for TIF_SYS_RESTART.
>   If it's still set, then we're returning to userspace having not setup
>   any user handlers, and we need to restart the syscall.  This is mostly
>   trivial, except for OABI restartblock which requires the user stack to
>   be written.  We have to re-enable IRQs for this write, which means we
>   have to manually re-check for rescheduling events, abort the restart,
>   and try again later.
>
> One of the side effects of reverting Arnd's patch is that we restore the
> strace behaviour which we've had for years on ARM, and can still be seen
> on x86: strace can see the -ERESTART return codes from the kernel syscalls,
> rather than what seems to be the signal number:
>
> Before:
> rt_sigsuspend([] <unfinished ...>
> --- SIGIO (I/O possible) ---
> <... rt_sigsuspend resumed> )           = 29
> sigreturn()                             = ? (mask now [])
>
> vs:
> rt_sigsuspend([])                       = ? ERESTARTNOHAND (To be restarted)
> --- SIGIO (I/O possible) @ 0 (0) ---
> sigreturn()                             = ? (mask now [])
>
> x86:
> rt_sigsuspend([])                       = ? ERESTARTNOHAND (To be restarted)
> --- {si_signo=SIGIO, si_code=SI_USER} (I/O possible) ---
> sigreturn()                             = ? (mask now [])
>
> So, this patch should fix:
> 1. The race which I identified in the signal handling code (I think x86
>   and other architectures can suffer from it too.)
> 2. The warning from try_to_freeze.
> 3. The unanticipated change to strace output.
>
> Arnd, can you test this to make sure your gdb test case still works, and
> Mark, can you test this to make sure it fixes your problem please?
>
> Thanks.
>

Hi Russell,

We are seeing this on ARM with a 3.4 kernel. How was it eventually
resolved? This patch isn't in any tree and I can't find any patch in
linus/master or in any ARM tree that looks like it fixes this.

Here is a trace:

[  968.040000] BUG: sleeping function called from invalid context at
include/linux/freezer.h:46
[  968.050000] in_atomic(): 0, irqs_disabled(): 128, pid: 1, name: init
[  968.060000] [<8001550c>] (unwind_backtrace+0x0/0xec) from
[<805050c4>] (dump_stack+0x20/0x24)
[  968.060000] [<805050c4>] (dump_stack+0x20/0x24) from [<80052db8>]
(__might_sleep+0xec/0x10c)
[  968.070000] [<80052db8>] (__might_sleep+0xec/0x10c) from
[<80011290>] (do_signal+0x94/0x54c)
[  968.080000] [<80011290>] (do_signal+0x94/0x54c) from [<80011c24>]
(do_notify_resume+0x28/0x60)
[  968.090000] [<80011c24>] (do_notify_resume+0x28/0x60) from
[<8000e018>] (work_pending+0x24/0x28)

Regards,
Mandeep

>  arch/arm/include/asm/thread_info.h |    3 +
>  arch/arm/kernel/entry-common.S     |   11 ++
>  arch/arm/kernel/ptrace.c           |    2 +
>  arch/arm/kernel/signal.c           |  209 ++++++++++++++++++++++++------------
>  4 files changed, 155 insertions(+), 70 deletions(-)
>
> diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
> index 7b5cc8d..40df533 100644
> --- a/arch/arm/include/asm/thread_info.h
> +++ b/arch/arm/include/asm/thread_info.h
> @@ -129,6 +129,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
>  /*
>  * thread information flags:
>  *  TIF_SYSCALL_TRACE  - syscall trace active
> + *  TIF_SYS_RESTART    - syscall restart processing
>  *  TIF_SIGPENDING     - signal pending
>  *  TIF_NEED_RESCHED   - rescheduling necessary
>  *  TIF_NOTIFY_RESUME  - callback before returning to user
> @@ -139,6 +140,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
>  #define TIF_NEED_RESCHED       1
>  #define TIF_NOTIFY_RESUME      2       /* callback before returning to user */
>  #define TIF_SYSCALL_TRACE      8
> +#define TIF_SYS_RESTART                9
>  #define TIF_POLLING_NRFLAG     16
>  #define TIF_USING_IWMMXT       17
>  #define TIF_MEMDIE             18      /* is terminating due to OOM killer */
> @@ -147,6 +149,7 @@ extern void vfp_flush_hwstate(struct thread_info *);
>  #define TIF_SECCOMP            21
>
>  #define _TIF_SIGPENDING                (1 << TIF_SIGPENDING)
> +#define _TIF_SYS_RESTART       (1 << TIF_SYS_RESTART)
>  #define _TIF_NEED_RESCHED      (1 << TIF_NEED_RESCHED)
>  #define _TIF_NOTIFY_RESUME     (1 << TIF_NOTIFY_RESUME)
>  #define _TIF_SYSCALL_TRACE     (1 << TIF_SYSCALL_TRACE)
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index b2a27b6..e922b85 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -45,6 +45,7 @@ ret_fast_syscall:
>  fast_work_pending:
>        str     r0, [sp, #S_R0+S_OFF]!          @ returned r0
>  work_pending:
> +       enable_irq
>        tst     r1, #_TIF_NEED_RESCHED
>        bne     work_resched
>        tst     r1, #_TIF_SIGPENDING|_TIF_NOTIFY_RESUME
> @@ -56,6 +57,13 @@ work_pending:
>        bl      do_notify_resume
>        b       ret_slow_syscall                @ Check work again
>
> +work_syscall_restart:
> +       mov     r0, sp                          @ 'regs'
> +       bl      syscall_restart                 @ process system call restart
> +       teq     r0, #0                          @ if ret=0 -> success, so
> +       beq     ret_restart                     @ return to userspace directly
> +       b       ret_slow_syscall                @ otherwise, we have a segfault
> +
>  work_resched:
>        bl      schedule
>  /*
> @@ -69,6 +77,9 @@ ENTRY(ret_to_user_from_irq)
>        tst     r1, #_TIF_WORK_MASK
>        bne     work_pending
>  no_work_pending:
> +       tst     r1, #_TIF_SYS_RESTART
> +       bne     work_syscall_restart
> +ret_restart:
>  #if defined(CONFIG_IRQSOFF_TRACER)
>        asm_trace_hardirqs_on
>  #endif
> diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
> index 2491f3b..ac8c34e 100644
> --- a/arch/arm/kernel/ptrace.c
> +++ b/arch/arm/kernel/ptrace.c
> @@ -177,6 +177,7 @@ put_user_reg(struct task_struct *task, int offset, long data)
>
>        if (valid_user_regs(&newregs)) {
>                regs->uregs[offset] = data;
> +               clear_ti_thread_flag(task_thread_info(task), TIF_SYS_RESTART);
>                ret = 0;
>        }
>
> @@ -604,6 +605,7 @@ static int gpr_set(struct task_struct *target,
>                return -EINVAL;
>
>        *task_pt_regs(target) = newregs;
> +       clear_ti_thread_flag(task_thread_info(target), TIF_SYS_RESTART);
>        return 0;
>  }
>
> diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
> index 0340224..42a1521 100644
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -649,6 +649,135 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
>  }
>
>  /*
> + * Syscall restarting codes
> + *
> + * -ERESTARTSYS: restart system call if no handler, or if there is a
> + *     handler but it's marked SA_RESTART.  Otherwise return -EINTR.
> + * -ERESTARTNOINTR: always restart system call
> + * -ERESTARTNOHAND: restart system call only if no handler, otherwise
> + *     return -EINTR if invoking a user signal handler.
> + * -ERESTART_RESTARTBLOCK: call restart syscall if no handler, otherwise
> + *     return -EINTR if invoking a user signal handler.
> + */
> +static void setup_syscall_restart(struct pt_regs *regs)
> +{
> +       regs->ARM_r0 = regs->ARM_ORIG_r0;
> +       regs->ARM_pc -= thumb_mode(regs) ? 2 : 4;
> +}
> +
> +/*
> + * Depending on the signal settings we may need to revert the decision
> + * to restart the system call.  But skip this if a debugger has chosen
> + * to restart at a different PC.
> + */
> +static void syscall_restart_handler(struct pt_regs *regs, struct k_sigaction *ka)
> +{
> +       if (test_and_clear_thread_flag(TIF_SYS_RESTART)) {
> +               long r0 = regs->ARM_r0;
> +
> +               /*
> +                * By default, return -EINTR to the user process for any
> +                * syscall which would otherwise be restarted.
> +                */
> +               regs->ARM_r0 = -EINTR;
> +
> +               if (r0 == -ERESTARTNOINTR ||
> +                   (r0 == -ERESTARTSYS && !(ka->sa.sa_flags & SA_RESTART)))
> +                       setup_syscall_restart(regs);
> +       }
> +}
> +
> +/*
> + * Handle syscall restarting when there is no user handler in place for
> + * a delivered signal.  Rather than doing this as part of the normal
> + * signal processing, we do this on the final return to userspace, after
> + * we've finished handling signals and checking for schedule events.
> + *
> + * This avoids bad behaviour such as:
> + *  - syscall returns -ERESTARTNOHAND
> + *  - signal with no handler (so we set things up to restart the syscall)
> + *  - schedule
> + *  - signal with handler (eg, SIGALRM)
> + *  - we call the handler and then restart the syscall
> + *
> + * In order to avoid races with TIF_NEED_RESCHED, IRQs must be disabled
> + * when this function is called and remain disabled until we exit to
> + * userspace.
> + */
> +asmlinkage int syscall_restart(struct pt_regs *regs)
> +{
> +       struct thread_info *thread = current_thread_info();
> +
> +       clear_ti_thread_flag(thread, TIF_SYS_RESTART);
> +
> +       /*
> +        * Restart the system call.  We haven't setup a signal handler
> +        * to invoke, and the regset hasn't been usurped by ptrace.
> +        */
> +       if (regs->ARM_r0 == -ERESTART_RESTARTBLOCK) {
> +               if (thumb_mode(regs)) {
> +                       regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
> +                       regs->ARM_pc -= 2;
> +               } else {
> +#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
> +                       regs->ARM_r7 = __NR_restart_syscall;
> +                       regs->ARM_pc -= 4;
> +#else
> +                       u32 sp = regs->ARM_sp - 4;
> +                       u32 __user *usp = (u32 __user *)sp;
> +                       int ret;
> +
> +                       /*
> +                        * For OABI, we need to play some extra games, because
> +                        * we need to write to the users stack, which we can't
> +                        * do reliably from IRQs-disabled context.  Temporarily
> +                        * re-enable IRQs, perform the store, and then plug
> +                        * the resulting race afterwards.
> +                        */
> +                       local_irq_enable();
> +                       ret = put_user(regs->ARM_pc, usp);
> +                       local_irq_disable();
> +
> +                       /*
> +                        * Plug the reschedule race - if we need to reschedule,
> +                        * abort the syscall restarting.  We haven't modified
> +                        * anything other than the attempted write to the stack
> +                        * so we can merely retry later.
> +                        */
> +                       if (need_resched()) {
> +                               set_ti_thread_flag(thread, TIF_SYS_RESTART);
> +                               return -EINTR;
> +                       }
> +
> +                       /*
> +                        * We failed (for some reason) to write to the stack.
> +                        * Terminate the task.
> +                        */
> +                       if (ret) {
> +                               force_sigsegv(0, current);
> +                               return -EFAULT;
> +                       }
> +
> +                       /*
> +                        * Success, update the stack pointer and point the
> +                        * PC at the restarting code.
> +                        */
> +                       regs->ARM_sp = sp;
> +                       regs->ARM_pc = KERN_RESTART_CODE;
> +#endif
> +               }
> +       } else {
> +               /*
> +                * Simple restart - just back up and re-execute the last
> +                * instruction.
> +                */
> +               setup_syscall_restart(regs);
> +       }
> +
> +       return 0;
> +}
> +
> +/*
>  * Note that 'init' is a special process: it doesn't get signals it doesn't
>  * want to handle. Thus you cannot kill init even with a SIGKILL even by
>  * mistake.
> @@ -659,7 +788,6 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
>  */
>  static void do_signal(struct pt_regs *regs, int syscall)
>  {
> -       unsigned int retval = 0, continue_addr = 0, restart_addr = 0;
>        struct k_sigaction ka;
>        siginfo_t info;
>        int signr;
> @@ -674,32 +802,16 @@ static void do_signal(struct pt_regs *regs, int syscall)
>                return;
>
>        /*
> -        * If we were from a system call, check for system call restarting...
> +        * Set the SYS_RESTART flag to indicate that we have some
> +        * cleanup of the restart state to perform when returning to
> +        * userspace.
>         */
> -       if (syscall) {
> -               continue_addr = regs->ARM_pc;
> -               restart_addr = continue_addr - (thumb_mode(regs) ? 2 : 4);
> -               retval = regs->ARM_r0;
> -
> -               /*
> -                * Prepare for system call restart.  We do this here so that a
> -                * debugger will see the already changed PSW.
> -                */
> -               switch (retval) {
> -               case -ERESTARTNOHAND:
> -               case -ERESTARTSYS:
> -               case -ERESTARTNOINTR:
> -                       regs->ARM_r0 = regs->ARM_ORIG_r0;
> -                       regs->ARM_pc = restart_addr;
> -                       break;
> -               case -ERESTART_RESTARTBLOCK:
> -                       regs->ARM_r0 = -EINTR;
> -                       break;
> -               }
> -       }
> -
> -       if (try_to_freeze())
> -               goto no_signal;
> +       if (syscall &&
> +           (regs->ARM_r0 == -ERESTARTSYS ||
> +            regs->ARM_r0 == -ERESTARTNOINTR ||
> +            regs->ARM_r0 == -ERESTARTNOHAND ||
> +            regs->ARM_r0 == -ERESTART_RESTARTBLOCK))
> +               set_thread_flag(TIF_SYS_RESTART);
>
>        /*
>         * Get the signal to deliver.  When running under ptrace, at this
> @@ -709,19 +821,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
>        if (signr > 0) {
>                sigset_t *oldset;
>
> -               /*
> -                * Depending on the signal settings we may need to revert the
> -                * decision to restart the system call.  But skip this if a
> -                * debugger has chosen to restart at a different PC.
> -                */
> -               if (regs->ARM_pc == restart_addr) {
> -                       if (retval == -ERESTARTNOHAND
> -                           || (retval == -ERESTARTSYS
> -                               && !(ka.sa.sa_flags & SA_RESTART))) {
> -                               regs->ARM_r0 = -EINTR;
> -                               regs->ARM_pc = continue_addr;
> -                       }
> -               }
> +               syscall_restart_handler(regs, &ka);
>
>                if (test_thread_flag(TIF_RESTORE_SIGMASK))
>                        oldset = &current->saved_sigmask;
> @@ -740,38 +840,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
>                return;
>        }
>
> - no_signal:
>        if (syscall) {
> -               /*
> -                * Handle restarting a different system call.  As above,
> -                * if a debugger has chosen to restart at a different PC,
> -                * ignore the restart.
> -                */
> -               if (retval == -ERESTART_RESTARTBLOCK
> -                   && regs->ARM_pc == continue_addr) {
> -                       if (thumb_mode(regs)) {
> -                               regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
> -                               regs->ARM_pc -= 2;
> -                       } else {
> -#if defined(CONFIG_AEABI) && !defined(CONFIG_OABI_COMPAT)
> -                               regs->ARM_r7 = __NR_restart_syscall;
> -                               regs->ARM_pc -= 4;
> -#else
> -                               u32 __user *usp;
> -
> -                               regs->ARM_sp -= 4;
> -                               usp = (u32 __user *)regs->ARM_sp;
> -
> -                               if (put_user(regs->ARM_pc, usp) == 0) {
> -                                       regs->ARM_pc = KERN_RESTART_CODE;
> -                               } else {
> -                                       regs->ARM_sp += 4;
> -                                       force_sigsegv(0, current);
> -                               }
> -#endif
> -                       }
> -               }
> -
>                /* If there's no signal to deliver, we just put the saved sigmask
>                 * back.
>                 */
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ