Date:   Thu, 14 Oct 2021 10:02:20 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Nitesh Lal <nilal@...hat.com>,
        Nicolas Saenz Julienne <nsaenzju@...hat.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Alex Belits <abelits@...its.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [patch v4 1/8] add basic task isolation prctl interface

<snip>

> What are the requirements of the signal exactly (and why it is popular) ?
> Because the interruption event can be due to:
> 
> * An IPI.
> * A system call.

IRQs (easy to trace), exceptions.

> In the "full task isolation mode" patchset (the one from Alex), a system call
> will automatically generate a SIGKILL once a system call is performed
> (after the prctl to enable task isolated mode, but
> before the prctl to disable task isolated mode).
> This can be implemented, if desired, by SECCOMP syscall blocking
> (which already exists).
> 
> For other interruptions, which happen through IPIs, one can print
> the stack trace of the program (or interrupt) that generated
> the IPI to find out the cause (which is what rt-trace-bpf.py is doing).
> 
> An alternative would be to add tracepoints so that one can
> find out which function in the kernel caused the CPU and
> task to become "a target for interruptions".

For example, adding a tracepoint to the mark_vmstat_dirty() function
(which shows how that function was invoked on a given CPU, and
by whom) appears to provide sufficient information to debug problems.

(mark_vmstat_dirty() from
[patch v4 5/8] task isolation: sync vmstats conditional on changes)

This would take the place of a coredump image with a SIGKILL sent at that point.

Looking at

https://github.com/abelits/libtmc

One can see the notification via SIGUSR1 being used.

To support something similar, one would add a new bit to the
flags field of:

+struct task_isol_activate_control {
+       __u64 flags;
+       __u64 quiesce_oneshot_mask;
+       __u64 pad[6];
+};

Remove:

+               ret = -EINVAL;
+               if (act_ctrl.flags)
+                       goto out;

From the handler, shrink the padded space and use it.
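
As a rough userspace sketch of that change (not code from the patchset):
instead of rejecting any non-zero flags value, the handler could reject
only bits it does not know about. The structure layout is copied from
the patch, with __u64 spelled as uint64_t. The flag name
ISOL_F_NOTIFY_SIGNAL, the ISOL_F_KNOWN_FLAGS mask and the helper
task_isol_check_flags() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Layout copied from the patch, with __u64 spelled as uint64_t. */
struct task_isol_activate_control {
	uint64_t flags;
	uint64_t quiesce_oneshot_mask;
	uint64_t pad[6];
};

/* The padding keeps the structure at a fixed 64 bytes, so a later
 * field can be carved out of pad[] without changing the size. */
static_assert(sizeof(struct task_isol_activate_control) == 64,
	      "control structure must stay 64 bytes");

/* Hypothetical flag bit: ask for a signal on interruption.
 * This name is an assumption, it is not part of the posted patch. */
#define ISOL_F_NOTIFY_SIGNAL	(UINT64_C(1) << 0)
#define ISOL_F_KNOWN_FLAGS	(ISOL_F_NOTIFY_SIGNAL)

/* Hypothetical validation helper: reject unknown bits only,
 * rather than rejecting every non-zero flags value. */
static int task_isol_check_flags(const struct task_isol_activate_control *c)
{
	if (c->flags & ~ISOL_F_KNOWN_FLAGS)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits keeps the interface extensible: old kernels
fail cleanly with -EINVAL when given a flag they do not implement.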

> 
> > > > Also, see:
> > > > 
> > > >   https://lkml.kernel.org/r/20210929152429.186930629@infradead.org
> > > 
> > > As you can see from the below pseudocode, we were thinking of queueing
> > > the (invalidate icache or TLB flush) in case app is in userspace,
> > > to perform on return to kernel space, but the approach in your patch might be
> > > superior (will take some time to parse that thread...).
> > 
> > Let me assume you're talking about kernel TLB invalidates, otherwise it
> > would be terribly broken.
> > 
> > > > Suppose:
> > > > 
> > > > 	CPU0					CPU1
> > > > 
> > > > 	sys_prctl()
> > > > 	<kernel entry>
> > > > 	  // marks task 'important'
> > > > 						text_poke_sync()
> > > > 						  // checks CPU0, not userspace, queues IPI
> > > > 	<kernel exit>
> > > > 
> > > > 	$important userspace			  arch_send_call_function_ipi_mask()
> > > > 	<IPI>
> > > > 	  // finds task is 'important' and
> > > > 	  // can't take interrupts
> > > > 	  sigkill()
> > > > 
> > > > *Whoopsie*
> > > > 
> > > > 
> > > > Fundamentally CPU1 can't elide the IPI until CPU0 is in userspace,
> > > > therefore CPU0 can't wait for quiescence in kernelspace, but if it goes
> > > > to userspace, it'll get killed on interruption. Catch-22.

To reiterate this point:

> > > >         CPU0                                    CPU1
> > > > 
> > > >         sys_prctl()
> > > >         <kernel entry>
> > > >           // marks task 'important'
> > > >                                                 text_poke_sync()
> > > >                                                   // checks CPU0, not userspace, queues IPI
> > > >         <kernel exit>

1) Such races can be fixed by proper use of atomic variables.

2) If a signal to an application is desired, I fail to see why this
interface (ignoring bugs related to the particular mechanism) does not
allow it.
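
To illustrate point 1, here is a minimal userspace sketch using C11
atomics. It is one possible scheme under stated assumptions, not the
mechanism from either patchset. A single per-CPU state word is assumed,
and the remote CPU queues work in it instead of sending an IPI. All
names (ISOL_IN_USER, ISOL_WORK_PENDING, isol_queue_work() and so on)
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* The scheme only makes sense if the state word is lock-free. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "need lock-free atomic ints");

#define ISOL_IN_USER		1u	/* CPU runs isolated userspace */
#define ISOL_WORK_PENDING	2u	/* deferred work has been queued */

/* Remote side (CPU1 in the diagram): queue work instead of an IPI.
 * Returns true if the target was in userspace, so the work is deferred
 * to its next kernel entry. Returns false if the target was still in
 * the kernel, in which case its exit attempt below will fail. */
static bool isol_queue_work(atomic_uint *state)
{
	unsigned int old = atomic_fetch_or(state, ISOL_WORK_PENDING);

	return (old & ISOL_IN_USER) != 0;
}

/* Target side (CPU0): leave the kernel only if no work is pending. */
static bool isol_try_exit_to_user(atomic_uint *state)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(state, &expected, ISOL_IN_USER);
}

/* Target side: on kernel entry, or after a failed exit attempt, mark
 * the CPU as in-kernel and claim any pending work. Returns true if
 * there is work to run. */
static bool isol_take_work(atomic_uint *state)
{
	unsigned int old = atomic_exchange(state, 0);

	return (old & ISOL_WORK_PENDING) != 0;
}
```

Because the fetch-or on CPU1 and the compare-and-exchange on CPU0 hit
the same variable, one is ordered before the other: either CPU1 sees
ISOL_IN_USER and defers the work, or CPU0 sees ISOL_WORK_PENDING and
handles it before exiting to userspace.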

So hopefully this addresses your comments.
