netdev - Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7e54b3c5e0d4c91eb64f2dd1583dd687bc34757e.camel@marvell.com>
Date:   Sun, 4 Oct 2020 14:44:39 +0000
From:   Alex Belits <abelits@...vell.com>
To:     "frederic@...nel.org" <frederic@...nel.org>
CC:     "mingo@...nel.org" <mingo@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "will@...nel.org" <will@...nel.org>,
        Prasun Kapoor <pkapoor@...vell.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard
 isolation from kernel

On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> External Email
> 
> -------------------------------------------------------------------
> ---
> On Wed, Jul 22, 2020 at 02:49:49PM +0000, Alex Belits wrote:
> > +/*
> > + * Description of the last two tasks that ran isolated on a given
> > CPU.
> > + * This is intended only for messages about isolation breaking. We
> > + * don't want any references to actual task while accessing this
> > from
> > + * CPU that caused isolation breaking -- we know nothing about
> > timing
> > + * and don't want to use locking or RCU.
> > + */
> > +struct isol_task_desc {
> > +	atomic_t curr_index;
> > +	atomic_t curr_index_wr;
> > +	bool	warned[2];
> > +	pid_t	pid[2];
> > +	pid_t	tgid[2];
> > +	char	comm[2][TASK_COMM_LEN];
> > +};
> > +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> 
> So that's quite a huge patch that would have needed to be split up.
> Especially this tracing engine.
> 
> Speaking of which, I agree with Thomas that it's unnecessary. It's
> too much
> code and complexity. We can use the existing trace events and perform
> the
> analysis from userspace to find the source of the disturbance.

The idea behind this is that isolation breaking events are supposed to
be known to the applications while applications run normally, and they
should not require any analysis or human intervention to be handled.

A process may exit isolation because some leftover delayed work, for
example, a timer or a workqueue, is still present on a CPU, or because
a page fault or some other exception, normally handled silently, is
caused by the task. It is also possible to direct an interrupt to a CPU
that is running an isolated task -- currently it's perfectly valid to
set interrupt smp affinity to a CPU running isolated task, and then
interrupt will cause breaking isolation. While it's probably not the
best way of handling interrupts, I would rather not prohibit this
explicitly.

There is also a matter of avoiding race conditions on entering
isolation. Once CPU entered isolation, other CPUs should avoid
disturbing it when they know that CPU is running a task in isolated
mode. However for a short time after entering isolation other CPUs may
be unaware of this, and will still send IPIs to it. Preventing this
scenario completely would be very costly in terms of what other CPUs
will have to do before notifying others, so similar to how EINTR works,
we can simply specify that this is allowed, and task is supposed to re-
enter isolation after this. It's still a bad idea to specify that
isolation breaking can continue happening while application is running
in isolated mode, however allowing some "grace period" after entering
is acceptable as long as application is aware of this happening.

In libtmc I have moved this handling of isolation breaking into a
separate thread, intended to become a separate daemon if necessary. In
part it was done because initial implementation of isolation made it
very difficult to avoid repeating delayed work on isolated CPUs, so
something had to watch for it from non-isolated CPU. It's possible that
now, when delayed work does not appear on isolated CPUs out of nowhere,
the need in isolation manager thread will disappear, and task itself
will be able to handle all isolation breaking, like original
implementation by Chris was supposed to.

However in either case it's still useful for the task, or isolation
manager, to get a description of the isolation-breaking event. This is
what those things are intended for. Now they only produce log messages
because this is where initially all description of isolation-breaking
events went, however I would prefer to make logging optional but always
let applications read those events descriptions, regardless of any
tracing mechanism being used. I was more focused on making the
reporting mechanism properly detect the cause of isolation breaking
because that functionality was not quite working in earlier work by
Chris and Yuri, so I have kept logging as the only output, but made it
suitable for producing events that applications will be able to
receive. Application, or isolation manager, will receive clear and
unambiguous reporting, so there will be no need for any additional
analysis or guesswork.

After adding a proper "low-level" isolation flags, I got the idea that
we might have a better yet reporting mechanism. Early isolation
breaking detection on kernel entry may set a flag that says that
isolation breaking happened, however its cause is unknown. Or, more
likely, only some general information about isolation breaking is
available, like a type of exception. Then, once a known isolation-
breaking reporting mechanism is called from interrupt, syscall, IPI or
exception processing, the flag is cleared, and reporting is supposed to
be done. However if then kernel returns to userspace on isolated task
but isolation breaking is not reported yet, an isolation breaking
reporting with "unknown cause" will happen. We may even add some more
optional lightweight tracing for debugging purposes, however the fact
that reporting will be done, will allow us to make sure that no matter
how complicated exception processing is, or how we managed to miss some
subtle details of an architecture where we are implementing task
isolation, there will be a reliable way to tell the user that something
is wrong, and tell the task that there is something it has to react to.


> 
> Thanks.
>