[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202103211108.628E6729F@keescook>
Date: Sun, 21 Mar 2021 11:28:17 -0700
From: Kees Cook <keescook@...omium.org>
To: John Wood <john.wood@....com>
Cc: Jann Horn <jannh@...gle.com>, Randy Dunlap <rdunlap@...radead.org>,
Jonathan Corbet <corbet@....net>,
James Morris <jmorris@...ei.org>,
Shuah Khan <shuah@...nel.org>,
"Serge E. Hallyn" <serge@...lyn.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andi Kleen <ak@...ux.intel.com>,
kernel test robot <oliver.sang@...el.com>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-security-module@...r.kernel.org,
linux-kselftest@...r.kernel.org,
kernel-hardening@...ts.openwall.com
Subject: Re: [PATCH v6 3/8] securtiy/brute: Detect a brute force attack
On Sat, Mar 20, 2021 at 04:34:06PM +0100, John Wood wrote:
> On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > > @@ -74,7 +84,7 @@ static struct brute_stats *brute_new_stats(void)
> > > {
> > > struct brute_stats *stats;
> > >
> > > - stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> > > + stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
> >
> > Why change this here? I'd just start with this in the patch that
> > introduces it.
>
> To be coherent in the previous patch. In the previous patch the kmalloc
> could use GFP_KERNEL due to the call was made out of an atomic context.
> Now, with the new lock it needs GFP_ATOMIC. So the question:
>
> If finally it need to use GFP_ATOMIC, the first patch need to use it even if
> it is not necessary?
It's probably not a big deal, but for me, I'd just do GFP_ATOMIC from
the start, maybe add a comment that says "some LSM hooks are from atomic
context" or something.
> > > if (!stats)
> > > return NULL;
> > >
> > > @@ -99,16 +109,17 @@ static struct brute_stats *brute_new_stats(void)
> > > * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > * since the task_free hook can be called from an IRQ context during the
> > > * execution of the task_alloc hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * held.
> > > */
> > > static void brute_share_stats(struct brute_stats *src,
> > > struct brute_stats **dst)
> > > {
> > > - unsigned long flags;
> > > -
> > > - spin_lock_irqsave(&src->lock, flags);
> > > + spin_lock(&src->lock);
> > > refcount_inc(&src->refc);
> > > *dst = src;
> > > - spin_unlock_irqrestore(&src->lock, flags);
> > > + spin_unlock(&src->lock);
> > > }
> >
> > I still don't think any locking is needed here; the whole function can
> > go away, IMO.
>
> In this case I think this is possible:
>
> Scenario 1: cpu 1 writes the stats pointer and cpu 2 is navigating the
> processes tree reading the same stats pointer.
>
> Scenario 2: cpu 1 is navigating the processes tree reading the stats
> pointer and in IRQ the same stats pointer is wrote.
>
> So, we need locking. Am I wrong?
But only the refcount is being incremented, yes? That doesn't need a
lock because it's already an atomic.
>
> > > /**
> > > @@ -126,26 +137,36 @@ static void brute_share_stats(struct brute_stats *src,
> > > * this task and the new one being allocated. Otherwise, share the statistics
> > > * that the current task already has.
> > > *
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the task_alloc hook.
> > > + *
> > > * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> > > * otherwise.
> > > */
> > > static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > > {
> > > struct brute_stats **stats, **p_stats;
> > > + unsigned long flags;
> > >
> > > stats = brute_stats_ptr(task);
> > > p_stats = brute_stats_ptr(current);
> > > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > >
> > > if (likely(*p_stats)) {
> > > brute_share_stats(*p_stats, stats);
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return 0;
> > > }
> > >
> > > *stats = brute_new_stats();
> > > - if (!*stats)
> > > + if (!*stats) {
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return -ENOMEM;
> > > + }
> > >
> > > brute_share_stats(*stats, p_stats);
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return 0;
> > > }
> >
> > I'd much prefer that whatever locking is needed be introduced in the
> > initial patch: this transformation just double the work to review. :)
>
> So, IIUC I need to introduce all the locks in the initial patch even if
> they are not necessary. Am I right?
I would find it easier to follow. Perhaps other reviewers would have a
different opinion.
>
> > >
> > > @@ -167,9 +188,9 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > > * only one task (the task that calls the execve function) points to the data.
> > > * In this case, the previous allocation is used but the statistics are reset.
> > > *
> > > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > - * since the task_free hook can be called from an IRQ context during the
> > > - * execution of the bprm_committing_creds hook.
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the bprm_committing_creds hook.
> > > */
> > > static void brute_task_execve(struct linux_binprm *bprm)
> > > {
> > > @@ -177,24 +198,33 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > > unsigned long flags;
> > >
> > > stats = brute_stats_ptr(current);
> > > - if (WARN(!*stats, "No statistical data\n"))
> > > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > + if (WARN(!*stats, "No statistical data\n")) {
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return;
> > > + }
> > >
> > > - spin_lock_irqsave(&(*stats)->lock, flags);
> > > + spin_lock(&(*stats)->lock);
> > >
> > > if (!refcount_dec_not_one(&(*stats)->refc)) {
> > > /* execve call after an execve call */
> > > (*stats)->faults = 0;
> > > (*stats)->jiffies = get_jiffies_64();
> > > (*stats)->period = 0;
> > > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return;
> > > }
> > >
> > > /* execve call after a fork call */
> > > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > +
> > > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > *stats = brute_new_stats();
> > > WARN(!*stats, "Cannot allocate statistical data\n");
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > }
> >
> > Again, I don't see a need for locking -- this is just managing the
> > lifetime which is entirely handled by the implicit locking of "current"
> > and the refcount_t.
>
> Here I can see the same two scenarios noted before. So I think the locking
> is needed. Am I right?
>
> > > /**
> > > @@ -204,9 +234,9 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > > * The statistical data that is shared between all the fork hierarchy processes
> > > * needs to be freed when this hierarchy disappears.
> > > *
> > > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > - * since the task_free hook can be called from an IRQ context during the
> > > - * execution of the task_free hook.
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the task_free hook.
> > > */
> > > static void brute_task_free(struct task_struct *task)
> > > {
> > > @@ -215,17 +245,446 @@ static void brute_task_free(struct task_struct *task)
> > > bool refc_is_zero;
> > >
> > > stats = brute_stats_ptr(task);
> > > - if (WARN(!*stats, "No statistical data\n"))
> > > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > + if (WARN(!*stats, "No statistical data\n")) {
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return;
> > > + }
> > >
> > > - spin_lock_irqsave(&(*stats)->lock, flags);
> > > + spin_lock(&(*stats)->lock);
> > > refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> > > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >
> > > if (refc_is_zero) {
> > > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > kfree(*stats);
> > > *stats = NULL;
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > + }
> > > +}
> >
> > Same; I would expect this to be simply:
>
> No comment. I think I am missing something. I need to clarify the previous
> cases before to work on the next ones. Sorry and thanks for the guidance.
Right -- so, there are a few concurrency cases you need to worry about,
AIUI:
1- stats lifetime (based on creation/death of tasks)
2- stats value being written vs read
3- stats values being written/read vs stats lifetime
Using refcount_t in the standard pattern (as you're doing) should
entirely cover "1".
Since the values read from stats are mostly independent, it should be
possible to use READ_ONCE() in the readers and WRITE_ONCE() under a lock
in the writers (this is case "2").
For "3", I think the implicit locking of "current" keeps you safe (as
in, the stats can't go away because "current" will always have a
reference on it).
I see two places where stats are written. One appears to be the
brute_task_execve() case where only 1 thread exists, so there's no
lock needed, and the other case is brute_update_crash_period(), which
makes sense to me to lock: two tasks might be sharing a stats as they
crash.
Of course, I could easily be missing something here, but it looks like
much less locking is needed.
>
> > stats = brute_stats_ptr(task);
> > if (WARN_ON_ONCE(!*stats))
> > return;
> > if (refcount_dec_and_test(&(*stats)->refc)) {
> > kfree(*stats);
> > *stats = NULL;
> > }
> >
> > > +
> > > +/*
> > > + * BRUTE_EMA_WEIGHT_NUMERATOR - Weight's numerator of EMA.
> > > + */
> > > +static const u64 BRUTE_EMA_WEIGHT_NUMERATOR = 7;
> > > +
> > > +/*
> > > + * BRUTE_EMA_WEIGHT_DENOMINATOR - Weight's denominator of EMA.
> > > + */
> > > +static const u64 BRUTE_EMA_WEIGHT_DENOMINATOR = 10;
> >
> > Should these be externally configurable (via sysfs)?
>
> No problem. I think this is easier than locking :)
Heh, for the most part, yes. ;) Though I have my own nightmares[1] about
sysfs.
> > > +/**
> > > + * brute_update_crash_period() - Update the application crash period.
> > > + * @stats: Statistics that hold the application crash period to update.
> > > + * @now: The current timestamp in jiffies.
> > > + *
> > > + * The application crash period must be a value that is not prone to change due
> > > + * to spurious data and follows the real crash period. So, to compute it, the
> > > + * exponential moving average (EMA) is used.
> > > + *
> > > + * This kind of average defines a weight (between 0 and 1) for the new value to
> > > + * add and applies the remainder of the weight to the current average value.
> > > + * This way, some spurious data will not excessively modify the average and only
> > > + * if the new values are persistent, the moving average will tend towards them.
> > > + *
> > > + * Mathematically the application crash period's EMA can be expressed as
> > > + * follows:
> > > + *
> > > + * period_ema = period * weight + period_ema * (1 - weight)
> > > + *
> > > + * If the operations are applied:
> > > + *
> > > + * period_ema = period * weight + period_ema - period_ema * weight
> > > + *
> > > + * If the operands are ordered:
> > > + *
> > > + * period_ema = period_ema - period_ema * weight + period * weight
> > > + *
> > > + * Finally, this formula can be written as follows:
> > > + *
> > > + * period_ema -= period_ema * weight;
> > > + * period_ema += period * weight;
> > > + *
> > > + * The statistics that hold the application crash period to update cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * held.
> > > + * Return: The last crash timestamp before updating it.
> > > + */
> > > +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> > > +{
> > > + u64 current_period;
> > > + u64 last_crash_timestamp;
> > > +
> > > + spin_lock(&stats->lock);
> > > + current_period = now - stats->jiffies;
> > > + last_crash_timestamp = stats->jiffies;
> > > + stats->jiffies = now;
> > > +
> > > + stats->period -= brute_mul_by_ema_weight(stats->period);
> > > + stats->period += brute_mul_by_ema_weight(current_period);
> > > +
> > > + if (stats->faults < BRUTE_MAX_FAULTS)
> > > + stats->faults += 1;
> > > +
> > > + spin_unlock(&stats->lock);
> > > + return last_crash_timestamp;
> > > +}
> >
> > Now *here* locking makes sense, and it only needs to be per-stat, not
> > global, since multiple processes may be operating on the same stat
> > struct. To make this more no-reader-locking-friendly, I'd also update
> > everything at the end, and use WRITE_ONCE():
> >
> > u64 current_period, period;
> > u64 last_crash_timestamp;
> > u64 faults;
> >
> > spin_lock(&stats->lock);
> > current_period = now - stats->jiffies;
> > last_crash_timestamp = stats->jiffies;
> >
> > WRITE_ONCE(stats->period,
> > stats->period - brute_mul_by_ema_weight(stats->period) +
> > brute_mul_by_ema_weight(current_period));
> >
> > if (stats->faults < BRUTE_MAX_FAULTS)
> > WRITE_ONCE(stats->faults, stats->faults + 1);
> >
> > WRITE_ONCE(stats->jiffies, now);
> >
> > spin_unlock(&stats->lock);
> > return last_crash_timestamp;
> >
> > That way readers can (IIUC) safely use READ_ONCE() on jiffies and faults
> > without needing to hold the &stats->lock (unless they need perfectly matching
> > jiffies, period, and faults).
>
> Thanks for the refactory. I will work on it (if I can understand locking). :(
It may be worth reading Documentation/memory-barriers.txt which has some
more details.
> > > +/**
> > > + * brute_manage_fork_attack() - Manage a fork brute force attack.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + * @now: The current timestamp in jiffies.
> > > + *
> > > + * For a correct management of a fork brute force attack it is only necessary to
> > > + * update the statistics and test if an attack is happening based on these data.
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * held.
> > > + * Return: The last crash timestamp before updating it.
> > > + */
> > > +static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> > > +{
> > > + u64 last_fork_crash;
> > > +
> > > + last_fork_crash = brute_update_crash_period(stats, now);
> > > + if (brute_attack_running(stats))
> > > + print_fork_attack_running();
> > > +
> > > + return last_fork_crash;
> > > +}
> > > +
> > > +/**
> > > + * brute_get_exec_stats() - Get the exec statistics.
> > > + * @stats: When this function is called, this parameter must point to the
> > > + * current process' statistical data. When this function returns, this
> > > + * parameter points to the parent process' statistics of the fork
> > > + * hierarchy that hold the current process' statistics.
> > > + *
> > > + * To manage a brute force attack that happens through the execve system call it
> > > + * is not possible to use the statistical data hold by this process due to these
> > > + * statistics disappear when this task is finished. In this scenario this data
> > > + * should be tracked by the statistics of a higher fork hierarchy (the hierarchy
> > > + * that contains the process that forks before the execve system call).
> > > + *
> > > + * To find these statistics the current fork hierarchy must be traversed up
> > > + * until new statistics are found.
> > > + *
> > > + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_get_exec_stats(struct brute_stats **stats)
> > > +{
> > > + const struct task_struct *task = current;
> > > + struct brute_stats **p_stats;
> > > +
> > > + do {
> > > + if (!task->real_parent) {
> > > + *stats = NULL;
> > > + return;
> > > + }
> > > +
> > > + p_stats = brute_stats_ptr(task->real_parent);
> > > + task = task->real_parent;
> > > + } while (*stats == *p_stats);
> > > +
> > > + *stats = *p_stats;
> > > +}
> >
> > See Yama's task_is_descendant() for how to walk up the process tree
> > (and I think the process group stuff will save some steps too); you
> > don't need tasklist_lock held, just rcu_read_lock held, AIUI:
> > Documentation/RCU/listRCU.rst
> >
> > And since you're passing this stats struct back up, and it would be outside of rcu read lock, you'd want to do a "get" on it first:
> >
> > rcu_read_lock();
> > loop {
> > ...
> > }
> > refcount_inc_not_zero(&(*p_stats)->refc);
> > rcu_read_unlock();
> >
> > *stats = *p_stats
>
> Thanks for the suggestions. I will work on it for the next version.
> Anyway, in the first version Kees Cook and Jann Horn noted that some tasks
> could escape the rcu read lock and that alternate locking were needed.
>
> Extract from the RFC:
>
> [Kees Cook]
> Can't newly created processes escape this RCU read lock? I think this
> need alternate locking, or something in the task_alloc hook that will
> block any new process from being created within the stats group.
>
> [Jann Horn]
> Good point; the proper way to deal with this would probably be to take
> the tasklist_lock in read mode around this loop (with
> read_lock(&tasklist_lock) / read_unlock(&tasklist_lock)), which pairs
> with the write_lock_irq(&tasklist_lock) in copy_process(). Thanks to
> the fatal_signal_pending() check while holding the lock in
> copy_process(), that would be race-free - any fork() that has not yet
> inserted the new task into the global task list would wait for us to
> drop the tasklist_lock, then bail out at the fatal_signal_pending()
> check.
>
> I think that this scenario is still possible. So the tasklist_lock is
> necessary. Am I right?
Oops, yeah, best to listen to Jann and past-me. :) Were these comments
about finding the parent or killing offenders?
> > > +
> > > + spin_lock(&(*stats)->lock);
> > > + if (!(*stats)->faults)
> > > + (*stats)->jiffies = last_fork_crash;
> > > + spin_unlock(&(*stats)->lock);
> > > +
> > > + brute_update_crash_period(*stats, now);
> >
> > and then you can add:
> >
> > if (refcount_dec_and_test(&(*stats)->refc))
> > kfree(*stats);
> >
> > (or better yet, make that a helper) named something like
> > "put_brute_stats".
>
> Sorry, but I don't understand why we need to free the stats here.
> What is the rationale behind this change?
Err, I think I may have quoted the wrong chunk of your patch! Sorry; I
was talking about the place where you did a free, I think? Disregard
this for now. :)
> > > +/**
> > > + * brute_manage_exec_attack() - Manage an exec brute force attack.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + * @now: The current timestamp in jiffies.
> > > + * @last_fork_crash: The last fork crash timestamp before updating it.
> > > + *
> > > + * For a correct management of an exec brute force attack it is only necessary
> > > + * to update the exec statistics and test if an attack is happening based on
> > > + * these data.
> > > + *
> > > + * It is important to note that if the fork and exec crash periods are the same,
> > > + * the attack test is avoided. This allows that in a brute force attack that
> > > + * happens through the fork system call, the mitigation method does not act on
> > > + * the parent process of the fork hierarchy.
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > > + * brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> > > + u64 last_fork_crash)
> > > +{
> > > + int ret;
> > > + struct brute_stats *exec_stats = stats;
> > > + u64 fork_period;
> > > + u64 exec_period;
> > > +
> > > + ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
> > > + if (WARN(ret, "No exec statistical data\n"))
> > > + return;
> >
> > I think this should fail closed: if there's a static processing error,
> > treat it as an attack.
>
> Do you mean to trigger the mitigation of a brute force attack over this task?
> So, IIUC you suggest that instead of generate warnings if there isn't
> statistical data, we need to trigger the mitigation? This can be applied to
> the case where the allocation of a brute_stats structure fails?
Right -- it should be an impossible scenario that the stats are
_missing_. There is not an expected execution path in the kernel where
that could happen, so if you're testing for it (and correctly generating
a WARN), it should _also_ fail closed: an impossible case has been
found, so assume userspace is under attack. (Otherwise it could serve as
a bypass for an attacker who has found a way to navigate a process into
this state.)
-Kees
--
Kees Cook
Powered by blists - more mailing lists