linux-kernel - Re: [PATCH 3/3] membarrier: Propagate SYNC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201201101637.GU2414@hirez.programming.kicks-ass.net>
Date:   Tue, 1 Dec 2020 11:16:37 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     x86@...nel.org, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Arnd Bergmann <arnd@...db.de>,
        Anton Blanchard <anton@...abs.org>
Subject: Re: [PATCH 3/3] membarrier: Propagate SYNC_CORE and RSEQ actions
 more carefully

On Mon, Nov 30, 2020 at 09:50:35AM -0800, Andy Lutomirski wrote:
> membarrier() carefully propagates SYNC_CORE and RSEQ actions to all
> other CPUs, but there are two issues.
> 
>  - membarrier() does not sync_core() or rseq_preempt() the calling
>    CPU.  Aside from the logic being mind-bending, this also means
>    that it may not be safe to modify user code through an alias,
>    call membarrier(), and then jump to a different executable alias
>    of the same code.

I always understood this to be on purpose. The calling CPU can fix up
itself just fine. The pain point is fixing up the other CPUs, and that's
where membarrier() helps.

That said, I don't mind including self, these aren't fast calls by any
means.

>  - membarrier() does not explicitly sync_core() remote CPUs either;
>    instead, it relies on the assumption that an IPI will result in a
>    core sync.  On x86, I think this may be true in practice, but
>    it's not architecturally reliable.  In particular, the SDM and
>    APM do not appear to guarantee that interrupt delivery is
>    serializing.

Right, I don't think we rely on that, we do rely on interrupt delivery
providing order though -- as per the previous email.

>    On a preemptible kernel, IPI return can schedule,
>    thereby switching to another task in the same mm that was
>    sleeping in a syscall.  The new task could then SYSRET back to
>    usermode without ever executing IRET.

This; I think we all overlooked this scenario.

> This patch simplifies the code to treat the calling CPU just like
> all other CPUs, and explicitly sync_core() on all target CPUs.  This
> eliminates the need for the smp_mb() at the end of the function
> except in the special case of a targeted remote membarrier().  This
> patch updates that code and the comments accordingly.
> 
> Signed-off-by: Andy Lutomirski <luto@...nel.org>

> @@ -228,25 +258,33 @@ static int membarrier_private_expedited(int flags, int cpu_id)
>  		rcu_read_unlock();
>  	}
>  
> -	preempt_disable();
> -	if (cpu_id >= 0)
> -		smp_call_function_single(cpu_id, ipi_func, NULL, 1);
> -	else
> -		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
> -	preempt_enable();
> +	if (cpu_id >= 0) {
> +		int cpu = get_cpu();
> +
> +		if (cpu_id == cpu) {
> +			ipi_func(NULL);
> +		} else {
> +			smp_call_function_single(cpu_id, ipi_func, NULL, 1);
> +			/*
> +			 * This is analogous to the smp_mb() at the beginning
> +			 * of the function -- exit from a system call is not a
> +			 * barrier.  We only need this if we're targeting a
> +			 * specific remote CPU, though -- otherwise ipi_func()
> +			 * would serves the same purpose.
> +			 */
> +			smp_mb();

smp_call_function_single(.wait=1) already orders against completion of
the IPI. Do we really need more?

> +		}
> +
> +		put_cpu();
> +	} else {
> +		on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
> +	}
>  
>  out:
>  	if (cpu_id < 0)
>  		free_cpumask_var(tmpmask);
>  	cpus_read_unlock();
>  
> -	/*
> -	 * Memory barrier on the caller thread _after_ we finished
> -	 * waiting for the last IPI. Matches memory barriers around
> -	 * rq->curr modification in scheduler.
> -	 */
> -	smp_mb();	/* exit from system call is not a mb */
> -
>  	return 0;
>  }
>  
> -- 
> 2.28.0
>