Date: Thu, 8 Feb 2024 12:42:16 +0100
From: Andrea Parri <parri.andrea@...il.com>
To: Alexandre Ghiti <alexghiti@...osinc.com>
Cc: Paul Walmsley <paul.walmsley@...ive.com>,
	Palmer Dabbelt <palmer@...belt.com>,
	Albert Ou <aou@...s.berkeley.edu>, Anup Patel <anup@...infault.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org,
	Björn Töpel <bjorn@...osinc.com>
Subject: Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

> +static int __ftrace_modify_code(void *data)
> +{
> +	struct ftrace_modify_param *param = data;
> +
> +	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> +		ftrace_modify_all_code(param->command);
> +		atomic_inc(&param->cpu_count);

I stared at ftrace_modify_all_code() for a bit, but honestly I don't see
what prevents the ->cpu_count increment from being reordered before the
insn write(s) (architecturally) now that you have removed the IPI dance:
perhaps add an smp_wmb() right before the atomic_inc() (or promote the
latter to a (void)atomic_inc_return_release()) and/or an inline comment
explaining why such reordering is not possible?
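To make that concrete, here is a rough, untested sketch of the first
option; the smp_wmb() placement is my assumption about where the
ordering is needed, not something taken from the patch:

	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
		ftrace_modify_all_code(param->command);
		/*
		 * Order the insn writes done by ftrace_modify_all_code()
		 * before the increment that releases the waiting CPUs.
		 */
		smp_wmb();
		atomic_inc(&param->cpu_count);
	}

(or, equivalently for this purpose, drop the smp_wmb() and use
(void)atomic_inc_return_release(&param->cpu_count) instead of the plain
atomic_inc()).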


> +	} else {
> +		while (atomic_read(&param->cpu_count) <= num_online_cpus())
> +			cpu_relax();
> +		smp_mb();

I see that you've lifted/copied the memory barrier from patch_text_cb():
what's its point?  AFAIU, the barrier has no ordering effect on later
(program-order) instruction fetches; perhaps the code was based on some
legacy/old version of Zifencei?  In any case, comments, comments, ... or
maybe just remove that memory barrier?
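For the sake of argument, if some ordering were actually wanted on the
waiter side for subsequent *data* accesses, the pairing could be made
explicit rather than relying on a trailing full barrier; a sketch (mine,
not a claim about what the patch needs):

	} else {
		/*
		 * The acquire read pairs with the writer's release
		 * increment and orders later data reads; it still does
		 * nothing for instruction fetches, which is what
		 * local_flush_icache_all() below is for.
		 */
		while (atomic_read_acquire(&param->cpu_count) <= num_online_cpus())
			cpu_relax();
	}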


> +	}
> +
> +	local_flush_icache_all();
> +
> +	return 0;
> +}

[...]


> @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
>  	if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
>  		for (i = 0; ret == 0 && i < patch->ninsns; i++) {
>  			len = GET_INSN_LENGTH(patch->insns[i]);
> -			ret = patch_text_nosync(patch->addr + i * len,
> -						&patch->insns[i], len);
> +			ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
>  		}
>  		atomic_inc(&patch->cpu_count);
>  	} else {
> @@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
>  		smp_mb();
>  	}
>  
> +	local_flush_icache_all();
> +
>  	return ret;
>  }
>  NOKPROBE_SYMBOL(patch_text_cb);

My above remarks/questions also apply to this function.


On a last topic, although somewhat orthogonal to the scope of this patch,
I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
correct: I can see why we may want (or need) the local TLB flush before
returning from patch_{map,unmap}(), but does a local flush suffice?
For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
sequence in their unmapping stage (and apparently relies on "no caching
of invalid ptes" in their mapping stage).  Of course, "broadcasting" our
(riscv's) TLB invalidations will necessarily introduce some complexity...
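Just to illustrate the kind of change I have in mind for the unmap side,
a hypothetical sketch (clear_fixmap(), __fix_to_virt() and
flush_tlb_kernel_range() are the generic kernel helpers; whether a
broadcast invalidation is worth its cost here is precisely the open
question):

	static void patch_unmap(int fixmap)
	{
		unsigned long addr = __fix_to_virt(fixmap);

		clear_fixmap(fixmap);
		/*
		 * Broadcast the invalidation instead of relying on a
		 * purely local flush, at the cost of an IPI/SBI
		 * round-trip.
		 */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	}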

Thoughts?

  Andrea
