Date: Thu, 8 Feb 2024 12:42:16 +0100
From: Andrea Parri <parri.andrea@...il.com>
To: Alexandre Ghiti <alexghiti@...osinc.com>
Cc: Paul Walmsley <paul.walmsley@...ive.com>,
	Palmer Dabbelt <palmer@...belt.com>,
	Albert Ou <aou@...s.berkeley.edu>, Anup Patel <anup@...infault.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org,
	Björn Töpel <bjorn@...osinc.com>
Subject: Re: [PATCH] riscv: Fix text patching when icache flushes use IPIs

> +static int __ftrace_modify_code(void *data)
> +{
> +	struct ftrace_modify_param *param = data;
> +
> +	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> +		ftrace_modify_all_code(param->command);
> +		atomic_inc(&param->cpu_count);

I stared at ftrace_modify_all_code() for a bit, but honestly I don't see
what prevents the ->cpu_count increment from being reordered before the
insn write(s) (architecturally) now that you have removed the IPI dance:
perhaps add an smp_wmb() right before the atomic_inc() (or promote the
latter to a (void)atomic_inc_return_release()) and/or an inline comment
explaining why such reordering is not possible?
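To make that concrete, here is a rough, untested sketch of the first
option; the smp_wmb() placement is my assumption about where the
ordering is needed, not something taken from the patch:

	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
		ftrace_modify_all_code(param->command);
		/*
		 * Order the insn writes done by ftrace_modify_all_code()
		 * before the increment that releases the waiting CPUs.
		 */
		smp_wmb();
		atomic_inc(&param->cpu_count);
	}

(or, equivalently for this purpose, drop the smp_wmb() and use
(void)atomic_inc_return_release(&param->cpu_count) instead of the plain
atomic_inc()).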


> +	} else {
> +		while (atomic_read(&param->cpu_count) <= num_online_cpus())
> +			cpu_relax();
> +		smp_mb();

I see that you've lifted/copied the memory barrier from patch_text_cb():
what's its point?  AFAIU, the barrier has no ordering effect on later
(program-order) instruction fetches; perhaps the code was based on some
legacy/old version of Zifencei?  In any case, comments, comments, ... or
maybe just remove that memory barrier?
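For the sake of argument, if some ordering were actually wanted on the
waiter side for subsequent *data* accesses, the pairing could be made
explicit rather than relying on a trailing full barrier; a sketch (mine,
not a claim about what the patch needs):

	} else {
		/*
		 * The acquire read pairs with the writer's release
		 * increment and orders later data reads; it still does
		 * nothing for instruction fetches, which is what
		 * local_flush_icache_all() below is for.
		 */
		while (atomic_read_acquire(&param->cpu_count) <= num_online_cpus())
			cpu_relax();
	}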


> +	}
> +
> +	local_flush_icache_all();
> +
> +	return 0;
> +}

[...]


> @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
>  	if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
>  		for (i = 0; ret == 0 && i < patch->ninsns; i++) {
>  			len = GET_INSN_LENGTH(patch->insns[i]);
> -			ret = patch_text_nosync(patch->addr + i * len,
> -						&patch->insns[i], len);
> +			ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
>  		}
>  		atomic_inc(&patch->cpu_count);
>  	} else {
> @@ -242,6 +239,8 @@ static int patch_text_cb(void *data)
>  		smp_mb();
>  	}
>  
> +	local_flush_icache_all();
> +
>  	return ret;
>  }
>  NOKPROBE_SYMBOL(patch_text_cb);

My above remarks/questions also apply to this function.


On a last topic, although somewhat orthogonal to the scope of this patch,
I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
correct: I can see why we may want (or need) the local TLB flush before
returning from patch_{map,unmap}(), but does a local flush suffice?
For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb)
sequence in their unmapping stage (and apparently relies on "no caching
of invalid ptes" in their mapping stage).  Of course, "broadcasting" our
(riscv's) TLB invalidations will necessarily introduce some complexity...
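Just to illustrate the kind of change I have in mind for the unmap side,
a hypothetical sketch (clear_fixmap(), __fix_to_virt() and
flush_tlb_kernel_range() are the generic kernel helpers; whether a
broadcast invalidation is worth its cost here is precisely the open
question):

	static void patch_unmap(int fixmap)
	{
		unsigned long addr = __fix_to_virt(fixmap);

		clear_fixmap(fixmap);
		/*
		 * Broadcast the invalidation instead of relying on a
		 * purely local flush, at the cost of an IPI/SBI
		 * round-trip.
		 */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	}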

Thoughts?

  Andrea
