linux-kernel - Re: [PATCH] uprobes: Improve the usage of xol slots for better scalability

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZuwoUmqXrztp-Mzh@tassilo>
Date: Thu, 19 Sep 2024 06:34:10 -0700
From: Andi Kleen <ak@...ux.intel.com>
To: "Liao, Chang" <liaochang1@...wei.com>
Cc: mhiramat@...nel.org, oleg@...hat.com, andrii@...nel.org,
	peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
	namhyung@...nel.org, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	kan.liang@...ux.intel.com, linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org,
	linux-perf-users@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] uprobes: Improve the usage of xol slots for better
 scalability

> Sorry, I know nothing about the ThreadSanitizer and related annotation,
> could you provide some information about it, thanks.

Documentation/dev-tools/kcsan.rst

> > Would be good to have some commentary why doing so
> > many write operations with merely a rcu_read_lock as protection is safe.
> > It might be safer to put some write type operations under a real lock. 
> > Also it is unclear how the RCU grace period for utasks is enforced.
> 
> You are right, but I think using atomic refcount routine might be a more
> suitable apprach for this scenario. The slot_ret field of utask instance

Does it really all need to be lockless? Perhaps you can only make the 
common case lockless, but then only when the list overflows take a lock 
and avoid a lot of races. That might be good enough for performance.

If you really want a complex lockless scheme you need very clear documentation
in comments and commit logs at least.

Also there should be a test case that stresses the various cases.

I would just use a lock 
> is used to track the status of insn_slot. slot_ret supports three values.
> A value of 2 means the utask associated insn_slot is currently in use by
> uprobe. A value of 1 means the slot is no being used by uprobe. A value
> of 0 means the slot has been reclaimed. So in some term, the atomic refcount
> routine test_and_pout_task_slot() also avoid the racing when writing to
> the utask instance, providing additional status information about insn_slot.
> 
> BTW, You reminded me that since it might recycle the slot after deleting the
> utask from the garbage collection list, so it's necessary to use
> test_and_put_task_slot() to avoid the racing on the stale utask. the correct
> code might be something like this:
> 
> @@ -1771,16 +1783,16 @@ static void xol_free_insn_slot(struct task_struct *tsk)
> 
>         spin_lock_irqsave(&area->list_lock, flags);
>         list_del_rcu(&tsk->utask->gc);
> +       /* Ensure the slot is not in use or reclaimed on other CPU */
> +       if (test_and_put_task_slot(tsk->utask)) {
> +               clear_bit(tsk->utask->insn_slot, area->bitmap);
> +               atomic_dec(&area->slot_count);
> +               tsk->utask->insn_slot = UINSNS_PER_PAGE;
> +               get_task_slot(tsk->utask);
> +       }

I would have expected you would add a if (racing) bail out, assume the
other CPU will do the work type check but that doesn't seem to be what
the code is doing.

-Andi