Message-ID: <0939300c-a825-5b46-d86f-72ce89b2b95f@huawei.com>
Date: Mon, 23 Sep 2024 18:29:54 +0800
From: "Liao, Chang" <liaochang1@...wei.com>
To: Andi Kleen <ak@...ux.intel.com>
CC: <mhiramat@...nel.org>, <oleg@...hat.com>, <andrii@...nel.org>,
	<peterz@...radead.org>, <mingo@...hat.com>, <acme@...nel.org>,
	<namhyung@...nel.org>, <mark.rutland@....com>,
	<alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
	<irogers@...gle.com>, <adrian.hunter@...el.com>, <kan.liang@...ux.intel.com>,
	<linux-kernel@...r.kernel.org>, <linux-trace-kernel@...r.kernel.org>,
	<linux-perf-users@...r.kernel.org>, <bpf@...r.kernel.org>
Subject: Re: [PATCH] uprobes: Improve the usage of xol slots for better
 scalability



On 2024/9/19 21:34, Andi Kleen wrote:
>> Sorry, I know nothing about the ThreadSanitizer and related annotation,
>> could you provide some information about it, thanks.
> 
> Documentation/dev-tools/kcsan.rst

Thanks.

> 
>>> Would be good to have some commentary why doing so
>>> many write operations with merely a rcu_read_lock as protection is safe.
>>> It might be safer to put some write type operations under a real lock. 
>>> Also it is unclear how the RCU grace period for utasks is enforced.
>>
>> You are right, but I think using an atomic refcount routine might be a more
>> suitable approach for this scenario. The slot_ref field of the utask instance
> 
> Does it really all need to be lockless? Perhaps you can only make the 
> common case lockless, but then only when the list overflows take a lock 
> and avoid a lot of races. That might be good enough for performance.

Agreed. List overflow would only happen if new threads were constantly spawned
and hit the breakpoint. I'm not sure how often this occurs in real applications.
Even if some applications follow this pattern, I suspect the bottleneck would
shift to another point, like dynamically allocating new utask instances. At
least for the scalability selftest benchmark, list overflow shouldn't be the
common case.
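
To make the idea concrete, a rough sketch of such a hybrid scheme might look
like the code below: keep the common slot allocation lockless on the xol_area
bitmap, and only take area->list_lock when the bitmap is exhausted and stale
slots have to be reclaimed. This is only an illustration of the approach being
discussed, and xol_reclaim_stale_slots() is a hypothetical helper, not code
from the patch:

static int xol_take_insn_slot(struct xol_area *area)
{
	unsigned long flags;
	int slot;

	/* Lockless fast path: grab a free slot directly from the bitmap. */
	for (;;) {
		slot = find_first_zero_bit(area->bitmap, UINSNS_PER_PAGE);
		if (slot >= UINSNS_PER_PAGE)
			break;			/* bitmap exhausted */
		if (!test_and_set_bit(slot, area->bitmap)) {
			atomic_inc(&area->slot_count);
			return slot;
		}
	}

	/* Overflow path: serialize reclaim of stale slots under the lock. */
	spin_lock_irqsave(&area->list_lock, flags);
	slot = xol_reclaim_stale_slots(area);	/* hypothetical helper */
	spin_unlock_irqrestore(&area->list_lock, flags);

	return slot;
}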

> 
> If you really want a complex lockless scheme you need very clear documentation
> in comments and commit logs at least.
> 
> Also there should be a test case that stresses the various cases.
> 
> I would just use a lock 

Thanks for the suggestions. I will experiment with a read-write lock, and
meanwhile add documentation and test cases for the lockless scheme.
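
For reference, the read-write lock experiment would look roughly like the
sketch below, with concurrent readers scanning the garbage collection list
and writers excluded while modifying it. The gc_list head and the helper
names are assumptions based on this discussion, not actual code from the
patch:

static DEFINE_RWLOCK(xol_gc_rwlock);

static void xol_gc_add(struct xol_area *area, struct uprobe_task *utask)
{
	/* Writers: modifying the garbage collection list is exclusive. */
	write_lock(&xol_gc_rwlock);
	list_add(&utask->gc, &area->gc_list);
	write_unlock(&xol_gc_rwlock);
}

static void xol_gc_scan(struct xol_area *area)
{
	struct uprobe_task *utask;

	/* Readers: concurrent scans of the list are allowed. */
	read_lock(&xol_gc_rwlock);
	list_for_each_entry(utask, &area->gc_list, gc) {
		/* Inspect slot state here without modifying the list. */
	}
	read_unlock(&xol_gc_rwlock);
}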

>> is used to track the status of the insn_slot. slot_ref takes three values.
>> A value of 2 means the insn_slot associated with the utask is currently in
>> use by a uprobe. A value of 1 means the slot is not being used by a uprobe.
>> A value of 0 means the slot has been reclaimed. So in some sense, the atomic
>> refcount routine test_and_put_task_slot() also avoids racing when writing to
>> the utask instance, while providing additional status information about the
>> insn_slot.
>>
>> BTW, you reminded me that since the slot might be recycled after the utask
>> is deleted from the garbage collection list, it's necessary to use
>> test_and_put_task_slot() to avoid racing on a stale utask. The correct
>> code might be something like this:
>>
>> @@ -1771,16 +1783,16 @@ static void xol_free_insn_slot(struct task_struct *tsk)
>>
>>         spin_lock_irqsave(&area->list_lock, flags);
>>         list_del_rcu(&tsk->utask->gc);
>> +       /* Ensure the slot is not in use or reclaimed on other CPU */
>> +       if (test_and_put_task_slot(tsk->utask)) {
>> +               clear_bit(tsk->utask->insn_slot, area->bitmap);
>> +               atomic_dec(&area->slot_count);
>> +               tsk->utask->insn_slot = UINSNS_PER_PAGE;
>> +               get_task_slot(tsk->utask);
>> +       }
> 
> I would have expected you to add an "if (racing) bail out, assume the
> other CPU will do the work" type of check, but that doesn't seem to be what
> the code is doing.

Sorry, I probably didn't get your point clearly here, and it would be very
helpful if you could provide more details about the concern. Do you mean it's
necessary to make the execution of the if-body exclusive among CPUs? If that's
the case, I think test_and_put_task_slot() is equivalent to that race condition
check. test_and_put_task_slot() uses a compare-and-exchange operation on the
slot_ref of the utask instance. Regardless of the work being performed by
another CPU, it will always bail out unless slot_ref has a value of one,
indicating the utask is free to be accessed from the local CPU.
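
For clarity, the compare-and-exchange semantics described above could be
expressed roughly as follows. This is only a sketch based on this thread,
not the actual patch, using the three states discussed (2 = in use,
1 = idle, 0 = reclaimed):

static inline bool test_and_put_task_slot(struct uprobe_task *utask)
{
	/* Succeed only when the slot is idle: transition 1 -> 0 (reclaimed). */
	return atomic_cmpxchg(&utask->slot_ref, 1, 0) == 1;
}

static inline void get_task_slot(struct uprobe_task *utask)
{
	/* Move the slot up one state, e.g. reclaimed -> idle (0 -> 1). */
	atomic_inc(&utask->slot_ref);
}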

> 
> -Andi
> 
> 

-- 
BR
Liao, Chang
