lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6eccf37d-07ed-4fcd-bf76-d54603c4361b@arm.com>
Date: Wed, 4 Sep 2024 18:15:24 +0100
From: Robin Murphy <robin.murphy@....com>
To: Will Deacon <will@...nel.org>
Cc: mark.rutland@....com, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, jialong.yang@...ngroup.cn
Subject: Re: [PATCH v3 2/3] perf: Add driver for Arm NI-700 interconnect PMU

On 2024-09-04 1:24 pm, Will Deacon wrote:
> On Mon, Sep 02, 2024 at 07:47:18PM +0100, Robin Murphy wrote:
>> On 02/09/2024 3:47 pm, Will Deacon wrote:
>>>> +static ssize_t arm_ni_format_show(struct device *dev,
>>>> +				  struct device_attribute *attr, char *buf)
>>>> +{
>>>> +	struct arm_ni_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
>>>> +	int lo = __ffs(fmt->field), hi = __fls(fmt->field);
>>>> +
>>>> +	return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
>>>> +}
>>>
>>> Nit: if you end up adding single-bit config fields in the future, this
>>> will quietly do the wrong thing. Maybe safe-guard the 'lo==hi' case (even
>>> if you just warn once and return without doing anything).
>>
>> The counter-argument is that I don't foresee having any reason to add
>> single-bit config fields here in future, nor indeed config1 or config2
>> fields, so I intentionally pruned the would-be dead code while copy-pasting
>> this implementation from arm-cmn. Yes, if someone were to make an incomplete
>> change without paying attention or testing they could introduce a bug, but
>> when is that ever not true?
> 
> I guess I'm just a little more wary when it comes to UAPI. Somebody starts
> relying on the broken message and then you're toast. It's also incredibly
> easy to avoid by construction and the dead code isn't hurting anybody.

Even presuming some theoretical unreviewed broken patch did get merged 
and into real-world kernels without ever being fixed, I still struggle 
to imagine how userspace could somehow grow to *rely* on one PMU driver 
displaying a format attribute in an unexpected manner inconsistent with 
every other PMU driver, as opposed to the far more likely scenario of 
going wrong trying to parse it.

Anyway, after playing with some fun compile-time checks yesterday I've 
just realised there is actually an even simpler solution for doing the 
right thing in general, so I guess thanks for leaning on this :)

>>>> +	name = devm_kasprintf(ni->dev, GFP_KERNEL, "arm_ni_%d_cd_%d", ni->id, cd->id);
>>>> +	if (!name)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	err = cpuhp_state_add_instance(arm_ni_hp_state, &cd->cpuhp_node);
>>>> +	if (err)
>>>> +		return err;
>>>
>>> What happens if there's a CPU hotplug operation here? Can we end up calling
>>> perf_pmu_migrate_context() concurrently with perf_pmu_register()?
>>
>> Yes. Alternatively we could register the PMU before the hotplug handler,
>> then potentially miss a hotplug event and leave a user-visible PMU
>> associated with an invalid CPU. This is a known issue for all system PMU
>> drivers, and the conclusion 5 years ago was that it's impractical to close
>> this race from outside perf core itself[1][2].
> 
> Ok, I'm going to walk right into the trap you've set me...
> 
> Why can't we prevent hotplug (e.g. with cpus_read_lock()) while we're
> setting this up?
> 
> ... and climbing back out of that trap, is the conversation you had with
> Thomas written down anywhere?

I think that must have been the time we talked in person in Cambridge. I 
can't now remember if there is (or was) anything in perf_pmu_register() 
that's actually incompatible with being called under cpus_read_lock(), 
but I do recall that the overall concept of exporting more bits of the 
hotplug machinery in order to copy-paste the same boilerplate bodge in 
every PMU driver was... unpopular.

> I don't want to block this patch, but if five years has passed with
> nobody looking at this then we probably need to address that at some
> point before adding more and more broken drivers.

Looking at it again, is there actually a problem with the current state 
of things? Following through the call path:

   perf_pmu_migrate_context()
     __perf_pmu_remove()
       perf_event_groups_first()
         __group_cmp()
           perf_event_groups_cmp()

the "pmu" pointer only seems to be used as a key to match events in the 
current CPU context, which obviously won't find anything at this point. 
AFAICS it's never even dereferenced, unless any events *are* found for 
__perf_pmu_install() to do something with, which necessarily implies an 
initialised and registered PMU for them to have been opened in the first 
place.

Thanks,
Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ