lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87mss2dkxo.ffs@tglx>
Date: Thu, 15 Feb 2024 12:30:11 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Bitao Hu <yaoma@...ux.alibaba.com>, dianders@...omium.org,
 akpm@...ux-foundation.org, pmladek@...e.com, kernelfans@...il.com,
 liusong@...ux.alibaba.com
Cc: yaoma@...ux.alibaba.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv7 2/2] watchdog/softlockup: report the most frequent
 interrupts

On Wed, Feb 14 2024 at 10:14, Bitao Hu wrote:
> +static void start_counting_irqs(void)
> +{
> +	int i;
> +	int local_nr_irqs;
> +	struct irq_desc *desc;
> +	u32 *counts = __this_cpu_read(hardirq_counts);
> +
> +	if (!counts) {
> +		/*
> +		 * nr_irqs has the potential to grow at runtime. We should read
> +		 * it and store locally to avoid array out-of-bounds access.
> +		 */
> +		local_nr_irqs = nr_irqs;
> +		counts = kcalloc(local_nr_irqs, sizeof(u32), GFP_ATOMIC);

Seriously? The system has a problem and you allocate memory from the
detection code in hard interrupt context?

> +		if (!counts)
> +			return;
> +
> +		for (i = 0; i < local_nr_irqs; i++) {
> +			desc = irq_to_desc(i);
> +			if (!desc)
> +				continue;
> +			counts[i] = desc->kstat_irqs ?
> +				*this_cpu_ptr(desc->kstat_irqs) : 0;
> +		}

This code has absolutely no business to access an interrupt
descriptor. There is an existing interface to retrieve the stats.

Also iterating one by one over the total number of interrupts is a
complete waste as the interrupt number space is sparse.

> +		for (i = 0; i < NUM_HARDIRQ_REPORT; i++) {
> +			if (irq_counts_sorted[i].irq == -1)
> +				break;
> +
> +			desc = irq_to_desc(irq_counts_sorted[i].irq);
> +			if (desc && desc->action)
> +				printk(KERN_CRIT "\t#%u: %-10u\tirq#%d(%s)\n",
> +				       i + 1, irq_counts_sorted[i].counts,
> +				       irq_counts_sorted[i].irq, desc->action->name);

You cannot dereference desc->action here:

    1) It can be NULL'ed between check and dereference.

    2) Both 'action' and 'action->name' can be freed in parallel

And no, you cannot take desc->lock here to prevent this. Stop fiddling
in the internals of interrupt descriptors.


See my reply on V1 how the stats can be done. That neither needs a
memory allocation nor the local_nr_irqs heuristics and just can use
proper interfaces.

Your initialization code then becomes:

	if (!this_cpu_read(snapshot_taken)) {
        	kstat_snapshot_irqs();
        	this_cpu_write(snapshot_taken, true);
        }

and the analysis boils down to:

        u64 cnt, sorted[3] = {};
        unsigned int irq, i;

    	for_each_active_irq(irq) {
        	cnt = kstat_get_irq_since_snapshot(irq);

		if (cnt) {
                     	for (cnt = (cnt << 32) + irq, i = 0; i < 3; i++) {
                		if (cnt > sorted[i])
                        		swap(cnt, sorted[i]);
                	}
		}
	}

Resetting the thing just becomes:

	this_cpu_write(snapshot_taken, false);

No allocation/free, no bound checks, proper abstractions. See?

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ