Message-ID: <c975578.a64f.1932a950632.Coremail.00107082@163.com>
Date: Thu, 14 Nov 2024 20:10:29 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Thomas Gleixner" <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH 01/13] kernel/irq/proc: use seq_put_decimal_ull_width()
 for decimal values

Hi,

At 2024-11-14 03:10:08, "Thomas Gleixner" <tglx@...utronix.de> wrote:
>On Sat, Nov 09 2024 at 00:07, David Wang wrote:
>> The improvement has practical significance, considering many monitoring
>> tools would read /proc/interrupts periodically.
>
>I've applied this, but ...
>
>looking at a 256 CPU machine. /proc/interrupts provides data for 560
>interrupts, which amounts to ~1.6MB data size.
>
>There are 560 * 256 = 143360 interrupt count fields. 140615 of these
>fields are zero, which means 140615 * 11 bytes. That's 96% of the
>overall data size. The actually useful information is less than
>50KB if properly condensed.
> 
>I'm really amused that people spend a lot of time to improve the
>performance of /proc/interrupts instead of actually sitting down and
>implementing a proper new interface for this purpose, which would make
>both the kernel and the tools faster by probably several orders of
>magnitude.

That's a great idea~.
I made the changes and verified the performance; the result is good.
The kernel-side improvement is not that big, but still significant.

Here is what I did (draft code is at the end):
I created two new /proc entries for comparison:
1. /proc/interruptsp: non-zero values only, arch-independent irqs, without descriptions
	$ cat /proc/interruptsp 
	IRQ CPU counter # positive only
	0 0 40
	38 5 23
	39 0 81
	40 1 6
	41 2 111
	...
	$ cat /proc/interruptsp  | wc
	     18      57     181

	$ strace -e read -T cat /proc/interruptsp > /dev/null
	...
	read(3, "IRQ CPU counter # positive only\n"..., 131072) = 181 <0.000144>
	read(3, "", 131072)                     = 0 <0.000009>

	$ time ./interruptsp  # 1 million rounds of open/read(all)/close;

	real	1m54.727s
	user	0m0.368s
	sys	1m54.309s



2. /proc/interruptso: same as the old format, except arch-dependent irqs are removed
	$ cat /proc/interruptso | wc
	     32     388    4439
	$ strace -e read -T cat /proc/interruptso > /dev/null
	...
	read(3, "            CPU0       CPU1     "..., 131072) = 4005 <0.000071>
	read(3, "  88:          0          0     "..., 131072) = 434 <0.000111>
	read(3, "", 131072)                     = 0 <0.000009>

	$ time ./interruptso # 1 million rounds of open/read(all)/close;

	real	2m19.284s
	user	0m0.400s
	sys	2m18.756s


The size is indeed tens of times smaller, which would be a huge improvement
for applications that parse the whole content. On the kernel side, though,
strace and the stress test indicate the improvement is not that huge,
but still significant at ~40%.


The bottleneck seems to be mtree_load() called by irq_to_desc(), based on
simple profiling (not sure whether this is expected or not):

	show_interruptsp(74.724% 845541/1131554)
	    mtree_load(56.596% 478544/845541)
	    __rcu_read_unlock(5.914% 50004/845541)
	    __rcu_read_lock(3.056% 25840/845541)
	    irq_to_desc(2.151% 18184/845541)
	    seq_put_decimal_ull_width(1.211% 10243/845541)
		...


I think the improvement is worth pursuing. Maybe a new interface for "active"
interrupts, say /proc/activeinterrupts? The old /proc/interrupts could then
serve as a table of the available ids/cpus/descriptions.
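
On the tooling side, the proposed format is also trivial to consume: one
sscanf() per line, skipping the header. A minimal parser sketch (the format is
assumed from the sample output above; parse_irq_line is a hypothetical helper
name, not part of any existing tool):

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of the proposed "IRQ CPU counter" format.
 * Returns 1 on success and fills irq/cpu/cnt; returns 0 for the
 * header line or anything that does not match. */
static int parse_irq_line(const char *line, unsigned int *irq,
			  unsigned int *cpu, unsigned long long *cnt)
{
	/* Skip the "IRQ CPU counter # positive only" header. */
	if (strncmp(line, "IRQ", 3) == 0)
		return 0;
	return sscanf(line, "%u %u %llu", irq, cpu, cnt) == 3;
}
```

A monitoring tool would call this per line and sum or diff counters per IRQ,
without ever touching the ~96% of zero fields the old format carries.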

Do you plan to work on this? If not, I can take time on it.



Draft codes:
	int show_interruptsp(struct seq_file *p, void *v)
	{
		int i = *(loff_t *) v, j;
		struct irq_desc *desc;

		if (i >= ACTUAL_NR_IRQS)
			return 0;

		/* print header */
		if (i == 0)
			seq_puts(p, "IRQ CPU counter # positive only\n");

		rcu_read_lock();
		desc = irq_to_desc(i);
		if (!desc || irq_settings_is_hidden(desc))
			goto outsparse;

		if (!desc->action || irq_desc_is_chained(desc) || !desc->kstat_irqs)
			goto outsparse;

		for_each_online_cpu(j) {
			/* desc->kstat_irqs was already checked above */
			unsigned int cnt = per_cpu(desc->kstat_irqs->cnt, j);
			if (cnt > 0) {
				seq_put_decimal_ull(p, "", i);
				seq_put_decimal_ull(p, " ", j);
				seq_put_decimal_ull(p, " ", cnt);
				seq_putc(p, '\n');
			}
		}

	outsparse:
		rcu_read_unlock();
		return 0;
	}

	int show_interruptso(struct seq_file *p, void *v)
	{
		static int prec;

		int i = *(loff_t *) v, j;
		struct irqaction *action;
		struct irq_desc *desc;
		unsigned long flags;

		if (i >= ACTUAL_NR_IRQS) /* return when arch-independent irqs are done */
			return 0;
		...



Thanks~
David
