Message-ID: <20180419203949.GA4555@avx2>
Date:   Thu, 19 Apr 2018 23:39:49 +0300
From:   Alexey Dobriyan <adobriyan@...il.com>
To:     Waiman Long <longman@...hat.com>
Cc:     linux-kernel@...r.kernel.org, rdunlap@...radead.org,
        akpm@...ux-foundation.org
Subject: Re: [PATCH] proc/stat: Separate out individual irq counts into
 /proc/stat_irqs

On Thu, Apr 19, 2018 at 04:21:14PM -0400, Waiman Long wrote:
> On 04/19/2018 03:55 PM, Alexey Dobriyan wrote:
> > On Thu, Apr 19, 2018 at 03:28:40PM -0400, Waiman Long wrote:
> >> On 04/19/2018 03:08 PM, Alexey Dobriyan wrote:
> >>>> Therefore, application performance can be impacted if the application
> >>>> reads /proc/stat rather frequently.
> >>> [nods]
> >>> Text interfaces can be designed in a very stupid way.
> >>>
> >>>> For example, reading /proc/stat on a certain 2-socket Skylake server
> >>>> took about 4.6 ms because it had over 5k irqs.
> >>> Is this top(1)? What is this application doing?
> >>> If it needs percpu usage stats, then maybe /proc/stat should be
> >>> converted away from single_open() so that core seq_file code doesn't
> >>> generate everything at once.
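(For illustration, a rough sketch of an iterator-style /proc/stat, so the
seq_file core generates records on demand instead of building the whole
file in one go; the record layout and names below are invented, not the
real fs/proc/stat.c:)

static void *stat_start(struct seq_file *m, loff_t *pos)
{
	/* one record per possible cpu, plus one trailing record for
	 * the aggregate/intr/ctxt/btime/... lines */
	return *pos <= nr_cpu_ids ? pos : NULL;
}

static void *stat_next(struct seq_file *m, void *v, loff_t *pos)
{
	++*pos;
	return stat_start(m, pos);
}

static void stat_stop(struct seq_file *m, void *v)
{
}

static int stat_show(struct seq_file *m, void *v)
{
	loff_t n = *(loff_t *)v;

	if (n < nr_cpu_ids)
		;	/* print the "cpuN ..." line for cpu n */
	else
		;	/* print the totals, "intr", "ctxt", "btime", ... */
	return 0;
}

static const struct seq_operations stat_ops = {
	.start	= stat_start,
	.next	= stat_next,
	.stop	= stat_stop,
	.show	= stat_show,
};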
> >> The application is actually a database benchmarking tool used by a
> >> customer.
> > So it probably needs the lines before the "intr" line.
> >
> >> The reading of /proc/stat is an artifact of the benchmarking
> >> tool that can actually be turned off. Without doing that, about 20% of
> >> CPU time was spent reading /proc/stat, and the thrashing of cachelines
> >> lowered the benchmark numbers quite significantly. However, I was also
> >> told that there are legitimate cases where reading /proc/stat is
> >> necessary in some of their applications.
> >>
> >>>> -
> >>>> -	/* sum again ? it could be updated? */
> >>>> -	for_each_irq_nr(j)
> >>>> -		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
> >>>> -
> >>> This is direct userspace breakage.
> >> Yes, I am aware of that. That is the cost of improving the performance
> >> of applications that read /proc/stat, but don't need the individual irq
> >> counts.
> > Yeah, but all it takes is one script which cares.
> >
> > I have an idea.
> >
> > Maintain a "maximum registered irq #"; it should be much smaller than
> > "nr_irqs":
> >
> > intr 4245359 151 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 44330 182364 57741 0 0 0 0 0 0 0 0 85 89124 0 0 0 0 0 323360 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> Yes, that can probably help.
> 
> This is the data from the problematic Skylake server:
> 
> model name    : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
> 56 sosreport-carevalo.02076935-20180413085327/proc/stat
> Interrupts: 5370
> Interrupts without "0" entries: 1011
> 
> There are still quite a large number of non-zero entries, though.
> 
> > Or maintain an array of registered irqs and iterate over only those.
> 
> Right, we can allocate a bitmap of used irqs to do that.
> 
> >
> > I have another idea.
> >
> > perf record shows mutex_lock/mutex_unlock at the top.
> > Most of them are the irq mutex, not the seqfile mutex, as there are many
> > more irqs than reads. Take it once.
> >
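(A sketch of the "take it once" idea, purely illustrative: kstat_irqs_usr()
takes the sparse irq lock for every irq, so hoisting the lock around the
loop would first need irq_lock_sparse()/irq_unlock_sparse() exposed outside
kernel/irq:)

	/* today: one lock/unlock per irq via kstat_irqs_usr() */
	for_each_irq_nr(j)
		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));

	/* sketch: take the sparse irq lock once for the whole loop */
	irq_lock_sparse();
	for_each_irq_nr(j)
		seq_put_decimal_ull(p, " ", kstat_irqs(j));
	irq_unlock_sparse();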
> How many CPUs are in your test system? In that Skylake server, it was
> the per-cpu summing of the irq counts that was consuming most of the
> time when reading /proc/stat. I think we can certainly try to optimize
> the lock taking.

It's 16 CPUs (NR_IRQS: 4352, nr_irqs: 960, preallocated irqs: 16).
Given that registering an irq is a rare operation, maintaining a sorted
array of irqs should be the best option.
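(A rough sketch of the bitmap variant of this idea, for illustration only;
the names are invented, and the kernel's own bookkeeping of allocated irqs
is private to kernel/irq:)

/* Track which irqs have ever been set up, so show_stat() can print
 * " 0" for the rest without taking the sparse irq lock and summing
 * per-cpu counters for every possible irq.  Sized to NR_IRQS here
 * purely for simplicity. */
static DECLARE_BITMAP(kstat_seen_irqs, NR_IRQS);

void kstat_mark_irq_seen(unsigned int irq)	/* call from irq setup */
{
	if (irq < NR_IRQS)
		set_bit(irq, kstat_seen_irqs);
}

/* in show_stat(), the output format stays exactly the same: */
	for_each_irq_nr(j) {
		if (j < NR_IRQS && test_bit(j, kstat_seen_irqs))
			seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
		else
			seq_puts(p, " 0");
	}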

> For the time being, I think I am going to add a clone of /proc/stat,
> /proc/stat2, as suggested in my earlier email. Alternatively, I can put
> it somewhere in sysfs if you have a good idea of where it should go.

sysfs is strictly one-value-per-file.

> I will also look into ways to optimize the current per-IRQ stats
> handling, but it will come later.

There is always the time-honored way of an ioctl(2) that switches the irq
info off; /proc supports that.
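(A sketch of what such an ioctl could look like; the command number, the
flag, and the hookup are all invented for illustration:)

#define PROC_STAT_SKIP_IRQS	_IO('p', 0x01)	/* hypothetical */

static long stat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	struct seq_file *m = file->private_data;

	if (cmd != PROC_STAT_SKIP_IRQS)
		return -ENOTTY;
	/* flag that show_stat() would check before dumping per-irq counts */
	m->private = (void *)1UL;
	return 0;
}

/* wired up via the .unlocked_ioctl hook in /proc/stat's file_operations */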

There are many options.
