Message-ID: <5400E62F.8000405@sgi.com>
Date:	Fri, 29 Aug 2014 13:44:31 -0700
From:	Mike Travis <travis@....com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	mingo@...hat.com, tglx@...utronix.de, hpa@...or.com,
	msalter@...hat.com, dyoung@...hat.com, riel@...hat.com,
	peterz@...radead.org, mgorman@...e.de,
	linux-kernel@...r.kernel.org, x86@...nel.org, linux-mm@...ck.org,
	Alex Thorlton <athorlton@....com>, Cliff Wickman <cpw@....com>,
	Russ Anderson <rja@....com>
Subject: Re: [PATCH 0/2] x86: Speed up ioremap operations



On 8/29/2014 1:16 PM, Andrew Morton wrote:
> On Fri, 29 Aug 2014 14:53:28 -0500 Mike Travis <travis@....com> wrote:
> 
>>
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system.  This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>>
>> There are two causes for requiring a restart/reload of the drivers.
>> First is periodic preventive maintenance (PM) and the second is if
>> any of the devices experience a fatal error.  Both of these trigger
>> this excessively long delay in bringing the system back up to full
>> capability.
>>
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to ensure that the user is
>> not attempting to ioremap RAM.  These patches provide a speed up
>> to that function.
>>
> 
> Really would prefer to have some quantitative testing results in here,
> as that is the entire point of the patchset.  And it leaves the reader
> wondering "how much of this severe problem remains?".

Okay, I have some results from testing.  The modprobe time appears to
be affected quite a bit by previous activity on the ioresource list,
which I suspect is due to cache preloading.  While the overall
improvement is diluted by the other overhead of starting the devices,
the patches drastically improve the modprobe time.

Also, our system is considerably smaller, so the percentage gained
will not be the same.  The best-case improvement in modprobe time
on our smallish 20-device system was from 'real    5m51.913s' to
'real    0m18.275s'.
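
To put those two timings in perspective, here is a small Python sketch
(not part of the patches; the helper name real_seconds is made up for
illustration) that converts the two 'real' lines quoted above into
seconds and derives the speedup and the percentage reduction.

```python
import re


def real_seconds(line: str) -> float:
    """Convert a time-style 'real XmY.YYYs' line to seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", line)
    if match is None:
        raise ValueError(f"not a 'real' timing line: {line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# The two best-case measurements reported for the 20-device test system.
before = real_seconds("real    5m51.913s")
after = real_seconds("real    0m18.275s")

speedup = before / after
reduction_pct = (1 - after / before) * 100

print(f"before: {before:.3f}s, after: {after:.3f}s")
print(f"speedup: {speedup:.1f}x, reduction: {reduction_pct:.1f}%")
```

On these numbers that works out to roughly a 19x speedup, or about a
95% reduction in modprobe time, for this best case only.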

> Also, the -stable backport is a big ask, isn't it?  It's arguably
> notabug and the affected number of machines is small.
> 

Ingo had suggested this.  We are definitely pushing it to our distro
suppliers for our customers.  Whether it's a big deal for smaller
systems is up in the air.  Note that the customer system has 31 devices
on an SSI that includes a large number of other IB and SAS devices
as well as a number of nodes, all of which have discontiguous memory
segments.  I'm envisioning an ioresource list that numbers at least
several hundred entries.  While that's somewhat indicative of typical
UV systems it is generally not that common otherwise.
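
As a rough illustration of why the length of that list matters, the
Python sketch below models a linear "does this range touch RAM?" lookup.
It is not the kernel code and not what the patches do: the flat list,
the Resource class, the overlaps_ram helper and the 300-entry layout
are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    start: int    # first address of the range (inclusive)
    end: int      # last address of the range (inclusive)
    is_ram: bool  # True for system RAM, False for device (MMIO) ranges


def overlaps_ram(resources, start, size):
    """Linearly scan the list for a RAM entry overlapping the request.

    Returns (hit, visited): whether [start, start + size) touches RAM,
    and how many list entries were examined to find out.
    """
    end = start + size - 1
    visited = 0
    for res in resources:
        visited += 1
        if res.is_ram and res.start <= end and start <= res.end:
            return True, visited
    return False, visited


# Hypothetical layout: 300 alternating RAM / MMIO ranges of 1 MiB each.
MIB = 1 << 20
resources = [
    Resource(i * MIB, (i + 1) * MIB - 1, is_ram=(i % 2 == 0))
    for i in range(300)
]

# A legitimate (non-RAM) request has to walk the entire list before the
# lookup can conclude that no RAM entry overlaps it.
hit, visited = overlaps_ram(resources, 299 * MIB, 4096)
print(hit, visited)
```

In this toy model every successful check of a device range costs one
pass over the whole list, so the total cost grows with the number of
entries times the number of lookups performed while loading a driver.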

So I guess the -stable backport is merely a suggestion, not a request.