Message-ID: <5589AC14.4080003@sgi.com>
Date:	Tue, 23 Jun 2015 11:57:24 -0700
From:	Mike Travis <travis@....com>
To:	Ingo Molnar <mingo@...nel.org>
CC:	Toshi Kani <toshi.kani@...com>, tglx@...utronix.de,
	mingo@...hat.com, hpa@...or.com, akpm@...ux-foundation.org,
	roland@...estorage.com, dan.j.williams@...el.com, x86@...nel.org,
	linux-nvdimm@...ts.01.org, linux-kernel@...r.kernel.org,
	Clive Harding <clive@....com>, Russ Anderson <rja@....com>,
	Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH 2/3] mm, x86: Remove region_is_ram() call from ioremap



On 6/23/2015 2:01 AM, Ingo Molnar wrote:
> 
> * Mike Travis <travis@....com> wrote:
> 
>> <<<
>> We have a large university system in the UK that is experiencing
>> very long delays modprobing the driver for a specific I/O device.
>> The delay is from 8-10 minutes per device and there are 31 devices
>> in the system.  This 4 to 5 hour delay in starting up those I/O
>> devices is very much a burden on the customer.
>> ...
>> The problem was tracked down to a very slow IOREMAP operation and
>> the excessively long ioresource lookup to insure that the user is
>> not attempting to ioremap RAM.  These patches provide a speed up
>> to that function.
>>>>>
>>
>> The speed up was pretty dramatic, I think to about 15-20 minutes
>> (the test was done by our local CS person in the UK).  I think this
>> would prove the function was working since it would have fallen
>> back to the previous page_is_ram function and the 4 to 5 hour
>> startup.
> 
> Btw., I think even 15-20 minutes is still in the 'ridiculously slow' category.
> Any chance to fix all of this properly, not just hack by hack?
> 
> Thanks,
> 
> 	Ingo
> 


The primary cause of the slow startup now lies in loading the
kernel and other software onto the 31 co-processors serially.
We have suggested to the vendor that they look at booting and
starting these in parallel.
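
As a rough illustration of why serial loading dominates, here is a
minimal sketch of the arithmetic. The function names and the
concurrency value are hypothetical, and the model assumes every
device takes the same time and that devices started together do not
slow each other down, which may not hold on real hardware.

```python
import math


def serial_startup_minutes(devices: int, minutes_per_device: float) -> float:
    """Total time when devices are brought up one after another."""
    return devices * minutes_per_device


def parallel_startup_minutes(devices: int, minutes_per_device: float,
                             concurrency: int) -> float:
    """Total time when up to `concurrency` devices start at once.

    Assumption: each batch takes as long as one device, i.e. there is
    no contention between devices in the same batch.
    """
    batches = math.ceil(devices / concurrency)
    return batches * minutes_per_device


if __name__ == "__main__":
    # 31 devices at 8-10 minutes each, as in the original report.
    for per_device in (8, 10):
        total = serial_startup_minutes(31, per_device)
        print(f"serial, {per_device} min/device: {total / 60:.1f} hours")
    # Hypothetical: start 4 devices at a time.
    print("4 at a time, 10 min/device:",
          parallel_startup_minutes(31, 10, 4), "minutes")
```

With 31 devices at 8 to 10 minutes each, the serial total is 248 to
310 minutes, which matches the 4 to 5 hour figure quoted above.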

The problem is that not many systems can handle more than 4 of
them, let alone 32.  So it's mostly the interaction between the
customers and the vendor that directs these optimizations.

Any speed up of the kernel startup helps here as well.

[off topic]
Btw, this ~20 minutes is just for the startup of the
co-processors.  The entire system takes much longer as this is
a huge UV system.  Most of the time is still due to memory
initialization.  Mel's "defer page init" patches help here
tremendously, though it's not clear they will trickle back
down to SLES11, which the customer is running.

Thanks,
Mike
