lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080323034725.GA3781@localhost.ift.unesp.br>
Date:	Sun, 23 Mar 2008 00:47:25 -0300
From:	"Carlos R. Mafra" <crmafra2@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	riel@...hat.com, rjw@...k.pl
Subject: Re: 103 sec. latency: sync_page() with TASK_UNINTERRUPTIBLE (?)
	(bisected)

> Cause                                                Maximum     Percentage
> sync_page __lock_page handle_mm_fault do_page_faul103659.0 msec         59.8 %
> get_request_wait __make_request generic_make_reque1992.0 msec         10.2 %
> get_request_wait __make_request generic_make_reque1620.7 msec          4.6 %
> sync_page __lock_page find_lock_page filemap_fault399.3 msec          3.4 %
> sync_buffer __wait_on_buffer __bread ext3_get_bran292.9 msec          1.3 %
> sync_page __lock_page handle_mm_fault do_page_faul200.5 msec          0.3 %
> Scheduler: waiting for cpu                        155.9 msec         18.9 %
> sync_page __lock_page handle_mm_fault do_page_faul117.5 msec          0.1 %
> congestion_wait try_to_free_pages __alloc_pages re103.5 msec          0.2 %
> r_code
> 
> 
> Process X (2910)
> sync_page __lock_page handle_mm_fault do_page_faul103659.0 msec         98.5 %
> Scheduler: waiting for cpu                         64.6 msec          1.3 %
> sync_page __lock_page find_lock_page filemap_fault 51.4 msec          0.2 %lt do_page_fault error_code
> sync_buffer __wait_on_buffer __bread ext3_get_bran 17.8 msec          0.0 % ext3_get_block do_mpage_readpage mpage_readpa
> sync_page __lock_page handle_mm_fault do_page_faul  9.1 msec          0.0 %ig ault
> 
> The experience goes like this:
> 
> 1) Boot with 2.6.25-rc6-00243-g028011e
> 2) Log into X 
> 3) Open a 380MB file with xjed
> 
> It takes more than 6 minutes to load the file, and meanwhile
> I experience very bad desktop interactivity (like 1 minute to
> the result of pressing F12 in Window Maker to appear in the
> screen).
> 
> If, after step 2), I start using firefox, thunderbird, play some
> music etc and then close all these apps and go to step 3),
> the loading finishes in about 2 minutes (and I have very
> good interactivity meanwhile).
> 
> So I noticed that when xjed is opening the file it is in
> 'D' state (reported by ps), and while greping the kernel
> source code for 'sync_page' I've found this comment 
> in mm/filemap.c:
> 
> /**
>  * __lock_page - get a lock on the page, assuming we need to sleep to get it
>  * @page: the page to lock
>  *
>  * Ugly. Running sync_page() in state TASK_UNINTERRUPTIBLE is scary.  If some
>  * random driver's requestfn sets TASK_RUNNING, we could busywait.  However
>  * chances are that on the second loop, the block layer's plug list is empty,
>  * so sync_page() will then return in state TASK_UNINTERRUPTIBLE.
>  */
> 
> "Ugly. Running sync_page() in state TASK_UNINTERRUPTIBLE is scary."
> 
> It appears that my problem with xjed in 'D' state while loading,
> and sync_page() appearing in latencytop with 103 secs of latency
> may be related through the "ugliness" described above.
> 
> So I was wondering if there is a way to fix this. Note that 
> this issue does not happen if I load the file after using the
> computer for a while, so it is not impossible to have good
> interactivity while loading that big file.


After lots of tests, it turned out to be a regression from
2.6.24.

I bisected it down to f1a9ee758de7de1e040de849fdef46e6802ea117
("kswapd should only wait on IO if there is IO").

If a revert the above commit from the latest git kernel
(v2.6.25-rc6-243-g028011e), then I get an very good interactivity
while xjed is loading the 380MB file (meaning that I can do other things
during the load). Otherwise it takes almost 3 times longer to load,
and interactivity is very bad in the meanwhile. I doubled checked it.

I hope this information will be useful, and I can try any patches
if necessary.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ