Message-ID: <20080315221139.GA4163@localhost.ift.unesp.br>
Date: Sat, 15 Mar 2008 19:11:39 -0300
From: "Carlos R. Mafra" <crmafra2@...il.com>
To: Helge Hafting <helgehaf@...l.aitel.hist.no>
Cc: Helge Hafting <helge.hafting@...el.hist.no>,
Mike Galbraith <efault@....de>, linux-kernel@...r.kernel.org,
arjan@...ux.intel.com, jens.axboe@...cle.com, mingo@...e.hu
Subject: Re: Swap makes X unfair (was Re: Keys get stuck)
On Fri 14.Mar'08 at 12:02:14 +0100, Helge Hafting wrote:
> Carlos R. Mafra wrote:
>> On Thu 13.Mar'08 at 15:18:10 +0100, Helge Hafting wrote:
>>
>>> Carlos R. Mafra wrote:
>>>
>>>> On Thu 13.Mar'08 at 12:28:13 +0100, Mike Galbraith wrote:
>>>>
>>>>> [...]
>>>>> Swap can definitely keep X off the cpu for extended periods,
>>>>> [...]
>>>>>
>>>> So I would like to ask if swap keeping X (and everything else,
>>>> in my experience) off the cpu for extended periods is
>>>> considered normal behaviour, in the sense that nobody is
>>>> trying to "fix" it (because it is considered impossible
>>>> to fix)...?
>>>>
>>> Yes, this is perfectly normal. A heavily swapping machine
>>> will swap out parts of X.
>>>
>>
>> Right, but at the risk of not being very precise,
>> I would not say my desktop was "heavily swapping".
>>
>>> Now, if X has a need for low-latency for keyboard handling,
>>> then the X developers can use mlock to lock
>>> the X keyboard service in memory, and make it a real-time
>>> (or at least high priority) process too. This should
>>> avoid the problem even with extreme swapping and/or
>>> high cpu load.
>>>
>>
>> That would be a great thing for me. But why wouldn't one
>> want this behaviour to be the default for a desktop? I mean,
>> it should already be like that for a good desktop experience.
>>
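(Just to make that suggestion concrete: the priority half of it can
be tried from a shell without touching the X code at all, e.g. with
chrt from util-linux. The process name below is only a guess -- the
server may show up as X or Xorg depending on the distribution -- and
the mlock part would still need changes inside X itself:)

  # run as root; put the already-running X server into SCHED_FIFO, prio 10
  chrt -f -p 10 $(pidof X)
  # check that the policy/priority really changed
  chrt -p $(pidof X)
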
> Normally, memory that is used all the time does not get
> swapped out. If you use X while the machine is swapping,
> you will normally see lots of little delays, not
> one long freeze. So this may have been something else.
>
> This scenario seems to require only moderate swap. There
> may be another explanation, one that requires more X knowledge
> than I have:
>
> Some windowing systems give apps the opportunity of
> blocking the entire windowing system while doing stuff.
> This is usually only used for "system modal dialogs" and
> for very quick operations that need to stop all other user interaction.
>
> I don't know to what extent this is possible in X, but if
> it is, then the door is open for badly written apps to
> do stupid things like load a 380M file while the user interface
> is blocked. A well-designed app should do such lengthy jobs
> without blocking everything else, so the user can do other stuff
> while the machine works. Note that any io-operation _might_ be
> lengthy - even a 50-byte file could reside on a network file
> system on a server located in a different continent.
>
> You may want to write to xjed developers, perhaps they
> can do something about this. Like loading the
> file in the background, perhaps.
> [...]
As I noted in another thread (unfortunately ignored so far),
sometimes xjed opens the file and my desktop is ok, as
responsive as I expect it to be.
So this isn't an xjed bug, even though it could do better
and load parts of the file on demand (like vi appears
to do).
> The interesting question is whether this is a swapping problem that can
> be solved by kernel fixing, or if it merely is a problem with the design
> of the X server. In the latter case you need to contact x.org developers
> instead. xjed developers can probably work around the problem too,
> although that would be unnecessary if both the kernel and the X server
> were ideal.
>
> A test you can do:
> * Start up X as usual and log in
> * Switch to the console (ctrl+alt+F2)
> * Log in on the console, with the same user as you have in the X session.
> * Give the command "export DISPLAY=:0" (without the quotes)
> * Start xjed with the big file, from the console. You won't see it there,
>   it will show itself in the X session instead. Make sure you start it
>   in the background, i.e.: "xjed bigfile &"
>
> The machine should now start using the disk, loading the big file and swap
> as usual.
> Now try doing stuff in the console shell. Try various commands like
> ls, view some text files in "vim", things like that.
>
> Now, if your console session is just as blocked and sluggish as the X
> session usually gets - then it is a kernel/swapping problem.
So it is a kernel problem!
I did this test a few times today, using today's git kernel, and
it takes 3 seconds for 'ls' to appear on the screen after typing,
and 10 seconds (I checked with the clock) to finish executing.
Today I think I've found a reliable way to reproduce this
bad behaviour (I've hit it 5 times out of 5 attempts so far).
It is independent of elevator={anticipatory, cfq, deadline};
I could get the bad results with all of them.
I just boot the machine, mount the swap partition manually (it
looks like a Mandriva Cooker bug) and start 'xjed bigfile.txt &'
in a terminal (it also happens outside of X).
It takes more than 15 minutes to load the 380MB file when I hit
the problem. Ingo's cfs-debug script takes about 10 minutes to
finish (I can see it stop only _after_ the file is loaded).
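In case anyone wants to reproduce it, this is roughly the sequence
as a small script (the swap device below is just a placeholder for
whatever the real swap partition is, and the file name stands for
my 380MB text file):

  #!/bin/sh
  # reproduce the slow 'xjed bigfile' load right after boot
  swapon /dev/sdXN               # placeholder: the real swap partition
  free                           # memory snapshot before the test
  vmstat 1 >> vmstat_log.txt &   # one sample per second during the load
  xjed bigfile.txt &             # the big file that triggers the stalls
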
I could verify that swap usage goes to 400 MB out of 2 GB, and when
I start the test this is what I have:
             total       used       free     shared    buffers     cached
Mem:        515496     148628     366868          0       9916      80612
-/+ buffers/cache:      58100     457396
Swap:      2144636          0    2144636
In one of the tests I started 'vmstat 1 >> vmstat_log3.txt' a few seconds
before starting xjed. This is what I got (notice that it has 400 lines,
which would mean it took 400 seconds to load the file, but in fact it
took even longer; time seems to be distorted during this test):
www.ift.unesp.br/users/crmafra/vmstat_log3.txt
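(Next time I can also timestamp every vmstat sample, so the wall-clock
distortion shows up directly in the log. Something like the line below
should do it, assuming GNU awk for strftime() and that vmstat flushes
each line when piped; the output file name is just an example:)

  vmstat 1 | gawk '{ print strftime("%H:%M:%S"), $0; fflush() }' >> vmstat_with_time.txt
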
I could also get some info from latencytop during the test, but
it was very difficult because the terminal screen took ages to
refresh. This is the worst log I could capture manually:
Cause                                                Maximum      Percentage
get_request_wait __make_request generic_make_reque  2201.3 msec     8.4 %
get_request_wait __make_request generic_make_reque  1877.3 msec    20.5 %
sync_page __lock_page handle_mm_fault do_page_faul   945.5 msec     0.4 %
sync_page __lock_page find_lock_page filemap_fault   860.3 msec    17.2 %
sync_page __lock_page handle_mm_fault do_page_faul   828.3 msec     0.9 %
sync_page __lock_page handle_mm_fault do_page_faul   817.9 msec    21.7 %
sync_buffer __wait_on_buffer __bread ext3_get_bran   752.6 msec     2.8 %
get_request_wait __make_request generic_make_reque   726.6 msec     0.4 %
sync_page __lock_page find_lock_page filemap_fault   617.0 msec     0.3 %

Process dhclient-script (3934)
get_request_wait __make_request generic_make_reque  1562.5 msec    36.8 %
sync_page __lock_page find_lock_page filemap_fault   473.4 msec    28.2 %
sync_page __lock_page handle_mm_fault do_page_faul   405.9 msec    13.3 %
Scheduler: waiting for cpu                            120.1 msec    19.0 %
congestion_wait try_to_free_pages __alloc_pages __     99.7 msec     2.5 %
congestion_wait __alloc_pages __do_page_cache_read     15.9 msec     0.2 %
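(Maybe a way around the unusable terminal, if I am reading the
latencytop code right, is that the raw numbers are also exported in
/proc, so they can be captured straight to a file instead of being
scraped off the screen -- please correct me if these paths are wrong:)

  echo 1 > /proc/sys/kernel/latencytop        # enable collection (as root)
  sleep 60                                    # let the xjed test run for a while
  cat /proc/latency_stats > latency_snapshot.txt
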
So I think I am definitely hitting some kernel bug.
If someone on this list becomes interested in this problem, I will
be happy to help with any testing necessary.
I can live with this problem (I don't even need to read that
file anymore). But the bug exists.
Another thing I want to mention is about the CFS debug script
results. For example:
[mafra@...alhost:~]$ grep se.wait_max cfs-debug-info-2008.03.15-17.22.51
se.wait_max : 320.220900
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 308.730029
se.wait_max : 0.000000
se.wait_max : 216.652542
se.wait_max : 412.076086
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 100.015586
se.wait_max : 0.000000
se.wait_max : 149.990226
se.wait_max : 15.718068
se.wait_max : 49.988316
se.wait_max : 0.000000
se.wait_max : 29.287277
se.wait_max : 62.785571
se.wait_max : 602.883709
se.wait_max : 10.646829
se.wait_max : 0.008188
se.wait_max : 19.871847
se.wait_max : 19.740026
se.wait_max : 0.027021
se.wait_max : 19.817248
se.wait_max : 0.026897
se.wait_max : 0.640153
se.wait_max : 0.808060
se.wait_max : 0.389118
se.wait_max : 0.027009
se.wait_max : 0.083961
se.wait_max : 3.264754
se.wait_max : 0.564208
se.wait_max : 0.092138
se.wait_max : 0.991528
se.wait_max : 3.058615
se.wait_max : 0.026908
se.wait_max : 11.891953
se.wait_max : 10.954652
se.wait_max : 1.115334
se.wait_max : 1.226744
se.wait_max : 10.947289
se.wait_max : 11.980765
se.wait_max : 1.349930
se.wait_max : 12.950094
se.wait_max : 1.438758
se.wait_max : 12.995926
se.wait_max : 1.523568
se.wait_max : 13.067781
se.wait_max : 15.039010
se.wait_max : 1.646566
se.wait_max : 15.134557
se.wait_max : 1.757942
se.wait_max : 1.830261
se.wait_max : 15.265369
se.wait_max : 1.921997
se.wait_max : 15.192441
se.wait_max : 2.097341
se.wait_max : 0.243086
se.wait_max : 2.028549
se.wait_max : 2.169289
se.wait_max : 2.280411
se.wait_max : 2.356268
se.wait_max : 1.719666
se.wait_max : 2.451732
se.wait_max : 2.535262
se.wait_max : 1.846795
se.wait_max : 2.489097
se.wait_max : 2.550678
se.wait_max : 2.816960
se.wait_max : 2.728959
se.wait_max : 2.068324
se.wait_max : 2.912895
se.wait_max : 3.072540
se.wait_max : 0.952594
se.wait_max : 3.334903
se.wait_max : 3.160131
se.wait_max : 3.399756
se.wait_max : 11.581309
se.wait_max : 3.684554
se.wait_max : 278.279914
se.wait_max : 23.601522
se.wait_max : 98.543577
se.wait_max : 259.453483
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 127.155292
se.wait_max : 0.000000
se.wait_max : 494.934718
se.wait_max : 0.000000
se.wait_max : 20.022997
se.wait_max : 40.072908
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 27.785589
se.wait_max : 45.076186
se.wait_max : 33.335700
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 121.580585
se.wait_max : 6.825214
se.wait_max : 18.178228
se.wait_max : 0.000000
se.wait_max : 364.938378
se.wait_max : 186.608743
se.wait_max : 0.000000
se.wait_max : 191.874571
se.wait_max : 12.138841
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 365.831692
se.wait_max : 196.069929
se.wait_max : 186.608743
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 235.761630
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 78.088419
se.wait_max : 33.871607
se.wait_max : 0.000000
se.wait_max : 26.201039
se.wait_max : 476.629993
se.wait_max : 587.639944
se.wait_max : 184.918777
se.wait_max : 334.997228
se.wait_max : 216.701937
se.wait_max : 0.909482
se.wait_max : 134.804841
se.wait_max : 114.442959
se.wait_max : 100.189898
se.wait_max : 100.155489
se.wait_max : 134.945855
se.wait_max : 114.634475
se.wait_max : 0.000000
se.wait_max : 542.107819
se.wait_max : 40.222426
se.wait_max : 0.000000
se.wait_max : 16.558527
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 99.997595
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 56.632829
se.wait_max : 0.000000
se.wait_max : 0.000000
se.wait_max : 80.577203
This is the worst record I have; it was obtained after
clearing the CFS stats. But most of the logs I have
from this script are fine (only a few se.wait_max over
40 msecs, with the worst being around 120 or so).
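(To save people from scanning 150-odd lines by eye, the largest values
can be pulled out with something like this -- untested, but the field
layout is exactly as above:)

  grep se.wait_max cfs-debug-info-2008.03.15-17.22.51 | awk '{ print $3 }' | sort -rn | head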