linux-kernel - Re: Detecting page cache trashing state


Message-ID: <fa511270-71cf-c0fe-2c78-82c8e15f49b8@cisco.com>
Date:   Fri, 27 Oct 2017 23:29:55 +0300
From:   "Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco)" 
        <rruslich@...co.com>
To:     vinayak menon <vinayakm.list@...il.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Taras Kondratiuk <takondra@...co.com>,
        Michal Hocko <mhocko@...nel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        xe-linux-external@...co.com, linux-kernel@...r.kernel.org
Subject: Re: Detecting page cache trashing state

On 10/26/2017 06:53 AM, vinayak menon wrote:
> On Thu, Sep 28, 2017 at 9:19 PM, Ruslan Ruslichenko -X (rruslich -
> GLOBALLOGIC INC at Cisco) <rruslich@...co.com> wrote:
>> Hi Johannes,
>>
>> Hopefully I was able to rebase the patch on top of v4.9.26 (the latest
>> version we currently support) and test it a bit.
>> The overall idea definitely looks promising, although I have one question
>> about usage.
>> Will it be able to account for the time processes spend handling major
>> page faults (including fs and iowait time) on refaulting pages?
>>
>> We have one big application whose code occupies a large amount of space
>> in the page cache. When the system is under heavy memory usage and reclaims
>> some of it, the application starts thrashing constantly. Since its code is
>> placed on squashfs, it spends all of the CPU time decompressing pages, and
>> it seems the memdelay counters are not detecting this situation.
>> Here are some counters to indicate this:
>>
>> 19:02:44        CPU     %user     %nice   %system   %iowait    %steal     %idle
>> 19:02:45        all      0.00      0.00    100.00      0.00      0.00      0.00
>>
>> 19:02:44     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
>> 19:02:45     15284.00      0.00    428.00    352.00  19990.00      0.00      0.00  15802.00      0.00
>>
>> And since nobody is actively allocating memory anymore, it looks like the
>> memdelay counters are not being actively incremented:
>>
>> [:~]$ cat /proc/memdelay
>> 268035776
>> 6.13 5.43 3.58
>> 1.90 1.89 1.26
>>
>> Just in case, I have attached the patch rebased on v4.9.26.
>>
> Looks like this 4.9 version does not contain the accounting in lock_page.

In v4.9 there is no wait_on_page_bit_common(), so the accounting was moved
to wait_on_page_bit(_killable|_killable_timeout).
The related functionality around lock_page_or_retry() seems to be mostly
the same in v4.9.
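
For anyone who wants to watch the major fault rate quoted above without sar,
here is a minimal sketch that samples it from userspace. It assumes the
majflt/s figure corresponds to the pgmajfault counter in /proc/vmstat; the
helper names (parse_vmstat, major_fault_rate, sample) are made up for
illustration and are not part of the memdelay patch.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style "name value" lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a plain "name <non-negative integer>" pair.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def major_fault_rate(before, after, interval_s):
    """Major page faults per second between two counter snapshots."""
    return (after["pgmajfault"] - before["pgmajfault"]) / interval_s


def sample(path="/proc/vmstat", interval_s=1.0):
    """Read the counters twice, interval_s apart, and return majflt/s.

    Assumption: path points at a Linux /proc/vmstat that exposes pgmajfault.
    """
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return major_fault_rate(before, after, interval_s)
```

Calling sample() in a loop on the affected box should show a sustained high
rate while /proc/memdelay stays flat, which is the mismatch described in the
quoted report.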