linux-kernel - Re: [PATCH v5 4/4] zram: introduce zram memory tracking

Message-ID: <20180418012636.GA196478@rodete-desktop-imager.corp.google.com>
Date:   Wed, 18 Apr 2018 10:26:36 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [PATCH v5 4/4] zram: introduce zram memory tracking

Hi Andrew,

On Tue, Apr 17, 2018 at 02:59:21PM -0700, Andrew Morton wrote:
> On Mon, 16 Apr 2018 18:09:46 +0900 Minchan Kim <minchan@...nel.org> wrote:
> 
> > zram as swap is useful for small-memory devices. However, due to
> > the VM's LRU algorithm, the pages that end up on zram are mostly
> > cold. In particular, once an application's init data has been
> > touched during launch, it tends not to be accessed again and is
> > eventually swapped out. zram can store such cold pages in
> > compressed form, but keeping them in memory at all is pointless.
> > A better idea is for app developers to free them directly rather
> > than leaving them on the heap.
> > 
> > This patch exposes the last access time of each zram block via
> > "cat /sys/kernel/debug/zram/zram0/block_state".
> > 
> > The output looks like this:
> >       300    75.033841 .wh
> >       301    63.806904 s..
> >       302    63.806919 ..h
> > 
> > The first column is the zram block index. The second column is the
> > time, in seconds with microsecond resolution, at which the block was
> > last accessed. The third column holds the block-state symbols
> > (s: same-filled page, w: written to the backing store, h: huge page).
> > So the example above means block 300 was last accessed at 75.033841
> > seconds and, being a huge page, was written to the backing store.
> > 
> > An admin can use this information together with a process's
> > *pagemap* to find its cold or incompressible pages once part of its
> > heap has been swapped out.
> 
> A few things..
> 
> - Terms like "Admin can" and "Admin could" are worrisome.  How do we
>   know that admins *will* use this?  How do we know that we aren't
>   adding a bunch of stuff which nobody will find to be (sufficiently)
>   useful?  For example, is there some userspace tool to which you are
>   contributing which will be updated to use this feature?

Actually, I used this feature two years ago to find memory hoggers,
although back then it was only a quick prototype. It was very useful
for reducing memory cost in the embedded space.

I am trying to upstream the feature because I need it again. :)

Yup, I have a userspace tool that uses the feature, although it is
not compatible with this new version and needs to be updated for the
new format. I will find time to submit the tool.

> 
> - block_state's second column is in microseconds since some
>   undocumented time.  But how is userspace to know how much time has
>   elapsed since the access?  ie, "current time".

It's sched_clock(), so it should be the time elapsed since system boot.
I should have stated that explicitly. I will fix it.
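Given a since-boot timestamp, userspace still needs a "now" on the same clock to compute how long ago a block was accessed. The sketch below assumes CLOCK_MONOTONIC is a close enough stand-in for sched_clock(); that is only an assumption, and it is exactly what Andrew's sched_clock() question in this thread puts in doubt (suspend/resume, cross-CPU drift). Both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds elapsed since a block_state timestamp was recorded.
 * ASSUMPTION: both arguments come from comparable since-boot clocks;
 * a negative result (clock disagreement) is clamped to 0. */
double block_age_seconds(double now_since_boot, double last_access)
{
	double age = now_since_boot - last_access;

	return age < 0.0 ? 0.0 : age;
}

/* Approximate "now" with CLOCK_MONOTONIC. This is only a stand-in:
 * nothing guarantees it matches sched_clock(), e.g. across suspend. */
int now_since_boot(double *out)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	*out = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
	return 0;
}
```

For the example block 300 above, a reading of 100.0 seconds would give an age of roughly 24.97 seconds.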

> 
> - Is the sched_clock() return value suitable for exporting to
>   userspace?  Is it monotonic?  Is it consistent across CPUs, across
>   CPU hotadd/remove, across suspend/resume, etc?  Does it run all the
>   way up to 2^64 on all CPU types, or will some processors wrap it at
>   (say) 32 bits?  etcetera.  Documentation/timers/timekeeping.txt
>   points out that suspend/resume can mess it up and that the counter
>   can drift between cpus.

Good point!

I just borrowed it from ftrace because I thought the goal was similar:
"it doesn't need to be exact as long as drift is infrequent, but it
should be fast".

AFAIK, ftrace and printk are active users of the function, so if the
problem happened frequently, it would be serious for them as well. :)
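Andrew's question about a counter wrapping at (say) 32 bits can be made concrete. If a consumer ever had to compare raw values from a free-running counter narrower than 64 bits, the usual wrap-safe technique is modular unsigned subtraction. This is an illustrative sketch only: `counter_delta` is a hypothetical helper, and it does nothing about the suspend/resume or cross-CPU drift issues raised in the same question.

```c
#include <stdint.h>

/* Ticks elapsed from 'then' to 'now' on a free-running counter that is
 * 'bits' wide (1..64) and wraps to zero. Unsigned subtraction is modulo
 * 2^64, and masking reduces it modulo 2^bits, so the result is correct
 * as long as fewer than 2^bits ticks really elapsed between the reads. */
uint64_t counter_delta(uint64_t now, uint64_t then, unsigned bits)
{
	uint64_t mask = bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;

	return (now - then) & mask;
}
```

For a 32-bit counter, going from 0xFFFFFFFB to 5 yields 10 ticks, even though the raw value went down.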