linux-kernel - Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190508134714.GA27786@x230.aquini.net>
Date:   Wed, 8 May 2019 09:47:14 -0400
From:   Rafael Aquini <aquini@...hat.com>
To:     Yury Norov <yury.norov@...il.com>
Cc:     Joel Savitz <jsavitz@...hat.com>, linux-kernel@...r.kernel.org,
        Alexey Dobriyan <adobriyan@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Ram Pai <linuxram@...ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Huang Ying <ying.huang@...el.com>,
        Sandeep Patil <sspatil@...roid.com>,
        linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3] fs/proc: add VmTaskSize field to /proc/$$/status

On Tue, May 07, 2019 at 11:37:16PM -0700, Yury Norov wrote:
> On Tue, May 07, 2019 at 08:54:31AM -0400, Rafael Aquini wrote:
> > On Mon, May 06, 2019 at 11:53:43AM -0400, Joel Savitz wrote:
> > > There is currently no easy and architecture-independent way to find the
> > > lowest unusable virtual address available to a process without
> > > brute-force calculation. This patch allows a user to easily retrieve
> > > this value via /proc/<pid>/status.
> > > 
> > > Using this patch, any program that previously needed to waste cpu cycles
> > > recalculating a non-sensitive process-dependent value already known to
> > > the kernel can now be optimized to use this mechanism.
> > > 
> > > Signed-off-by: Joel Savitz <jsavitz@...hat.com>
> > > ---
> > >  Documentation/filesystems/proc.txt | 2 ++
> > >  fs/proc/task_mmu.c                 | 2 ++
> > >  2 files changed, 4 insertions(+)
> > > 
> > > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> > > index 66cad5c86171..1c6a912e3975 100644
> > > --- a/Documentation/filesystems/proc.txt
> > > +++ b/Documentation/filesystems/proc.txt
> > > @@ -187,6 +187,7 @@ read the file /proc/PID/status:
> > >    VmLib:      1412 kB
> > >    VmPTE:        20 kb
> > >    VmSwap:        0 kB
> > > +  VmTaskSize:	137438953468 kB
> > >    HugetlbPages:          0 kB
> > >    CoreDumping:    0
> > >    THP_enabled:	  1
> > > @@ -263,6 +264,7 @@ Table 1-2: Contents of the status files (as of 4.19)
> > >   VmPTE                       size of page table entries
> > >   VmSwap                      amount of swap used by anonymous private data
> > >                               (shmem swap usage is not included)
> > > + VmTaskSize                  lowest unusable address in process virtual memory
> > 
> > Can we change this help text to "size of process' virtual address space memory" ?
> 
> Agree. Or go in other direction and make it VmEnd
> 
> > >   HugetlbPages                size of hugetlb memory portions
> > >   CoreDumping                 process's memory is currently being dumped
> > >                               (killing the process may lead to a corrupted core)
> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > index 95ca1fe7283c..0af7081f7b19 100644
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -74,6 +74,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
> > >  	seq_put_decimal_ull_width(m,
> > >  		    " kB\nVmPTE:\t", mm_pgtables_bytes(mm) >> 10, 8);
> > >  	SEQ_PUT_DEC(" kB\nVmSwap:\t", swap);
> > > +	seq_put_decimal_ull_width(m,
> > > +		    " kB\nVmTaskSize:\t", mm->task_size >> 10, 8);
> > >  	seq_puts(m, " kB\n");
> > >  	hugetlb_report_usage(m, mm);
> > >  }
> 
> I'm OK with technical part, but I still have questions not answered
> (or wrongly answered) in v1 and v2. Below is the very detailed
> description of the concerns I have.
> 
> 1. What is the exact reason for it? Original version tells about some
> test that takes so much time that you were able to drink a cup of
> coffee before it was done. The test as you said implements linear
> search to find the last page and so is of O(n). If it's only for some
> random test, I think the kernel can survive without it. Do you have a
> real example of useful programs that suffer without this information?
> 
> 
> 2. I have nothing against taking breaks and see nothing weird if
> ineffective algorithms take time. On my system (x86, Ubuntu) the last
> mapped region according to /proc/<pid>/maps is:
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0     [vsyscall]
> So to find the required address, we have to inspect 2559 pages. With a
> binary search it would take 12 iterations at max. If my calculation is
> wrong or your environment is completely different - please elaborate.
> 
> 3. As far as I can see, Linux currently does not support dynamic
> TASK_SIZE. It means that for any platform+ABI combination VmTaskSize
> will be always the same. So VmTaskSize would be effectively useless waste

Assuming you can have it fixed and decide upon one another at compile
time also is not necessarely true, unfortunately. One could adjust PAGE_OFFSET, 
at kernel config, to provide different splits for the virtual address space,
which will affect TASK_SIZE, eventually. (see arch/x86/Kconfig)

 
> of lines. In fact, TASK SIZE is compiler time information and should
> be exposed to user in headers. In discussion to v2 Rafael Aquini answered
> for this concern that TASK_SIZE is a runtime resolved macro. It's
> true, but my main point is: GCC knows what type of binary it compiles
> and is able to select proper value. We are already doing similar things
> where appropriate. Refer for example to my arm64/ilp32 series:
> arch/arm64/include/uapi/asm/bitsperlong.h:
> -#define __BITS_PER_LONG 64
> +#if defined(__LP64__)
> +/* Assuming __LP64__ will be defined for native ELF64's and not for ILP32. */
> +#  define __BITS_PER_LONG 64
> +#elif defined(__ILP32__)
> +#  define __BITS_PER_LONG 32
> +#else
> +#  error "Neither LP64 nor ILP32: unsupported ABI in asm/bitsperlong.h"
> +#endif
> 


You are correct, but you miss the point Joel is trying to provide that
value in an architectural agnostic way to avoid the hassle of keep adding
more preprocessor complexity and being concerned about arch particularities.

You were spot on pointing the issues with the prctl(2) approach before,
but I don't see the need to overengineer the suggested approach here. 
The cost of getting mm->task_size exported via /proc/<pid>/status is
neglectable, and prevents further complexity going in for such simple
task.


Regards,
-- Rafael