lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 2 Sep 2011 09:53:50 +0400 From: Cyrill Gorcunov <gorcunov@...il.com> To: Andrew Morton <akpm@...ux-foundation.org> Cc: "Kirill A. Shutemov" <kirill@...temov.name>, Vasiliy Kulikov <segoon@...nwall.com>, containers@...ts.osdl.org, linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org, Nathan Lynch <ntl@...ox.com>, Oren Laadan <orenl@...columbia.edu>, Daniel Lezcano <dlezcano@...ibm.com>, Glauber Costa <glommer@...allels.com>, James Bottomley <jbottomley@...allels.com>, Tejun Heo <tj@...nel.org>, Alexey Dobriyan <adobriyan@...il.com>, Al Viro <viro@...IV.linux.org.uk>, Pavel Emelyanov <xemul@...allels.com> Subject: Re: [patch 2/2] fs, proc: Introduce the /proc/<pid>/map_files/ directory v6 On Thu, Sep 01, 2011 at 03:49:46PM -0700, Andrew Morton wrote: > On Thu, 1 Sep 2011 14:46:34 +0400 > Cyrill Gorcunov <gorcunov@...il.com> wrote: > > > On Wed, Aug 31, 2011 at 03:10:23PM -0700, Andrew Morton wrote: > > > > > > This function would benefit from a code comment. > > > > > > Given that it's pretty generic (indeed there might be open-coded code > > > which already does this elsewhere), perhaps it should be in mm/mmap.c > > > as a kernel-wide utility function. That will add a little overhead to > > > CONFIG_PROC_FS=n builds, which doesn't seem terribly important. > > > > > > > Andrew, here is an attempt to address concerns. Please review. > > Complains are welcome as always! > > Changelog still doesn't explain why /proc/pid/maps is unfixably unsuitable. > Will update. > > > > ... > > > > +static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd) > > +{ > > + unsigned long vm_start, vm_end; > > + struct task_struct *task; > > + const struct cred *cred; > > + struct mm_struct *mm; > > + struct inode *inode; > > + > > + bool exact_vma_exists = false; > > > > Extraneous newline there. ok, thanks ... > > + > > + /* > > + * We need two passes here: > > + * > > + * 1) Collect vmas of mapped files with mmap_sem taken > > + * 2) Release mmap_sem and instantiate entries > > + * > > + * otherwise we get lockdep complained, since filldir() > > + * routine might require mmap_sem taken in might_fault(). > > + */ > > + > > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > > + if (vma->vm_file) > > + nr_files++; > > + } > > + > > + if (nr_files) { > > + mem_size = nr_files * sizeof(*info); > > + if (mem_size <= KMALLOC_MAX_SIZE) > > + info = kmalloc(mem_size, GFP_KERNEL); > > + else > > + info = vmalloc(mem_size); > > This still sucks :( > > A KMALLOC_MAX_SIZE allocation is huuuuuuuuuuuuge! I don't know how big kmalloc_sizes.h said it could be quite a big :( > it is nowadays, but over 100 kbytes. This will frequently send page > reclaim on a berzerk rampage freeing *thousands* of pages (or > relocating pages) until it manages to generate 20 or 30 physically > contiguous free pages. > > Also, vmalloc sucks. The more often we perform vmallocs (with a mix of > differently-sized ones), the more internally fragmented the vmalloc > arena will become. With some workloads we'll run out of > sufficiently-large contiguous free spaces and things will start > failing. This doesn't happen often. Yet. But the more vmalloc() > callsites we add, the more likely and the more frequent it becomes. So > vmalloc is something we should only use as a last resort. > > The most robust implementation here is to allocate a large number of > small objects - one per vma. A list would be a suitable way of > managing them. Actually I though about slab-cache with 64 bytes per object, but it requires more code to push here, that is why I stopped. Still this doesn't justify me. So yes, bad idea. Thanks! > > But do we *really* need to do it in two passes? Avoiding the temporary > storage would involve doing more work under mmap_sem, and a put_filp() > under mmap_sem might be problematic. > In real, for the particular case where we use this /proc/pid/map_files it could be done in one pass without mmap_sem taken (since we use it when task is frozen and we know noone is poking vmas) but the problem is that people might start using it not for c/r but in various different cases when task is pretty running and I wanna give them more-less robust result in ls -l over this directory. Andrew, I'll think some more, probably I'll find a way to drop this two passes requirement. Cyrill -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists