linux-kernel - Re: [PATCH 2/5] revoke: core code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20070315173438.efadd514.akpm@linux-foundation.org>
Date:	Thu, 15 Mar 2007 17:34:38 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Pekka J Enberg <penberg@...helsinki.fi>
Cc:	linux-kernel@...r.kernel.org, hch@...radead.org,
	alan@...rguk.ukuu.org.uk
Subject: Re: [PATCH 2/5] revoke: core code

On Sun, 11 Mar 2007 13:30:49 +0200 (EET) Pekka J Enberg <penberg@...helsinki.fi> wrote:

> From: Pekka Enberg <penberg@...helsinki.fi>
> 
> The revokeat(2) and frevoke(2) system calls invalidate open file
> descriptors and shared mappings of an inode. After an successful
> revocation, operations on file descriptors fail with the EBADF or
> ENXIO error code for regular and device files,
> respectively. Attempting to read from or write to a revoked mapping
> causes SIGBUS.
> 
> The actual operation is done in two passes:
> 
>  1. Revoke all file descriptors that point to the given inode. We do
>     this under tasklist_lock so that after this pass, we don't need
>     to worry about racing with close(2) or dup(2).
>    
>  2. Take down shared memory mappings of the inode and close all file
>     pointers.
> 
> The file descriptors and memory mapping ranges are preserved until the
> owning task does close(2) and munmap(2), respectively.
> 
> ...
>
> +asmlinkage int sys_revokeat(int dfd, const char __user *filename);
> +asmlinkage int sys_frevoke(unsigned int fd);

noooo.... all system calls must return long.

> +static int revoke_vma(struct vm_area_struct *vma, struct zap_details *details)
> +{
> +	unsigned long restart_addr, start_addr, end_addr;
> +	int need_break;
> +
> +	start_addr = vma->vm_start;
> +	end_addr = vma->vm_end;
> +
> +	/*
> + 	 * Not holding ->mmap_sem here.
> +	 */
> +	vma->vm_flags |= VM_REVOKED;

so....  the modification of vm_flags is racy?

> +	smp_mb();

Please always document barriers.  There's presumably some vm_flags reader
we're concerned about here, but how is the code reader to know what the
code writer was thinking?


> +  again:
> +	restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr,
> +				      details);
> +
> +	need_break = need_resched() || need_lockbreak(details->i_mmap_lock);
> +	if (need_break)
> +		goto out_need_break;
> +
> +	if (restart_addr < end_addr) {
> +		start_addr = restart_addr;
> +		goto again;
> +	}
> +	return 0;
> +
> +  out_need_break:
> +	spin_unlock(details->i_mmap_lock);
> +	cond_resched();
> +	spin_lock(details->i_mmap_lock);
> +	return -EINTR;
> +}
> +
> +static int revoke_mapping(struct address_space *mapping, struct file *to_exclude)
> +{
> +	struct vm_area_struct *vma;
> +	struct prio_tree_iter iter;
> +	struct zap_details details;
> +	int err = 0;
> +
> +	details.i_mmap_lock = &mapping->i_mmap_lock;
> +
> +	spin_lock(&mapping->i_mmap_lock);
> +	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX) {
> +		if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) {
> +			err = revoke_vma(vma, &details);
> +			if (err)
> +				goto out;
> +		}
> +	}
> +
> +	list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) {
> +		if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) {
> +			err = revoke_vma(vma, &details);
> +			if (err)
> +				goto out;
> +		}
> +	}
> +  out:
> +	spin_unlock(&mapping->i_mmap_lock);
> +	return err;
> +}

This all looks very strange.  If the calling process expires its timeslice,
the entire system call fails?

What's happening here?


> +
> +int generic_file_revoke(struct file *file)
> +{
> +	int err;
> +
> +	/*
> +	 * Flush pending writes.
> +	 */
> +	err = do_fsync(file, 1);
> +	if (err)
> +		goto out;
> +
> +	/*
> +	 * Make pending reads fail.
> +	 */
> +	err = invalidate_inode_pages2(file->f_mapping);
> +
> +  out:
> +	return err;
> +}
> +
> +EXPORT_SYMBOL(generic_file_revoke);

do_fsync() is seriously suboptimal - it will run an ext3 commit. 
do_sync_file_range(...,
SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
will run maybe five times quicker.

But otoh, do_sync_file_range() will fail to write back the pages for a
data=journal ext3 file, I expect (oops).


Why is this code using invalidate_inode_pages2()?  That function keeps on
breaking, has ill-defined semantics and will probably change in the future.

Exactly what semantics are you looking for here, and why?

The blank line before the EXPORT_SYMBOL() is a waste of space.

> +/*
> + *	Filesystem for revoked files.
> + */
> +
> +static struct inode *revokefs_alloc_inode(struct super_block *sb)
> +{
> +	struct revokefs_inode_info *info;
> +
> +	info = kmem_cache_alloc(revokefs_inode_cache, GFP_NOFS);
> +	if (!info)
> +		return NULL;
> +
> +	return &info->vfs_inode;
> +}

Why GFP_NOFS?

> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ uml-2.6/include/linux/revoked_fs_i.h	2007-03-11 13:09:20.000000000 +0200
> @@ -0,0 +1,20 @@
> +#ifndef _LINUX_REVOKED_FS_I_H
> +#define _LINUX_REVOKED_FS_I_H
> +
> +#define REVOKEFS_MAGIC 0x5245564B      /* REVK */

This is supposed to go into magic.h.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/