[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20090124181924.d633523c.akpm@linux-foundation.org>
Date: Sat, 24 Jan 2009 18:19:24 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Paul Turner <pjt@...gle.com>
Cc: linux-kernel@...r.kernel.org, adobriyan@...il.com
Subject: Re: Migration of kernel interfaces to seq_files breaks pread()
consumers
On Fri, 16 Jan 2009 23:51:35 -0800 (PST) Paul Turner <pjt@...gle.com> wrote:
>
> (Specifically) Several interfaces under /proc have been migrated to use
> seq_files. This was previously observed to be a problem with VMware's
> reading of /proc/uptime. We're now running into the same problem on
> /proc/<pid>/stat; we have many consumers performing preads on this
> interface which break under new kernels.
>
> Reverting these migrations presents other problems and doesn't scale with
> everyones' pet dependencies over an abi that's been
> broken :(
We changed userspace-visible behaviour and broke real applications.
This is a serious matter. So serious in fact that your report has
languished without reply for a week.
Reverting those changes until we have a suitable reimplementation which
doesn't bust userspace is 100% justifiable.
In which kernel versions is this regression present?
What would a revert look like? Big and ugly or small and simple? Do
the original commits (which were they?) still revert OK?
> Part of the problem in implementing pread in seq_files is that we don't
> know know whether the read was issued by pread(2) or read(2). It's not
> nice to shoehorn this information down the stack. I've attached a
> skeleton patch which shows one way we could push it up (although something
> like a second f_pos would be necessary to make it maintain pread
> semantics against reads).
>
> One advantage of this style of approach is that it doesn't break on
> partial record reads. But it's a little gross at the same time.
>
Yes, that is a bit gross.
Does this patch actually 100% solve the problem, or is it a precursor
to some other fix or what? It's hard to comment sensibly if it's a
partial thing with no sign how it will be used.
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 2fc2980..744094a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -407,6 +407,16 @@ asmlinkage ssize_t sys_pread64(unsigned int fd, char __user *buf,
> ret = -ESPIPE;
> if (file->f_mode & FMODE_PREAD)
> ret = vfs_read(file, buf, count, &pos);
> + else if (file->f_mode & FMODE_SEQ_FILE) {
> + /*
> + * We break the pread semantic and actually make it
> + * seek, this prevents inconsistent record reads across
> + * boundaries.
> + */
> + vfs_llseek(file, pos, SEEK_SET);
> + ret = vfs_read(file, buf, count, &pos);
> + file_pos_write(file, pos);
> + }
Well yes, that's a userspace-visible wrong change too.
> fput_light(file, fput_needed);
> }
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 3f54dbd..f8c5379 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -50,6 +50,8 @@ int seq_open(struct file *file, const struct seq_operations *op)
>
> /* SEQ files support lseek, but not pread/pwrite */
> file->f_mode &= ~(FMODE_PREAD | FMODE_PWRITE);
> + file->f_mode |= FMODE_SEQ_FILE;
> +
> return 0;
> }
> EXPORT_SYMBOL(seq_open);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5f7b912..c3b5916 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -76,6 +76,8 @@ extern int dir_notify_enable;
> behavior for cross-node execution/opening_for_writing of files */
> #define FMODE_EXEC 16
>
> +#define FMODE_SEQ_FILE_PREAD 32
-EWONTCOMPILE, btw.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists