Message-ID: <2d3960a6-34fc-1951-b39b-ce41674bb4d0@yandex-team.ru>
Date: Tue, 15 Aug 2023 13:25:42 +0500
From: Valentin Sinitsyn <valesini@...dex-team.ru>
To: Dan Williams <dan.j.williams@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Tejun Heo <tj@...nel.org>
Cc: Daniel Vetter <daniel.vetter@...ll.ch>,
Bjorn Helgaas <bhelgaas@...gle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kernfs: implement custom llseek method to fix userspace
regression
On 15.08.2023 01:01, Dan Williams wrote:
> Valentine Sinitsyn wrote:
>> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"),
>> mmapable sysfs binary attributes have started receiving their
>> f_mapping from the iomem pseudo filesystem, so that
>> CONFIG_IO_STRICT_DEVMEM is honored in sysfs (and procfs) as well
>> as in /dev/[k]mem.
>>
>> This resulted in a userspace-visible regression: lseek(fd, 0, SEEK_END)
>> now returns zero regardless of the real sysfs attribute size, which
>> stat() still reports correctly. The reason is that kernfs files use the
>> generic_file_llseek() implementation, which relies on the
>> f_mapping->host inode to get the file size. As f_mapping is now
>> redefined, f_mapping->host points to an anonymous zero-sized iomem inode
>> which has nothing to do with the sysfs attribute or the kernfs file
>> representing it. That said, f_inode remains valid, so stat(), which
>> uses it, works correctly.
>
> Can you say a bit more about what userspace scenario regressed so that
> others doing backports can make a judgement call on the severity?
We encountered this regression in code that used lseek() to determine
the size of a PCI region. It was roughly equivalent to:
    #define SYSFS_DEVICE_DIR "/sys/bus/pci/devices/<some id>"
    int fd = open(SYSFS_DEVICE_DIR "/resource0", O_RDWR);
    off_t size = lseek(fd, 0, SEEK_END);
    assert(size != 0);
Calling lseek() on this fd with a whence other than SEEK_END and a
non-zero offset returns an error, because the kernel treats it as
seeking past EOF.
I'll add this explanation to the v2 commit message.
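For anyone who wants to check a given kernel, here is a minimal sketch of a probe that reports both sizes for one path. The helper name probe_sizes() is made up for illustration and is not part of the original code; it also opens the file read-only, whereas our code used O_RDWR.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original report): report the size of
 * `path` as seen by lseek(fd, 0, SEEK_END) and by fstat().
 * Returns 0 on success, -1 if the file cannot be opened or queried.
 */
static int probe_sizes(const char *path, off_t *seek_size, off_t *stat_size)
{
    struct stat st;
    int fd;

    assert(path && seek_size && stat_size);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *stat_size = st.st_size;

    /* This is the call that returns 0 on affected kernels. */
    *seek_size = lseek(fd, 0, SEEK_END);
    close(fd);

    return *seek_size == (off_t)-1 ? -1 : 0;
}
```

On a regular file the two values agree. For a PCI resourceN attribute on an affected kernel, I would expect seek_size to be 0 while stat_size still holds the region size.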
>
>>
>> Fix the regression by implementing a custom llseek fop for kernfs,
>> which uses the attribute's file inode to get the file size,
>> just as stat() does.
>>
>> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Valentine Sinitsyn <valesini@...dex-team.ru>
>> ---
>> fs/kernfs/file.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
>> index 180906c36f51..6d81e0c981f3 100644
>> --- a/fs/kernfs/file.c
>> +++ b/fs/kernfs/file.c
>> @@ -903,6 +903,21 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
>> return ret;
>> }
>>
>> +static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
>> +{
>> + /*
>> + * This is almost identical to generic_file_llseek() except it uses
>> + * cached inode value instead of f_mapping->host.
>> + * The reason is that, for PCI resources in sysfs the latter points to
>> + * iomem_inode whose size has nothing to do with the attribute's size.
>> + */
>> + struct inode *inode = file_inode(file);
>
> My only concern is whether there are any scenarios where this is not
> appropriate. I.e. do a bit more work to define a kernfs_ops instance
> specifically for overriding lseek() in this scenario.
Not sure I'm getting you here: do you mean something like this?
struct inode *inode = is_f_mapping_redefined(file) ? file_inode(file) :
file->f_mapping->host;
My understanding is that file->f_inode should always be non-NULL and
point to the inode corresponding to the path of the opened file, so it
should be safe to use regardless of what f_mapping->host is. Am I
missing anything?
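To make the distinction concrete, here is a toy userspace model, not kernel code. All toy_* names and TOY_MAXBYTES are invented for illustration, and the seek logic is a simplified approximation that covers only SEEK_SET, SEEK_CUR and SEEK_END.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

#define TOY_MAXBYTES (1LL << 40)   /* stand-in for s_maxbytes, arbitrary */

struct toy_inode   { long long size; };
struct toy_mapping { struct toy_inode *host; };
struct toy_file {
    struct toy_inode   *f_inode;    /* inode of the opened path */
    struct toy_mapping *f_mapping;  /* may be redirected elsewhere */
    long long           f_pos;
};

/* Simplified seek: resolve whence against `eof`, then bounds-check. */
static long long toy_llseek_size(struct toy_file *file, long long offset,
                                 int whence, long long maxsize, long long eof)
{
    switch (whence) {
    case SEEK_SET: break;
    case SEEK_CUR: offset += file->f_pos; break;
    case SEEK_END: offset += eof; break;
    default:       return -EINVAL;
    }
    if (offset < 0 || offset > maxsize)
        return -EINVAL;
    file->f_pos = offset;
    return offset;
}

/* Models the old behaviour: size taken from f_mapping->host. */
static long long toy_generic_llseek(struct toy_file *file, long long offset,
                                    int whence)
{
    assert(file && file->f_mapping && file->f_mapping->host);
    return toy_llseek_size(file, offset, whence, TOY_MAXBYTES,
                           file->f_mapping->host->size);
}

/* Models the patched behaviour: size taken from the file's own inode. */
static long long toy_kernfs_llseek(struct toy_file *file, long long offset,
                                   int whence)
{
    assert(file && file->f_inode);
    return toy_llseek_size(file, offset, whence, TOY_MAXBYTES,
                           file->f_inode->size);
}
```

With f_inode describing a 4096-byte attribute and f_mapping->host pointing at a zero-sized inode, the first variant reports 0 for SEEK_END and the second reports 4096. When f_mapping is not redirected, both inodes are the same and the two variants agree.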
Best,
Valentin
>
>> +
>> + return generic_file_llseek_size(file, offset, whence,
>> + inode->i_sb->s_maxbytes,
>> + i_size_read(inode));
>> +}
>> +
>> static void kernfs_notify_workfn(struct work_struct *work)
>> {
>> struct kernfs_node *kn;
>> @@ -1005,7 +1020,7 @@ EXPORT_SYMBOL_GPL(kernfs_notify);
>> const struct file_operations kernfs_file_fops = {
>> .read_iter = kernfs_fop_read_iter,
>> .write_iter = kernfs_fop_write_iter,
>> - .llseek = generic_file_llseek,
>> + .llseek = kernfs_fop_llseek,
>> .mmap = kernfs_fop_mmap,
>> .open = kernfs_fop_open,
>> .release = kernfs_fop_release,
>> --
>> 2.34.1
>>
>
>