lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 27 Mar 2019 17:58:28 +0100
From:   Juergen Gross <>
To:     Kirill Smelkov <>,
        Al Viro <>,
        Linus Torvalds <>
        Michael Kerrisk <>,
        Yongzhi Pan <>,
        Jonathan Corbet <>,
        David Vrabel <>,
        Miklos Szeredi <>, Tejun Heo <>,
        Kirill Tkhai <>,
        Arnd Bergmann <>, Christoph Hellwig <>,
        Greg Kroah-Hartman <>,
        Julia Lawall <>,
        Nikolaus Rath <>,
        Han-Wen Nienhuys <>
Subject: Re: [PATCH 1/3] fs: stream_open - opener for stream-like files so
 that read and write can run simultaneously without deadlock

On 26/03/2019 23:20, Kirill Smelkov wrote:
> Commit 9c225f2655 (vfs: atomic f_pos accesses as per POSIX) added locking for
> file.f_pos access and in particular made concurrent read and write not possible
> - now both those functions take f_pos lock for the whole run, and so if e.g. a
> read is blocked waiting for data, write will deadlock waiting for that read to
> complete. This caused regression for stream-like files where previously read
> and write could run simultaneously, but after that patch could not do so
> anymore. See e.g. 581d21a2d0 (xenbus: fix deadlock on writes to /proc/xen/xenbus)
> which fixes such regression for particular case of /proc/xen/xenbus.
> The patch that added f_pos lock in 2014 (see
> for background discussion) did so to guarantee POSIX thread safety for
> read/write/lseek and added the locking to file descriptors of all regular
> files. In 2014 that thread-safety problem was not new as it was already discussed
> earlier in 2006: However even though 2006'th
> version of Linus's patch ( was adding f_pos
> locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding
> the stream-like objects like pipes and sockets)", 2014'th version - the one that
> actually made it into the tree as 9c225f2655 - is doing so irregardless of whether
> a file is seekable or not. The reason that it did so is, probably, that there are
> many files that are marked non-seekable, but e.g. their read implementation
> actually depends on knowing current position to correctly handle the read. Some
> examples:
> 	kernel/power/user.c		snapshot_read
> 	fs/debugfs/file.c		u32_array_read
> 	fs/fuse/control.c		fuse_conn_waiting_read + ...
> 	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
> 	arch/s390/hypfs/inode.c		hypfs_read_iter
> 	...
> In despite that, many nonseekable_open users implement read and write with pure
> stream semantics - they don't depend on passed ppos at all. And for those cases
> where read could wait for something inside, it creates a situation similar to
> xenbus - the write could be never made to go until read is done, and read is
> waiting for some, potentially external, event, for potentially unbounded time
> -> deadlock. Besides xenbus, there are 14 such places in the kernel that I've
> found with semantic patch (see below):
> 	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
> 	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
> 	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
> 	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
> 	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
> 	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
> 	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
> 	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
> 	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
> 	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
> 	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
> 	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
> 	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
> 	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
> In addition to the cases above another regression caused by f_pos locking is
> that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can
> no longer implement bidirectional stream-like files - for the same reason
> as above e.g. read can deadlock write locking on file.f_pos in the kernel.
> FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f7 (fuse: implement
> nonseekable open) to support OSSPD (;
> OSSPD implements /dev/dsp in userspace with
> FOPEN_NONSEEKABLE flag, with corresponding read and write routines not
> depending on current position at all, and with both read and write being
> potentially blocking operations:
> Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
> "somewhat pipe-like files ..." with read handler not using offset. However
> that test implements only read without write and cannot exercise the deadlock
> scenario:
> I've actually hit the read vs write deadlock for real while implementing my
> FUSE filesystem where there is /head/watch file, for which open creates
> separate bidirectional socket-like stream in between filesystem and its user
> with both read and write being later performed simultaneously. And there it is
> semantically not easy to split the stream into two separate read-only and
> write-only channels:
> Let's fix this regression. The plan is:
> 1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would
>    break many in-kernel nonseekable_open users which actually use ppos in
>    read/write handlers.
> 2. Add stream_open() to kernel to open stream-like non-seekable file descriptors.
>    Read and write on such file descriptors would never use nor change ppos. And
>    with that property on stream-like files read and write will be running without
>    taking f_pos lock - i.e. read and write could be running simultaneously.
> 3. With semantic patch search and convert to stream_open all in-kernel
>    nonseekable_open users for which read and write actually do not depend on ppos and
>    where there is no other methods in file_operations which assume @offset access.
> 4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via steam_open
>    if that bit is present in filesystem open reply.
>    It was tempting to change fs/fuse/ open handler to use stream_open instead of
>    nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian
>    codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually
>    uses offset in its read and write handlers
>    so if we would do such a change it will break a real user.
> 5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from
>    v3.14+ (the kernel where 9c225f2655 first appeared). This will allow to patch
>    OSSPD and other FUSE filesystems that provide stream-like files to return
>    FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on
>    all kernel versions. This should work because fs/fuse/ ignores unknown open
>    flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that
>    is not aware of this flag cannot hurt. In turn the kernel that is not aware of
>    FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to
>    implement streams without read vs write deadlock.
> This patch: adds stream_open, converts /proc/xen/xenbus to it and adds semantic
> patch to automatically locate in-kernel places that are either required to be
> converted due to read vs write deadlock, or that are just safe to be converted
> because read and write do not use ppos and there are no other funky methods in
> file_operations.
> Followup patches are:
> - apply the result of semantic patch;
> - add FOPEN_STREAM to fs/fuse.
> Regarding semantic patch I've verified each generated change manually - that it is
> correct to convert - and each other nonseekable_open instance left - that it is
> either not correct to convert there, or that it is not converted due to current
> stream_open.cocci limitations. The script also does not convert files that should
> be valid to convert, but that currently have .llseek = noop_llseek or
> generic_file_llseek for unknown reason despite file being opened with
> nonseekable_open (e.g. drivers/input/mousedev.c)
> Cc: Michael Kerrisk <>
> Cc: Yongzhi Pan <>
> Cc: Jonathan Corbet <>
> Cc: David Vrabel <>
> Cc: Juergen Gross <>
> Cc: Miklos Szeredi <>
> Cc: Tejun Heo <>
> Cc: Kirill Tkhai <>
> Cc: Arnd Bergmann <>
> Cc: Christoph Hellwig <>
> Cc: Greg Kroah-Hartman <>
> Cc: Julia Lawall <>
> Cc: Nikolaus Rath <>
> Cc: Han-Wen Nienhuys <>
> Signed-off-by: Kirill Smelkov <>

For the Xen changes: Reviewed-by: Juergen Gross <>


Powered by blists - more mailing lists