lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190426074522.GA16247@deco.navytux.spb.ru>
Date:   Fri, 26 Apr 2019 07:45:33 +0000
From:   Kirill Smelkov <kirr@...edi.com>
To:     David Laight <David.Laight@...LAB.COM>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Sasha Levin <sashal@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        stable <stable@...r.kernel.org>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Yongzhi Pan <panyongzhi@...il.com>,
        Jonathan Corbet <corbet@....net>,
        David Vrabel <david.vrabel@...rix.com>,
        Juergen Gross <jgross@...e.com>,
        Miklos Szeredi <miklos@...redi.hu>, Tejun Heo <tj@...nel.org>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Arnd Bergmann <arnd@...db.de>, Christoph Hellwig <hch@....de>,
        Julia Lawall <Julia.Lawall@...6.fr>,
        Nikolaus Rath <Nikolaus@...h.org>,
        Han-Wen Nienhuys <hanwen@...gle.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH AUTOSEL 5.0 59/66] fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock

On Thu, Apr 25, 2019 at 10:04:34AM +0000, David Laight wrote:
> From: Kirill Smelkov
> > Sent: 24 April 2019 19:30
> > 
> > On Wed, Apr 24, 2019 at 10:26:55AM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 24, 2019 at 10:19 AM Sasha Levin <sashal@...nel.org> wrote:
> > > >
> > > > Hm, I might be confusing something here but I see a bunch of patches
> > > > that convert existing callers mentioned in this patch to use
> > > > stream_open() which was introduced here.
> > >
> > > The only use of stream_open() upstream right now is the xenbus
> > > conversion, and that isn't actually a bugfix, because xenbus used to
> > > manually do that
> > >
> > >         filp->f_mode &= ~FMODE_ATOMIC_POS; /* cdev-style semantics */
> > >
> > > that stream_open() does.
> > >
> > > So no, there isn't "a bunch of patches" anywhere.
> > >
> > > There are *future* cleanups for 5.2 that will happen, and that might
> > > have hit linux-next. And there is at least one FUSE patch (again -
> > > pending, not upstream) that may get marked for stable.
> > >
> > > But I see nothing right now that makes it stable material yet.
> > 
> > Linus, thanks for explaining. Sasha, Greg, there is a FUSE patch that
> > should be stable material that will need this stream_open() thing. That
> > patch has just entered fuse.git#for-next today:
> > 
> > 	https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/commit/?id=bbd84f33652f
> > 
> > and will hopefully enter 5.2 when merge window opens. I agree we should
> > not blindly backport bulk stream_open conversions as performed by
> > stream_open.cocci, at least unless there is a bug report indicating that
> > it is actually required for a particular driver. On the other hand both
> > Xen and FUSE deadlocks were hit for real which justifies stable
> > propagation for their fixes.
> > 
> > You can read about the deadlock regression and the plan to fix it in
> > original "fs: stream_open - opener for stream-like files so that read
> > and write can run simultaneously without deadlock" patch (the 59/66
> > patch that was send in this thread), or here:
> > 
> > 	https://git.kernel.org/linus/10dce8af3422
> > 
> > 
> > Hope it clarifies things a bit,
> 
> I can also imagine drivers that expect accesses to be done using
> pread() and pwrite() - maybe only if the fd is shared.
> Provided accesses get the correct offset they can be concurrent.
> In fact they only need to update the offset in the file structure
> when they complete - they may do this already.
> 
> I know (I think) uclibc implementing pread() as lseek() + read()
> caused me grief - but that might just have been the extra system
> call overhead rather than any problems with the offset.

I'm not sure I understand your comment completely, but we convert to
stream_open only drivers that actually do _not_ use position at all, and
that were already using nonseekable_open, thus pread and pwrite were
already returning -ESPIPE for them (nonseekable_open clears
FMODE_{PREAD,PWRITE} and ksys_{pread,pwrite}64 check for that flag). We
also convert only drivers that use no_llseek for .llseek, so lseek
on those files is/was always returning -ESPIPE as well.

If a driver uses position in its read and write and has support for
pread/pwrite (FMODE_PREAD and FMODE_PWRITE), pread and pwrite are
already working _without_ file->f_pos locking - because those system
calls do not semantically update file->f_pos at all and thus do not take
file->f_pos_lock - i.e. pread/pwrite can be run simultaneously already.

If libc implements pread as lseek+read it will work for a single
user case  (single thread, or fd not shared between processes), but it
will break because of lseek+read non-atomicity if multiple preads are
simultaneously used from several threads. And also for such emulation
for multiple users case there is a chance for pread vs pwrite deadlock,
since those system calls are using read and write and read and write
take file->f_pos_lock.

Kirill

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ