lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 12 Feb 2020 08:58:47 +0100
From:   Michael Stapelberg <michael+lkml@...pelberg.ch>
To:     Miklos Szeredi <miklos@...redi.hu>
Cc:     fuse-devel <fuse-devel@...ts.sourceforge.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kyle Sanderson <kyle.leet@...il.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Still a pretty bad time on 5.4.6 with fuse_request_end.

Sure: here’s a backtrace and req->args printed:

(gdb) bt full
#0  0xffffffff82000ff3 in __x86_indirect_thunk_r8 () at
arch/x86/lib/retpoline.S:40
No locals.
#1  0xffffffff8137ec68 in fuse_request_end (fc=0xffff88813a57be00,
req=0xffff88813a5a3770) at fs/fuse/dev.c:329
        fiq = 0xffff88813a57be48
        async = true
#2  0xffffffff81382d88 in fuse_dev_do_write (fud=0xffff88813a57be00,
cs=0xffffc90002fafa00, nbytes=4294967294) at fs/fuse/dev.c:1912
        err = 0
        fc = 0xffff88813a57be00
        fpq = 0xffff888132815f48
        req = 0xffff88813a5a3770
        oh = {
          len = 16,
          error = -2,
          unique = 66942
        }
#3  0xffffffff81382e69 in fuse_dev_write (iocb=0xffffc900008cbe48,
from=0xffffc900008cbe20) at fs/fuse/dev.c:1934
        cs = {
          write = 0,
          req = 0xffff88813a5a3770,
          iter = 0xffffc900008cbe20,
          pipebufs = 0x0 <fixed_percpu_data>,
          currbuf = 0x0 <fixed_percpu_data>,
          pipe = 0x0 <fixed_percpu_data>,
          nr_segs = 0,
          pg = 0x0 <fixed_percpu_data>,
          len = 0,
          offset = 24,
          move_pages = 0
        }
        fud = 0xffff888132815f40
#4  0xffffffff8120122e in call_write_iter (file=<optimized out>,
iter=<optimized out>, kio=<optimized out>) at
./include/linux/fs.h:1901
No locals.
#5  new_sync_write (filp=0xffff888119886b00, buf=<optimized out>,
len=<optimized out>, ppos=0xffffc900008cbee8) at fs/read_write.c:483
        iov = {
          iov_base = 0xc0008ec008,
          iov_len = 16
        }
        kiocb = {
          ki_filp = 0xffff888119886b00,
          ki_pos = 0,
          ki_complete = 0x0 <fixed_percpu_data>,
          private = 0x0 <fixed_percpu_data>,
          ki_flags = 0,
          ki_hint = 0,
          ki_ioprio = 0,
          ki_cookie = 0
        }
        iter = {
          type = 5,
          iov_offset = 0,
          count = 0,
          {
            iov = 0xffffc900008cbe20,
            kvec = 0xffffc900008cbe20,
            bvec = 0xffffc900008cbe20,
            pipe = 0xffffc900008cbe20
          },
          {
            nr_segs = 0,
            {
              head = 0,
              start_head = 0
            }
          }
        }
        ret = <optimized out>
#6  0xffffffff812012e4 in __vfs_write (file=<optimized out>,
p=<optimized out>, count=<optimized out>, pos=<optimized out>) at
fs/read_write.c:496
No locals.
#7  0xffffffff81203f04 in vfs_write (pos=<optimized out>, count=16,
buf=<optimized out>, file=<optimized out>) at fs/read_write.c:558
        ret = 16
        ret = <optimized out>
#8  vfs_write (file=0xffff888119886b00, buf=0xc0008ec008 "\020",
count=16, pos=0xffffc900008cbee8) at fs/read_write.c:542
        ret = 16
#9  0xffffffff812041b2 in ksys_write (fd=<optimized out>,
buf=0xc0008ec008 "\020", count=16) at fs/read_write.c:611
        pos = 0
        ppos = <optimized out>
        f = <optimized out>
        ret = 824643076104
#10 0xffffffff81204245 in __do_sys_write (count=<optimized out>,
buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:623
No locals.
#11 __se_sys_write (count=<optimized out>, buf=<optimized out>,
fd=<optimized out>) at fs/read_write.c:620
        ret = <optimized out>
        ret = <optimized out>
#12 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:620
No locals.
#13 0xffffffff810028a8 in do_syscall_64 (nr=<optimized out>,
regs=0xffffc900008cbf58) at arch/x86/entry/common.c:294
        ti = <optimized out>
#14 0xffffffff81e0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
No locals.
#15 0x0000000000000000 in ?? ()
No symbol table info available.

(gdb) p *req->args
$5 = {
  nodeid = 18446683600620026424,
  opcode = 2167928246,
  in_numargs = 65535,
  out_numargs = 65535,
  force = false,
  noreply = false,
  nocreds = false,
  in_pages = false,
  out_pages = false,
  out_argvar = true,
  page_zeroing = true,
  page_replace = false,
  in_args = {{
      size = 978828800,
      value = 0x2fafce0
    }, {
      size = 978992728,
      value = 0xffffffff8138efaa <fuse_alloc_forget+26>
    }, {
      size = 50002688,
      value = 0xffffffff8138635f <fuse_lookup_name+255>
    }},
  out_args = {{
      size = 570,
      value = 0xffffc90002fafb10
    }, {
      size = 6876,
      value = 0x3000000001adc
    }},
  end = 0x1000100000001
}

Independently, as a separate test, I have also modified the source like this:

bool async;
bool async_early = req->args->end;

if (test_and_set_bit(FR_FINISHED, &req->flags))
goto put_request;

async = req->args->end;

…and printed the value of async and async_early. async is true,
async_early is false.

Perhaps some other routine is modifying the request, and checking
req->args->end early enough masks that bug?

On Tue, Feb 11, 2020 at 11:55 AM Miklos Szeredi <miklos@...redi.hu> wrote:
>
> On Sun, Feb 9, 2020 at 9:09 AM <michael+lkml@...pelberg.ch> wrote:
> >
> > From: Michael Stapelberg <michael+lkml@...pelberg.ch>
> >
> > Hey,
> >
> > I recently ran into this, too. The symptom for me is that processes using the
> > affected FUSE file system hang indefinitely, sync(2) system calls hang
> > indefinitely, and even triggering an abort via echo 1 >
> > /sys/fs/fuse/connections/*/abort does not get the file system unstuck (there is
> > always 1 request still pending). Only removing power will get the machine
> > unstuck.
> >
> > I’m triggering this when building packages for https://distr1.org/, which uses a
> > FUSE daemon (written in Go using the jacobsa/fuse package) to provide package
> > contents.
> >
> > I bisected the issue to commit
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2b319d1f6f92a4ced9897678113d176ee16ae85d
> >
> > With that commit, I run into a kernel oops within ≈1 minute after starting my
> > batch build. With the commit before, I can batch build for many minutes without
> > issues.
>
> Pretty weird.   I'm not seeing how this could change behavior, as the
> args->end value is not changed after being initialized, and so moving
> the test later should not make a difference.
>
> Could you print out the complete contents of req->args?
>
> Thanks,
> Miklos

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ