lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 18 Oct 2016 17:10:56 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Chris Mason <clm@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jens Axboe <axboe@...com>,
        Dave Jones <davej@...emonkey.org.uk>,
        Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
        David Sterba <dsterba@...e.com>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        Andrew Lutomirski <luto@...nel.org>
Subject: Re: bio linked list corruption.

On Tue, Oct 18, 2016 at 4:42 PM, Chris Mason <clm@...com> wrote:
>
> Seems to be the whole thing:

Ahh. On lkml, so I do have it in my mailbox, but Dave changed the
subject line when he tested on ext4 rather than btrfs..

Anyway, the corrupted address is somewhat interesting. As Dave Jones
said, he saw

  list_add corruption. prev->next should be next (ffffe8ffff806648),
but was ffffc9000067fcd8. (prev=ffff880503878b80).
  list_add corruption. prev->next should be next (ffffe8ffffc05648),
but was ffffc9000028bcd8. (prev=ffff880503a145c0).

and Dave Chinner reports

  list_add corruption. prev->next should be next (ffffe8ffffc02808),
but was ffffc90005f6bda8. (prev=ffff88013363bb80).

and it's worth noting that the "but was" is a remarkably consistent
vmalloc address (the ffffc9000.. pattern gives it away). In fact, it's
identical across two boots for DaveJ in the low 14 bits, and fairly
high up in those low 14 bots (0x3cd8).

DaveC has a different address, but it's also in the vmalloc space, and
also looks like it is fairly high up in 14 bits (0x3da8). So in both
cases it's almost certainly a stack address with a fairly empty stack.
The differences are presumably due to different kernel configurations
and/or just different filesystems calling the same function that does
the same bad thing but now at different depths in the stack.

Adding Andy to the cc, because this *might* be triggered by the
vmalloc stack code itself. Maybe the re-use of stacks showing some
problem? Maybe Chris (who can't see the problem) doesn't have
CONFIG_VMAP_STACK enabled?

Andy - this is on lkml, under

Dave Chinner:
  [regression, 4.9-rc1] blk-mq: list corruption in request queue

Dave Jones:
  btrfs bio linked list corruption.
  Re: bio linked list corruption.

and they are definitely the same thing across three different
filesystems (xfs, btrfs and ext4), and they are consistent enough that
there is almost certainly a single very specific memory corrupting
issue that overwrites something with a stack pointer.

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ