lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 28 Jun 2017 09:54:32 +0800
From:   Orson Zhai <orson.zhai@...eadtrum.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC:     Zhongping Tan (谭中平) 
        <Zhongping.Tan@...eadtrum.com>, Arnd Bergmann <arnd@...db.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

Hi Greg & Arnd,

On Tue, Jun 27, 2017 at 08:29:17AM +0200, Greg Kroah-Hartman wrote:
> 
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top

We are so sorry for any troubles to you. I will take the role of Zhongping to
continue discussion here.

> 
> On Tue, Jun 27, 2017 at 02:02:13AM +0000, Zhongping Tan (谭中平) wrote:
> > Ok, firstly we need to discuss the list usage, for list head we need
> > do initialization, but for list node we don't need initialization at
> > all.
> 
> I don't understand, why is your misc driver touching that field at all?
> Do I need to go and make it "private" somehow?

There maybe some mis-understanding caused by not very well english expression.
I'll clarify what had happended to us.

> 
> > And for misc_list head, we use LIST_HEAD to define and initialize
> > it. So I don't know why we put INIT_LIST_HEAD(&misc->list) in function
> > misc_register, any bugs when without it?
> 
> Again, what is wrong with the code today?  What driver is this causing
> problems for?

It maybe a little bit long story.

In recent, our test team report a crash of non-stop reboot test on a mobile phone after 46 hours.

After some investigation, we got the information as following,

&misc_list = 0xFFFFFFFF822AC780 -> (
    next_=_0xFFFFFFFFA0087158 -> (
      next = 0xFFFFFFFFA0087158 -> (
        next = 0xFFFFFFFFA0087158,
        prev = 0xFFFFFFFFA0087158 -> (
          next = 0xFFFFFFFFA0087158,
          prev = 0xFFFFFFFFA0087158)),
      prev = 0xFFFFFFFFA0087158 -> (
        next = 0xFFFFFFFFA0087158,
        prev = 0xFFFFFFFFA0087158)),
    prev = 0xFFFF88007BF55618)

it seems that misc_list fall into a loop which cause surfaceflinger (a service from Android) dead.

And we got futher more information after that, 

  (struct miscdevice*)0xFFFFFFFFA0087140 = 0xFFFFFFFFA0087140 -> (
    minor = 20,
    name = 0xFFFFFFFFA008606D -> "fm",
    fops = 0xFFFFFFFFA0086700 -> (
      owner = 0xFFFFFFFFA0087480,
      llseek = 0x0,
      read = 0xFFFFFFFFA0081860,
      write = 0x0,
      read_iter = 0x0,
      write_iter = 0x0,
      iterate = 0x0,
      poll = 0x0,
      unlocked_ioctl = 0xFFFFFFFFA00832C0,
      compat_ioctl = 0xFFFFFFFFA0083870,
      mmap = 0x0,
      open = 0xFFFFFFFFA0081B80,
      flush = 0x0,
      release = 0xFFFFFFFFA00838C0,
      fsync = 0x0,
      aio_fsync = 0x0,
      fasync = 0x0,
      lock = 0x0,
      sendpage = 0x0,
      get_unmapped_area = 0x0,
      check_flags = 0x0,
      flock = 0x0,
      splice_write = 0x0,
      splice_read = 0x0,
      setlease = 0x0,
      fallocate = 0x0,
      show_fdinfo = 0x0),
    list = (
      next = 0xFFFFFFFFA0087158,
      prev = 0xFFFFFFFFA0087158),
    parent = 0x0,
    this_device = 0xFFFF88007BD32000,
    groups = 0x0,
    nodename = 0x0,
    mode = 0) 

We found the device is "fm". We highly suspect that fm driver call misc_register twice and reinitialize list to make ->pre & ->next pointing to himself.

Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
Consider that this is a crash after 46 hours continuous power-on/off, it maybe caused by some special cases we are hard to know for now.
We think it might make some sence to add protection code into misc_register() at first.

Hope this could help to understand our situation.

We'll try to provide any detail inforamtion about this if necessary.

Thanks,
Orson

> 
> thanks,
> 
> greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ