linux-kernel - "impossible" spinlock "wrong CPU" problem with custom device driver

Message-ID: <4A55222E.5030405@easycrypt.de>
Date:	Thu, 09 Jul 2009 00:48:14 +0200
From:	Timm Korte <korte-kernel@...ycrypt.de>
To:	lkml <linux-kernel@...r.kernel.org>
Subject: "impossible" spinlock "wrong CPU" problem with custom device driver

I'm trying to understand a spinlock bug in a kernel module (device driver).
I have a spinlock that is used in the actual hardware interrupt handler
as well as in a separate kernel thread that does the real work via a work
queue. The interrupt handler uses the spinlock with spin_lock() and
spin_unlock(), while the thread uses spin_lock_irqsave() and
spin_unlock_irqrestore().
On rare occasions (I can't reproduce it on purpose), I get a spinlock debug
message about "wrong CPU" in _raw_spin_unlock when it is called from the
kernel thread.
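
For reference, here is how I understand the unlock-time check that produces this message, written as a small user-space model. This is only a sketch: the names model_task, model_spinlock, model_spin_lock() and model_spin_unlock() are made up for illustration and are not kernel APIs. The one thing it is meant to mirror is that the debug code records the owning task and CPU when the lock is taken and compares them again on unlock.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct model_task {
    const char *comm;
};

/* Hypothetical model of a debug spinlock: remembers who locked it and where. */
struct model_spinlock {
    int locked;
    const struct model_task *owner;
    int owner_cpu;              /* -1 while the lock is free */
};

void model_lock_init(struct model_spinlock *lock)
{
    lock->locked = 0;
    lock->owner = NULL;
    lock->owner_cpu = -1;
}

/* Take the lock and record the current task and CPU. */
void model_spin_lock(struct model_spinlock *lock,
                     const struct model_task *me, int cpu)
{
    lock->locked = 1;
    lock->owner = me;
    lock->owner_cpu = cpu;
}

/*
 * Release the lock. Returns NULL on success, or a string naming the
 * first consistency check that failed (the lock is then left untouched).
 */
const char *model_spin_unlock(struct model_spinlock *lock,
                              const struct model_task *me, int cpu)
{
    if (!lock->locked)
        return "already unlocked";
    if (lock->owner != me)
        return "wrong owner";
    if (lock->owner_cpu != cpu)
        return "wrong CPU";

    lock->locked = 0;
    lock->owner = NULL;
    lock->owner_cpu = -1;
    return NULL;
}
```

In this model, "wrong CPU" can only be reported if the same task unlocks on a different CPU than the one recorded at lock time, or if the recorded owner_cpu value was changed while the lock was held.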

This is the source (for the kernel_thread) that runs into the problem:

static int my_irqthread_function(void *ptr) {
  struct my_dev *mydev = ptr;

  daemonize(MY_NAME "%02x", mydev->mynum);
  allow_signal(SIGTERM);
  while (!wait_event_interruptible(mydev->irqthread_wait,
            atomic_read(&mydev->irqthread_pending_count))) {
    do {
      uint8_t my_irq_pending = 0;
      unsigned long iflags;

      spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
      my_irq_pending = mydev->irq_pending;
      mydev->irq_pending = 0;
      spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);

      // handle irqs
      if (my_irq_pending & INT_IPAC1) {
        my_handle_interrupt(&mydev->mydev[IPAC1]);
      }
...
      // continue if the pending count still is != 0 after decrementing
    } while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
  }

  mydev->irqthread = 0;
  complete_and_exit(&mydev->irqthread_exit, 0);
}
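
To make the intent of the do/while loop clearer, here is a single-threaded user-space model of the handoff using C11 atomics. It is a sketch under assumptions: the handler side (model_irq) is not part of the code above, and I assume here that it ORs bits into irq_pending and increments the pending count once per interrupt. All model_* names are hypothetical, and the spinlock is left out because nothing runs concurrently in the model.

```c
#include <stdatomic.h>

/* Hypothetical model of the two fields the thread and handler share. */
struct model_dev {
    unsigned char irq_pending;            /* bitmask of pending sources */
    atomic_int irqthread_pending_count;   /* interrupts not yet accounted for */
};

void model_dev_init(struct model_dev *dev)
{
    dev->irq_pending = 0;
    atomic_init(&dev->irqthread_pending_count, 0);
}

/* Assumed handler side: mark the source and bump the counter. */
void model_irq(struct model_dev *dev, unsigned char bits)
{
    dev->irq_pending |= bits;   /* in the driver this is done under the lock */
    atomic_fetch_add(&dev->irqthread_pending_count, 1);
}

/*
 * Thread side: must only be called while the count is non-zero, which is
 * what the wait_event_interruptible() condition guarantees in the driver.
 * Returns the number of loop passes; bits seen are ORed into *handled.
 */
int model_drain(struct model_dev *dev, unsigned char *handled)
{
    int passes = 0;

    do {
        unsigned char pending = dev->irq_pending;   /* snapshot ... */
        dev->irq_pending = 0;                       /* ... and clear */
        *handled |= pending;
        passes++;
        /* like atomic_dec_and_test(): stop once the new value is 0 */
    } while (atomic_fetch_sub(&dev->irqthread_pending_count, 1) - 1 != 0);

    return passes;
}
```

With two interrupts queued before the thread runs, the first pass picks up both bits and the second pass finds nothing new, so the loop runs once per counted interrupt even when the bits have already been collected.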

The error (SPIN_BUG with a kernel panic on my SMP box) happens on the
"spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);" line, but I
really can't figure out how the thread could be moved to another CPU
while holding the lock and only doing two assignments.

The only thing I can think of is that it might have something to do
with the enabled SIGTERM signal, even though the module wasn't being
unloaded at the time the bug occurred.

System is FC4 based with a 2.6.17 kernel (can't change).

So I'm sort of out of ideas and hope someone here has an idea what
might have gone wrong.

Timm