Message-ID: <20160701165959.GR12473@ubuntu>
Date:	Fri, 1 Jul 2016 09:59:59 -0700
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	vlevenetz@...sol.com, vaibhav.hiremath@...aro.org,
	alex.elder@...aro.org, johan@...nel.org
Subject: [Query] Preemption (hogging) of the work handler

Hi Tejun,

We are stuck on an issue on our octa-core ARM platform and want to
make sure that we aren't abusing the workqueue API by using it for the
wrong use case.

Setup:

The system watchdog is petted (its counter reset) from a delayed work
queued every 1 second. If the work doesn't reset the counter within
another 1 second, the watchdog resets the system.

Petting-time: 1 second
Watchdog Reset-time: 2 seconds

The wq is allocated with:
        wdog_wq = alloc_workqueue("wdog", WQ_HIGHPRI, 0);

The watchdog's work-handler looks like this:

static void pet_watchdog_work(struct work_struct *work)
{
        ...

        /* Reset the watchdog's counter */
        pet_watchdog();

        /* Re-queue ourselves; 300 jiffies == 1 second with CONFIG_HZ=300 */
        queue_delayed_work(wdog_wq, &wdog_dwork, 300);
}
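
For completeness, the init path looks roughly like this (paraphrased,
not the exact driver code; wdog_init() is just a placeholder name, the
other names are as above):

static int wdog_init(void)
{
        wdog_wq = alloc_workqueue("wdog", WQ_HIGHPRI, 0);
        if (!wdog_wq)
                return -ENOMEM;

        INIT_DELAYED_WORK(&wdog_dwork, pet_watchdog_work);

        /* First pet after 1 second (CONFIG_HZ=300) */
        queue_delayed_work(wdog_wq, &wdog_dwork, 300);
        return 0;
}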

Kernel: 3.10 (Yeah, you can rant at me for that, but it's not something
I can decide on :)

Symptoms:

- The watchdog reboots the system sometimes. It is more reproducible
  in cases where an (out-of-tree) bus enumerated over USB is suddenly
  disconnected, which leads to the removal of lots of kernel devices on
  that bus, and a lot of print messages as well, due to failures to
  send any more data to those devices.

Observations:

I dug into this a bit more and found the following:

- The timer used by the delayed work fires at the time it was
  programmed for (checked by comparing timer->expires against jiffies),
  and the work-handler gets a chance to run and reset the counters
  pretty quickly after that.

- But somehow, the timer isn't programmed for the right time.

- Something is happening between the time the work-handler starts
  running and the time jiffies is read in add_timer(), which gets
  called from within queue_delayed_work().

- For example, if the value of jiffies in the pet_watchdog_work()
  handler (before calling queue_delayed_work()) is, say, 1000000, then
  the value of jiffies after the call to queue_delayed_work() has
  returned becomes 1000310, i.e. it sometimes increases by over 300,
  which is 1 second in our setup. I have seen this delta vary from 50
  to 350. If it crosses 300, the watchdog resets the system (as it was
  programmed for 2 seconds). A rough sketch of how I measured this
  follows below.
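
For reference, this is roughly how I measured that delta (an
instrumented variant of the handler; the warning threshold is
arbitrary):

static void pet_watchdog_work(struct work_struct *work)
{
        unsigned long before, after;

        pet_watchdog();

        before = jiffies;
        queue_delayed_work(wdog_wq, &wdog_dwork, 300);
        after = jiffies;

        /* Warn whenever re-queuing ourselves ate a noticeable chunk of jiffies */
        if (after - before > 50)
                pr_warn("wdog: queue_delayed_work() took %lu jiffies\n",
                        after - before);
}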


So we aren't able to queue the next timer in time, and that causes all
these problems. I haven't yet concluded why that is so.

Questions:

- I hope that the wq handler can be preempted, but can it be this bad?
- Is it fine to use the wq-handler for petting the watchdog? Or should
  that only be done with the help of an interrupt handler?
- Any other clues you can give that could help us figure out what's
  going on?

Thanks in advance and sorry to bother you :)

-- 
viresh
