linux-kernel - Re: [PATCH] softlockup: remove hung_task_check

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c62985530901210514o75a8ec67y1d48eedc996ee67b@mail.gmail.com>
Date:	Wed, 21 Jan 2009 14:14:23 +0100
From:	Frédéric Weisbecker <fweisbec@...il.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Mandeep Singh Baines <msb@...gle.com>,
	linux-kernel@...r.kernel.org, rientjes@...gle.com,
	mbligh@...gle.com, thockin@...gle.com
Subject: Re: [PATCH] softlockup: remove hung_task_check_count

2009/1/21 Ingo Molnar <mingo@...e.hu>:
>
> * Mandeep Singh Baines <msb@...gle.com> wrote:
>
>>       read_lock(&tasklist_lock);
>>       do_each_thread(g, t) {
>> -             if (!--max_count)
>> -                     goto unlock;
>> +             if (!--max_count) {
>> +                     /*
>> +                      * Drop the lock every once in a while and resched if
>> +                      * necessary. Don't want to hold the lock too long.
>> +                      */
>> +                     get_task_struct(t);
>> +                     read_unlock(&tasklist_lock);
>> +                     max_count = HUNG_TASK_CHECK_COUNT;
>> +                     if (need_resched())
>> +                             schedule();
>> +                     read_lock(&tasklist_lock);
>> +                     put_task_struct(t);
>> +                     /*
>> +                      * t was unlinked from tasklist. Can't continue in this
>> +                      * case. Exit and try again next time.
>> +                      */
>> +                     if (t->state == TASK_DEAD)
>> +                             goto unlock;
>> +             }
>
> firstly, this bit should move into a helper function. Also, why dont you
> do the need_resched() check first (it's very lighweight) - and thus only
> do the heavy ops (get-task-struct & tasklist_lock unlock) if that is set?
>
> But most importantly, isnt the logic above confused? --max_count will
> reach zero exactly once, and then we'll loop for a long time.
>
>        Ingo
>

Thanks Mandeep for this patch.

By reading Ingo's review, I notice that the checks done by
check_hung_task() are quite lightweight and then quick.
I guess your loop is able to go through a good number of tasks before
actually needing to be resched.
Perhaps the max_count can be dropped actually, since it is a kind of
arbitrary chosen constant.
So you would just need to check need_resched() at each iteration.

ie (should be split in helper functions):

bool retry = false;

do {
	read_lock(&tasklist_lock);
	do_each_thread(g, t) {
		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
		if (t->state == TASK_UNINTERRUPTIBLE)
			check_hung_task(t, now, timeout);
                if (need_resched()) {
                    get_task_struct(t);
                    read_unlock(&tasklist_lock);
                    schedule();
                    read_lock(&tasklist_lock);
                    put_task_struct(t);

                    if (t->state == TASK_DEAD) {
                        retry = true;
                        goto unlock;
                    }
                }

	} while_each_thread(g, t);
 unlock:
	read_unlock(&tasklist_lock);
} while(retry);


There are good chances that your "t" task hasn't died, but if so, you
can just retry the whole thing.
Perhaps that should require some tests to see if there is a good
average of t->stat == TASK_DEAD path not taken,
but of course, it depends on what the system is doing most.

What do you think?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/