Message-ID: <Pine.LNX.4.64.0812031957570.19174@blonde.anvils>
Date: Wed, 3 Dec 2008 20:24:30 +0000 (GMT)
From: Hugh Dickins <hugh@...itas.com>
To: Oleg Nesterov <oleg@...hat.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Jay Lan <jlan@....com>, Jiri Pirko <jpirko@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
On Wed, 3 Dec 2008, Oleg Nesterov wrote:
> Unless we are going to decrease rss/vm there is no point to call the
> (racy) update_hiwater_xxx() helpers. Still do_exit() does this, and

I'm puzzled by this comment. exit() _is_ about to decrease rss/vm,
so isn't it right to be calling update_hiwater_xxx()?

There is a question of who's going to be able to see the result from
this point on: I forget whether I was doing it for my own satisfaction,
or for a real observer. Even if there isn't a real observer today,
I think I'd prefer do_exit() to continue to update_hiwater_xxx(),
in case an observer is added tomorrow - unless you feel it's
unjustifiably adding code to and slowing down process exit.

You say "(racy)": in my view, it was only as racy as whatever might
cause it to be racy. By that, I mean that if the numbers ended up
slightly wrong, you could reasonably imagine that the races happened
in a different sequence which would have ended up with the numbers
seen. Have you noticed something more serious we need to fix?

> the accounting code uses mm->hiwater_xxx directly.
>
> This is not right. fill_pid()->xacct_add_tsk() can be called by
> taskstats_user_cmd() at any time, not only when the task exits.
> in that case taskstats->hiwater_xxx can be very wrong.

Here you're very right. There was no tsacct.c when I added those
hiwaters in 2.6.15; it's quite wrong to have been using those
numbers without comparing against current values. Well spotted.
>
> Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() to use instead,
> and kill the "if (tsk->mm) {}" code in do_exit().

If you're going to add special helper macros (I don't care myself),
wouldn't it be better to convert fs/proc/task_mmu.c (the original
consumer) to use them too?

And, as I say, I'd _prefer_ that block to remain in do_exit(),
but don't have strong evidence why it should.

> The first helper will
> be also used to actually fill/report rusage->ru_maxrss.

Oh, yes, I noticed a mail yesterday in which you claimed to Cc me,
but didn't (like we all claim to be attaching missing patches ;)
I then forgot it, but yes, I am glad to see Jiri putting
hiwater_rss to more use, fewer ever-0s from /usr/bin/time.

Hugh

>
> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
>
> --- K-28/include/linux/sched.h~HIWATER 2008-12-02 17:12:40.000000000 +0100
> +++ K-28/include/linux/sched.h 2008-12-03 18:17:18.000000000 +0100
> @@ -388,6 +388,9 @@ extern void arch_unmap_area_topdown(stru
> (mm)->hiwater_vm = (mm)->total_vm; \
> } while (0)
>
> +#define get_mm_hiwater_rss(mm) max((mm)->hiwater_rss, get_mm_rss(mm))
> +#define get_mm_hiwater_vm(mm) max((mm)->hiwater_vm, (mm)->total_vm)
> +
> extern void set_dumpable(struct mm_struct *mm, int value);
> extern int get_dumpable(struct mm_struct *mm);
>
> --- K-28/kernel/tsacct.c~HIWATER 2008-10-10 00:13:53.000000000 +0200
> +++ K-28/kernel/tsacct.c 2008-12-03 18:24:28.000000000 +0100
> @@ -90,8 +90,8 @@ void xacct_add_tsk(struct taskstats *sta
> mm = get_task_mm(p);
> if (mm) {
> /* adjust to KB unit */
> - stats->hiwater_rss = mm->hiwater_rss * PAGE_SIZE / KB;
> - stats->hiwater_vm = mm->hiwater_vm * PAGE_SIZE / KB;
> + stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
> + stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
> mmput(mm);
> }
> stats->read_char = p->ioac.rchar;
> --- K-28/kernel/exit.c~HIWATER 2008-12-02 17:12:40.000000000 +0100
> +++ K-28/kernel/exit.c 2008-12-03 18:21:06.000000000 +0100
> @@ -1048,10 +1048,7 @@ NORET_TYPE void do_exit(long code)
> preempt_count());
>
> acct_update_integrals(tsk);
> - if (tsk->mm) {
> - update_hiwater_rss(tsk->mm);
> - update_hiwater_vm(tsk->mm);
> - }
> +
> group_dead = atomic_dec_and_test(&tsk->signal->live);
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);