[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081204135250.GA5448@redhat.com>
Date: Thu, 4 Dec 2008 14:52:50 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Hugh Dickins <hugh@...itas.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Jay Lan <jlan@....com>, Jiri Pirko <jpirko@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] introduce get_mm_hiwater_xxx(), fix
taskstats->hiwater_xxx accounting
On 12/03, Hugh Dickins wrote:
>
> On Wed, 3 Dec 2008, Oleg Nesterov wrote:
>
> > Unless we are going to decrease rss/vm there is no point to call the
> > (racy) update_hiwater_xxx() helpers. Still do_exit() does this, and
>
> I'm puzzled by this comment. exit() _is_ about to decrease rss/vm,
> so isn't it right to be calling update_hiwater_xxx()?
Do you mean exit_mm()->...->exit_mmap() ? But this doesn't matter, this
->mm is going away. Nobody can read these counters when ->mm_users == 0,
no?
> There is a question of who's going to be able to see the result from
> this point on: I forget whether I was doing it for my own satisfaction,
> or for a real observer. Even if there isn't a real observer today,
> I think I'd prefer do_exit() to continue to update_hiwater_xxx(),
> in case an observer is added tomorrow - unless you feel it's
> unjustifiably adding code to and slowing down process exit.
Please see below,
> You say "(racy)": in my view, it was only as racy as whatever might
> cause it to be racy. By that, I mean that if the numbers ended up
> slightly wrong, you could reasonably imagine that the races happened
> in a different sequence which would have ended up with the numbers
> seen. Have you noticed something more serious we need to fix?
But the difference can be huge. Let's suppose the process has 2 threads
T1 and T2.
T1 exits, calls update_hiwater_vm(), notices that mm->hiwater_vm must be
updated, and preempted right before mm->hiwater_vm = new_value.
T2 does free(malloc(A_LOT)) and exits. It sets the correct value for
->hiwater_vm which takes A_LOT into account and disappears.
T1 resumes, and "reverts" ->hiwater_vm to the "new_value" which was
calculated before.
Now, since T1 is the last exiting thread, the whole thread group
exits and we report the wrong ->hiwater_vm to the userspace.
Yes, this race is unlikely. But there is another reason (perhaps
not very good) why I tried to remove update_hiwater_xxx() from
do_exit().
Imho this code looks as if: from now it is "safe" to use ->hiwater_xxx
directly because we already updated it. But it is not. Unless we are
the last thread, we can't trust mm->hiwater_xxx anyway, we should
re-check get_mm_rss/total_vm.
> > Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() to use instead,
> > and kill the "if (tsk->mm) {}" code in do_exit().
>
> If you're going to add special helper macros (I don't care myself),
> wouldn't it be better to convert fs/proc/task_mmu.c (the original
> consumer) to use them too?
Yes, I was going to convert task_mem(), but noticed that it has
to read get_mm_rss() and ->total_vm anyway. Still, perhaps it
would be more clean to use the new macros anyway, even if this
wiil (unlikely) need a couple of extra cpu ticks.
> And, as I say, I'd _prefer_ that block to remain in do_exit(),
> but don't have strong evidence why it should.
I think that update_hiwater_xxx() should be "private" for vm code which
does zap/unmap, and any observer should use get_mm_hiwater_xxx(). Nobody
should use mm->hiwater_xxx directly.
But I don't (and of course can't) have a strong opinion on that,
will wait for your verdict and re-send.
The patch need the update anyway, I just noticed that if we remove
update_hiwater_xxx() from do_exit(), we should change the comment
in exit_mmap().
> > The first helper will
> > be also used to actually fill/report rusage->ru_maxrss.
>
> Oh, yes, I noticed a mail yesterday in which you claimed to Cc me,
> but didn't (like we all claim to be attaching missing patches ;)
Yes sorry ;) The only problem with mutt is that it is not possible
to change CC while editing the text (unless edit_headers is set).
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists