Message-ID: <87msqjkk21.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 28 Mar 2024 13:21:58 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Bharata B Rao <bharata@....com>
Cc: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<akpm@...ux-foundation.org>, <mingo@...hat.com>,
<peterz@...radead.org>, <mgorman@...hsingularity.net>,
<raghavendra.kt@....com>, <dave.hansen@...ux.intel.com>,
<hannes@...xchg.org>
Subject: Re: [RFC PATCH 1/2] sched/numa: Fault count based NUMA hint fault latency
Bharata B Rao <bharata@....com> writes:
> On 28-Mar-24 7:26 AM, Huang, Ying wrote:
>> Bharata B Rao <bharata@....com> writes:
>>
>> [snip]
>>
>>> @@ -1750,25 +1753,20 @@ static bool pgdat_free_space_enough(struct pglist_data *pgdat)
>>> }
>>>
>>> /*
>>> - * For memory tiering mode, when page tables are scanned, the scan
>>> - * time will be recorded in struct page in addition to make page
>>> - * PROT_NONE for slow memory page. So when the page is accessed, in
>>> - * hint page fault handler, the hint page fault latency is calculated
>>> - * via,
>>> + * For memory tiering mode, when page tables are scanned, the current
>>> + * hint fault count will be recorded in struct page in addition to
>>> + * make page PROT_NONE for slow memory page. So when the page is
>>> + * accessed, in hint page fault handler, the hint page fault latency is
>>> + * calculated via,
>>> *
>>> - * hint page fault latency = hint page fault time - scan time
>>> + * hint page fault latency = current hint fault count - fault count at scan time
>>> *
>>> * The smaller the hint page fault latency, the higher the possibility
>>> * for the page to be hot.
>>> */
>>> -static int numa_hint_fault_latency(struct folio *folio)
>>> +static inline int numa_hint_fault_latency(struct folio *folio, int count)
>>> {
>>> - int last_time, time;
>>> -
>>> - time = jiffies_to_msecs(jiffies);
>>> - last_time = folio_xchg_access_time(folio, time);
>>> -
>>> - return (time - last_time) & PAGE_ACCESS_TIME_MASK;
>>> + return count - folio_xchg_fault_count(folio, count);
>>> }
>>
>> I found that count is task->mm->hint_faults, which is a process-wide
>> counter. How do you connect the hotness of a folio with the hint page
>> fault count of the process? How do you compare the hotness of
>> folios among different processes?
>
> The global hint fault count that we already maintain could
> be used instead of the per-task fault count. That should take care
> of the concern you mention, right?
I have plotted the total number of hint faults per second before, and it
varies a lot over time. So I don't think it is a good measure of hint
page fault latency.
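
To illustrate the concern, here is a minimal user-space sketch. It is not
kernel code: the per-second rate array and the helper names
fault_count_at() and count_based_latency() are made up for illustration,
and it only assumes that the latency is computed as in the patch, that is,
the counter value at fault time minus the counter value recorded at scan
time.

```c
#include <assert.h>

/*
 * Hypothetical user-space model, not kernel code.
 * faults_per_sec[i] is the number of hint faults observed during second i.
 */

/* Cumulative hint fault count at the start of second t. */
static long fault_count_at(const int *faults_per_sec, int nsecs, int t)
{
	long count = 0;

	assert(t >= 0 && t <= nsecs);
	for (int i = 0; i < t; i++)
		count += faults_per_sec[i];
	return count;
}

/*
 * Count-based "latency" in the style of the patch: the counter value at
 * fault time minus the counter value recorded at scan time.
 */
static long count_based_latency(const int *faults_per_sec, int nsecs,
				int scan_sec, int fault_sec)
{
	return fault_count_at(faults_per_sec, nsecs, fault_sec) -
	       fault_count_at(faults_per_sec, nsecs, scan_sec);
}
```

With assumed rates of {1000, 1000, 10, 10} faults per second, a page
scanned at second 0 and faulted at second 2 gets a latency of 2000, while
a page scanned at second 2 and faulted at second 4 gets a latency of 20,
although both waited two seconds. Any fixed threshold on this value would
classify the two pages differently.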
--
Best Regards,
Huang, Ying