linux-kernel - Re: do_generic_mapping


Message-ID: <Pine.GSO.4.64.0703121247210.7679@cpu102.cs.uwaterloo.ca>
Date:	Mon, 12 Mar 2007 13:05:17 -0400 (EDT)
From:	Ashif Harji <asharji@...uwaterloo.ca>
To:	Jan Kara <jack@...e.cz>
cc:	Nick Piggin <npiggin@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: do_generic_mapping_read performance issue


On Mon, 12 Mar 2007, Jan Kara wrote:

> On Mon 12-03-07 15:39:00, Nick Piggin wrote:
>> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
>>>   Hi,
>>>
>>>> Hi, I am encountering a performance problem, which I have tracked into the
>>>> Linux kernel. The problem occurs with my experimental web server that uses
>>>> sendfile to repeatedly transmit files.  The files are based on the static
>>>> portion of the SPECweb99 fileset and range in size to model a reasonable
>>>> workload.  With this workload, a significant number of the requests are
>>>> for files of size 4 KB or less.
>>>>
>>>> I have determined that the performance problem occurs in the function
>>>> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
>>>> Here is the specific code fragment:
>>>>
>>>>         /*
>>>>          * When (part of) the same page is read multiple times
>>>>          * in succession, only mark it as accessed the first time.
>>>>          */
>>>>         if (prev_index != index)
>>>>                 mark_page_accessed(page);
>>>   Actually, the code has certainly been like that for two years :).
>>
>> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
>> stage (ie. read from the start of a page -- obviously that also has holes).
>  Yes, at least in 2.6.12-rc5 which is the first one in git :).
>
>>>> I was wondering if anyone could explain why the call to mark_page_accessed
>>>> is conditional? That is, what problem it is trying to solve. It would seem
>>>> that in many scenarios, if the same page is accessed repeatedly, then it
>>>> would be appropriate to keep that page cached.
>>>   I also don't know why the condition is there, but it has been there
>>> for at least two years, so I'm not sure anybody remembers ;). Nick, do
>>> you have an idea?
>>
>> Yeah it is there because that is basically how our "use once" detection
>> handles the case where an app does not read in chunks that are PAGE_SIZE
>> multiples and PAGE_SIZE aligned.
>  OK, I see. Then I'm not sure the check does more good than harm, because
> if we happen to reread the same chunk several times, then the check does
> the wrong thing...

Thanks for providing me with additional information.

I would like to submit a patch to fix the performance problem.  The
simplest solution is to remove the check.  Even when an application does
not read in PAGE_SIZE-aligned multiples, as described above, a page that
is accessed frequently should remain in the cache.  However, I am open to
suggestions for a more sophisticated scheme.

ashif.