Message-ID: <4350ee16-15d0-1d37-6074-cea5185aac51@redhat.com>
Date: Wed, 2 Jun 2021 09:44:36 +0200
From: David Hildenbrand <david@...hat.com>
To: yong w <yongw.pur@...il.com>
Cc: minchan@...nel.org, ngupta@...are.org, senozhatsky@...omium.org,
axboe@...nel.dk, akpm@...ux-foundation.org,
songmuchun@...edance.com, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH V1] zram:calculate available memory when zram is used
On 02.06.21 02:06, yong w wrote:
> Thanks for your reply!
>
> Indeed, when zram is not triggered, MemAvailable will be greater than MemTotal.
> As you said, this modification will cause compatibility issues when there
> are user-space scripts that use MemAvailable in calculations.
>
> How about providing another proc interface?
> Such as /proc/xxx: when you execute "cat /proc/xxx", it returns the
> "available + swapfree - swapfree * compress ratio" value.
Good question. What would be an appropriate name and description for
this entry?
We do have in /proc/meminfo already:
MemTotal:
MemAvailable:
SwapTotal:
SwapFree:
I wonder if we could add an entry either for "swapfree - swapfree *
compress" or "swapfree * compress". But even then, I struggle to find a
fitting name. We could either define something like
"SwapAvailable:" An estimate of how much swap space is available for
starting new applications. Note that SwapAvailable differs from
SwapFree when swap compression, as implemented by zram, is in use.
So a user can compute
"TotalAvailable = MemAvailable + SwapAvailable".
or if that doesn't make any sense:
"TotalAvailable:" An estimate of how much memory and swap space is
available for starting new applications. Usually, the value corresponds
to "MemAvailable + SwapFree", however, swap compression, as implemented
by zram, might allow for making more efficient use of free swap space.
I'd prefer the first -- if it makes any sense.
You should CC linux-api on follow up patches.
> Thanks in advance!
>
>
> David Hildenbrand <david@...hat.com> wrote on Mon, May 31, 2021 at 4:17 PM:
>>
>> On 30.05.21 19:18, yong w wrote:
>>> When zram is used, "available + swap free" memory is obviously bigger
>>> than what can actually be used, because zram compresses memory with a
>>> compression algorithm and the compressed data occupies memory too.
>>>
>>> So, I count the compression ratio of zram in the kernel. The available
>>> memory is calculated as follows:
>>> available + swapfree - swapfree * compress ratio
>>> MemAvailable in /proc/meminfo then returns "available + the space zram
>>> will save"
>>>
>>
>> This will mean that we can easily have MemAvailable > MemTotal, right?
>> I'm not sure if there are some user space scripts that will be a little
>> disrupted by that. Like calculating "MemUnavailable = MemTotal -
>> MemAvailable".
>>
>> MemAvailable: "An estimate of how much memory is available for starting
>> new applications, without swapping"
>>
>> Although zram isn't "traditional swapping", there is still a performance
>> impact when having to go to zram because it adds an indirection and
>> requires (de)compression. Similar to having very fast swap space (like
>> PMEM). Let's not call something "memory" that doesn't have the same
>> semantics as real memory as in "MemTotal".
>>
>> This doesn't feel right.
>>
>>> Signed-off-by: wangyong <yongw.pur@...il.com <mailto:yongw.pur@...il.com>>
>>> ---
>>> drivers/block/zram/zcomp.h | 1 +
>>> drivers/block/zram/zram_drv.c | 4 ++
>>> drivers/block/zram/zram_drv.h | 1 +
>>> fs/proc/meminfo.c | 2 +-
>>> include/linux/swap.h | 10 +++++
>>> mm/swapfile.c | 95 +++++++++++++++++++++++++++++++++++++++++++
>>> mm/vmscan.c | 1 +
>>> 7 files changed, 113 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
>>> index 40f6420..deb2dbf 100644
>>> --- a/drivers/block/zram/zcomp.h
>>> +++ b/drivers/block/zram/zcomp.h
>>> @@ -40,4 +40,5 @@ int zcomp_decompress(struct zcomp_strm *zstrm,
>>> const void *src, unsigned int src_len, void *dst);
>>>
>>> bool zcomp_set_max_streams(struct zcomp *comp, int num_strm);
>>> +int get_zram_major(void);
>>> #endif /* _ZCOMP_H_ */
>>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>> index cf8deec..1c6cbd4 100644
>>> --- a/drivers/block/zram/zram_drv.c
>>> +++ b/drivers/block/zram/zram_drv.c
>>> @@ -59,6 +59,10 @@ static void zram_free_page(struct zram *zram, size_t index);
>>> static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
>>> u32 index, int offset, struct bio *bio);
>>>
>>> +int get_zram_major(void)
>>> +{
>>> + return zram_major;
>>> +}
>>>
>>> static int zram_slot_trylock(struct zram *zram, u32 index)
>>> {
>>> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>> index 6e73dc3..5d8701a 100644
>>> --- a/drivers/block/zram/zram_drv.h
>>> +++ b/drivers/block/zram/zram_drv.h
>>> @@ -88,6 +88,7 @@ struct zram_stats {
>>> atomic64_t bd_reads; /* no. of reads from backing device */
>>> atomic64_t bd_writes; /* no. of writes from backing device */
>>> #endif
>>> + atomic_t min_compr_ratio;
>>> };
>>>
>>> struct zram {
>>> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
>>> index 6fa761c..f7bf350 100644
>>> --- a/fs/proc/meminfo.c
>>> +++ b/fs/proc/meminfo.c
>>> @@ -57,7 +57,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>>>
>>> show_val_kb(m, "MemTotal: ", i.totalram);
>>> show_val_kb(m, "MemFree: ", i.freeram);
>>> - show_val_kb(m, "MemAvailable: ", available);
>>> + show_val_kb(m, "MemAvailable: ", available + count_avail_swaps());
>>> show_val_kb(m, "Buffers: ", i.bufferram);
>>> show_val_kb(m, "Cached: ", cached);
>>> show_val_kb(m, "SwapCached: ", total_swapcache_pages());
>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>> index 032485e..3225a2f 100644
>>> --- a/include/linux/swap.h
>>> +++ b/include/linux/swap.h
>>> @@ -514,6 +514,8 @@ extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
>>> extern void exit_swap_address_space(unsigned int type);
>>> extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
>>> sector_t swap_page_sector(struct page *page);
>>> +extern void update_zram_zstats(void);
>>> +extern u64 count_avail_swaps(void);
>>>
>>> static inline void put_swap_device(struct swap_info_struct *si)
>>> {
>>> @@ -684,6 +686,14 @@ static inline swp_entry_t get_swap_page(struct page *page)
>>> return entry;
>>> }
>>>
>>> +void update_zram_zstats(void)
>>> +{
>>> +}
>>> +
>>> +u64 count_avail_swaps(void)
>>> +{
>>> +}
>>> +
>>> #endif /* CONFIG_SWAP */
>>>
>>> #ifdef CONFIG_THP_SWAP
>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>> index cbb4c07..93a9dcb 100644
>>> --- a/mm/swapfile.c
>>> +++ b/mm/swapfile.c
>>> @@ -44,6 +44,7 @@
>>> #include <asm/tlbflush.h>
>>> #include <linux/swapops.h>
>>> #include <linux/swap_cgroup.h>
>>> +#include "../drivers/block/zram/zram_drv.h"
>>>
>>> static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
>>> unsigned char);
>>> @@ -3408,6 +3409,100 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>>> return error;
>>> }
>>>
>>> +u64 count_avail_swap(struct swap_info_struct *si)
>>> +{
>>> + u64 result;
>>> + struct zram *z;
>>> + unsigned int free;
>>> + unsigned int ratio;
>>> +
>>> + result = 0;
>>> + if (!si)
>>> + return 0;
>>> +
>>> + //zram calculate available mem
>>> + if (si->flags & SWP_USED && si->swap_map) {
>>> + if (si->bdev->bd_disk->major == get_zram_major()) {
>>> + z = (struct zram *)si->bdev->bd_disk->private_data;
>>> + down_read(&z->init_lock);
>>> + ratio = atomic_read(&z->stats.min_compr_ratio);
>>> + free = (si->pages << (PAGE_SHIFT - 10))
>>> + - (si->inuse_pages << (PAGE_SHIFT - 10));
>>> + if (!ratio)
>>> + result += free / 2;
>>> + else
>>> + result = free * (100 - 10000 / ratio) / 100;
>>> + up_read(&z->init_lock);
>>> + }
>>> + } else
>>> + result += (si->pages << (PAGE_SHIFT - 10))
>>> + - (si->inuse_pages << (PAGE_SHIFT - 10));
>>> +
>>> + return result;
>>> +}
>>> +
>>> +u64 count_avail_swaps(void)
>>> +{
>>> + int type;
>>> + u64 result;
>>> + struct swap_info_struct *si;
>>> +
>>> + result = 0;
>>> + spin_lock(&swap_lock);
>>> + for (type = 0; type < nr_swapfiles; type++) {
>>> + si = swap_info[type];
>>> + spin_lock(&si->lock);
>>> + result += count_avail_swap(si);
>>> + spin_unlock(&si->lock);
>>> + }
>>> + spin_unlock(&swap_lock);
>>> +
>>> + return result;
>>> +}
>>> +
>>> +void update_zram_zstat(struct swap_info_struct *si)
>>> +{
>>> + struct zram *z;
>>> + struct zram_stats *stat;
>>> + int ratio;
>>> + u64 orig_size, compr_data_size;
>>> +
>>> + if (!si)
>>> + return;
>>> +
>>> + //update zram min compress ratio
>>> + if (si->flags & SWP_USED && si->swap_map) {
>>> + if (si->bdev->bd_disk->major == get_zram_major()) {
>>> + z = (struct zram *)si->bdev->bd_disk->private_data;
>>> + down_read(&z->init_lock);
>>> + stat = &z->stats;
>>> + ratio = atomic_read(&stat->min_compr_ratio);
>>> + orig_size = atomic64_read(&stat->pages_stored) << PAGE_SHIFT;
>>> + compr_data_size = atomic64_read(&stat->compr_data_size);
>>> + if (compr_data_size && (!ratio
>>> + || ((orig_size * 100) / compr_data_size < ratio)))
>>> + atomic_set(&stat->min_compr_ratio,
>>> + (orig_size * 100) / compr_data_size);
>>> + up_read(&z->init_lock);
>>> + }
>>> + }
>>> +}
>>> +
>>> +void update_zram_zstats(void)
>>> +{
>>> + int type;
>>> + struct swap_info_struct *si;
>>> +
>>> + spin_lock(&swap_lock);
>>> + for (type = 0; type < nr_swapfiles; type++) {
>>> + si = swap_info[type];
>>> + spin_lock(&si->lock);
>>> + update_zram_zstat(si);
>>> + spin_unlock(&si->lock);
>>> + }
>>> + spin_unlock(&swap_lock);
>>> +}
>>> +
>>> void si_swapinfo(struct sysinfo *val)
>>> {
>>> unsigned int type;
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index eb31452..ffaf59b 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -4159,6 +4159,7 @@ static int kswapd(void *p)
>>> alloc_order);
>>> reclaim_order = balance_pgdat(pgdat, alloc_order,
>>> highest_zoneidx);
>>> + update_zram_zstats();
>>> if (reclaim_order < alloc_order)
>>> goto kswapd_try_sleep;
>>> }
>>> --
>>> 2.7.4
>>
>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>
--
Thanks,
David / dhildenb