Post by Mark Millard via freebsd-hackers
Post by Robert
Sorry, let me be more specific.
https://docs.google.com/spreadsheets/d/152qBBNokl4mJUc6T6wVTcxaWOskT4KhcvdpOL68gEUM/edit?usp=sharing
(wait until charts fully loaded).
Thanks for giving folks access to the charts originally referred to.
Post by Robert
These are all memory states and mmap/munmap stats collected. The Y axis is in MBs; the X axis is a timeline.
Is MAP_PRIVATE in use, or MAP_SHARED?
Is MAP_NOSYNC in use or not?
Is MAP_ANON in use or not?
Is MAP_FIXED in use or not? (Probably not?)
But below I cover MAP_NOSYNC, and another option (in a third call), and do not need such information for what I've written.
Post by Robert
It's not a problem to understand which process produces allocations and is being swapped. I know this for sure.
The issue is: I strongly believe that for some reason the FreeBSD kernel fails to reuse deallocated memory properly.
1. When a process allocates memory (mmap), "Active" memory increases and "Free" memory decreases (that's expected).
2. When a process deallocates memory (munmap), "Inactive" memory increases and "Active" memory decreases.
Memory never returns to the "Free" state. That's kind of expected as well.
From the description of MAP_NOSYNC in mmap(2):
. . . Without this option any VM pages you dirty may be flushed to disk every so often (every 30-60 seconds usually) which can create performance problems if you do not need that to occur (such as when you are using shared file-backed mmap regions for IPC purposes). Dirty data will be flushed automatically when all mappings of an object are removed and all descriptors referencing the object are closed. Note that VM/file system coherency is maintained whether you use MAP_NOSYNC or not.
Note the specified behavior for flushing out "dirty data"
unless MAP_NOSYNC is in use. (I note another alternative
later.)
As I understand it, FreeBSD uses the swapping/paging code to do the flush activity: part of the swap/page space is mapped onto the file in question, and the flushing is a form of swapping/paging out pages.
[Note: top does not keep track of changes in swap space; for example, a "swapon" done after top has started displaying things will not show an increased swap total, but the usage shown can be larger than the shown total. Flushing out to a mapped file might be an example of this, for all I know.]
. . . The fsync(2) system call will flush all dirty data and metadata associated with a file, including dirty NOSYNC VM data, to physical media. The sync(8) command and sync(2) system call generally do not flush dirty NOSYNC VM data. The msync(2) system call is usually not needed since BSD implements a coherent file system buffer cache. However, it may be used to associate dirty VM pages with file system buffers and thus cause them to be flushed to physical media sooner rather than later.
As for munmap, its description reads:
The munmap () system call deletes the mappings and guards for the speci-
fied address range, and causes further references to addresses within the
range to generate invalid memory references.
That last is not equivalent to the address range being "free", in that the range still counts against the process address space. (So being precise about RAM availability vs. address-space usage/availability is important in order to avoid confusion.)
It would appear that forcing invalid memory references involves keeping page descriptions around, but they would be inactive rather than active. This is true whether or not RAM is still associated. (So this could potentially lead to a form of extra counting of RAM use, sort of like in my original note.) See later below for another means of control . . .
Remember: "Dirty data will be flushed automatically when all mappings of
an object are removed and all descriptors referencing the object are
closed". So without MAP_NOSYNC the flushing is expected. But see below
for another means of control . . .
There is another call, madvise(2), that has an option tied to enabling freeing pages and avoiding the flushing:
MADV_FREE Gives the VM system the freedom to free pages, and tells
the system that information in the specified page range
is no longer important. This is an efficient way of
allowing malloc(3) to free pages anywhere in the address
space, while keeping the address space valid. The next
time that the page is referenced, the page might be
demand zeroed, or might contain the data that was there
before the MADV_FREE call. References made to that
address space range will not make the VM system page the
information back in from backing store until the page is
modified again.
This is a way to let the system free page ranges while allowing later use of the address range in the process's address space. There are no page ranges that need to generate invalid memory references, so no need for such "inactive pages".
The MADV_FREE documentation makes clear that it must be used explicitly to get the behavior you appear to be looking for. At least that is the way I read the documentation's meaning. MAP_NOSYNC does not seem sufficient on its own to match the behavioral properties you are looking for --but it appears possibly necessary up to the point where MADV_FREE can be used.
Post by Robert
3. At some point, when the sum of "Active" and "Inactive" memory exceeds some upper memory limit, the OS starts to push "Inactive" memory into "Laundry" and "Swap". This happens very quickly and unexpectedly.
This is the flushing activity documented above, as far as I can tell.
Post by Robert
Now why doesn't the OS reuse huge amounts of "Inactive" memory when calling mmap?
Without use of MADV_FREE the system does not have "the freedom to free pages". Without MAP_NOSYNC as well, it is expected to flush out some pages at various times as things go along.
Post by Robert
Or is my assumption about the availability of "Inactive" memory wrong? Which one is free for allocations then?
Pages that are inactive and dirty normally have to be flushed out before the RAM for the page can be freed for other uses. MADV_FREE is for indicating when this is not the case and the usage of the RAM has reached a stage where the RAM can be more directly freed (no longer tied to the process).
At least that is my understanding.
Mark Johnston had already written about MADV_FREE but not
with such quoting of related documentation. If he and I
seem to contradict each other anywhere, believe Mark J.
I'm no FreeBSD expert. I'm just trying to reference and
understand the documentation.
Post by Robert
Thanks.
Post by Mark Millard via freebsd-hackers
No screen shot made it through the list back out to those that get messages from the freebsd-hackers at freebsd.org address referenced in the CC. The list limits itself to text, as I understand it.
Post by Robert
Mem: 1701M Active, 20G Inact, 6225M Laundry, 2625M Wired, 280M Free
ARC: 116M Total, 6907K MFU, 53M MRU, 544K Anon, 711K Header, 55M Other
6207K Compressed, 54M Uncompressed, 8.96:1 Ratio
Swap: 32G Total, 15G Used, 17G Free, 46% Inuse
Relative to my limited point: I do not know whether the ZFS ARC and Mem categorizations are mutually exclusive or not.
Unfortunately, as I understand things, it is questionable whether adding -S to the top command gives you swap information that can point to what makes up the 15G swapped out by totaling the sizes listed. But you might at least be able to infer which processes became swapped out, even if you cannot get a good size for the swap space use of each.
Using -ores does seem to put the top users of resident memory at the top of top's process list.
Sufficient Active RAM use by processes that stay active will
tend to cause inactive processes to be swapped out. FreeBSD
does not swap out processes that stay active: it pages those
as needed instead of swapping.
So using top -Sores might allow watching what active
process(es) grow and stay active and what inactive processes
are swapped out at the time of the activity.
I do infer that the 15G Used for Swap is tied to processes
that were not active when swapped out.
Post by Robert
I'm OK with a low "Free" memory if the OS can effectively allocate from "Inactive", but I'm worried about a sudden move of a huge piece of memory into "Swap" without any relevant mmap calls.
My question is: what else (besides mmap) may reduce "Free" memory and increase "Laundry"/"Swap" in the system?
Thanks.
Post by Mark Millard via freebsd-hackers
Post by Rozhuk Ivan
On Wed, 24 Oct 2018 10:19:20 -0700
Robert
Post by Robert
So the issue is still happening. Please check attached screenshot.
The green area is "inactive + cached + free".
. . .
+1
Mem: 845M Active, 19G Inact, 4322M Laundry, 6996M Wired, 1569M Buf, 617M Free
Swap: 112G Total, 19M Used, 112G Free
Just a limited point based on my understanding of "Buf" in
top's display . . .
If "cached" means "Buf" in top's output, my understanding of Buf
is that it is not a distinct memory area. Instead it totals the
buffer space that is spread across multiple states: Active,
Inactive, Laundry, and possibly Wired(?).
In other words: TotalMemory = Active+Inact+Laundry+Wired+Free. If Buf is added to that, then there is double counting of everything included in Buf, and the total will be larger than TotalMemory.
Also Inact+Buf+Free may double count some of the Inact space,
the space that happens to be inactive buffer space.
I may be wrong, but that is my understanding.