Post by Mark Millard via freebsd-hackers
Post by Robert
Sorry, let me be more specific.
https://docs.google.com/spreadsheets/d/152qBBNokl4mJUc6T6wVTcxaWOskT4KhcvdpOL68gEUM/edit?usp=sharing
(wait until charts fully loaded).
Thanks for giving folks access to the charts originally referred to.
Post by Robert
These are all memory states and mmap/munmap stats collected. The Y axis is in MBs; the X axis is a timeline.
Is MAP_PRIVATE in use, or MAP_SHARED?
Is MAP_NOSYNC in use or not?
Is MAP_ANON in use or not?
Is MAP_FIXED in use or not? (Probably not?)
But below I cover MAP_NOSYNC, and another option (in a third call), and do not need such information for what I've written.
Post by Robert
It's not a problem to understand which process produces allocations and is being swapped. I know this for sure.
The issue is: I strongly believe that for some reason the FreeBSD kernel fails to reuse deallocated memory properly.
1. When a process allocates memory (mmap), "Active" memory increases and "Free" memory decreases (that's expected).
2. When a process deallocates memory (munmap), "Inactive" memory increases and "Active" memory decreases.
Memory never returns to the "Free" state. That's kind of expected as well.
From the description of MAP_NOSYNC in mmap(2):
. . . Without this option any VM pages you dirty may be flushed to disk every so often (every 30-60 seconds usually) which can create performance problems if you do not need that to occur (such as when you are using shared file-backed mmap regions for IPC purposes). Dirty data will be flushed automatically when all mappings of an object are removed and all descriptors referencing the object are closed. Note that VM/file system coherency is maintained whether you use MAP_NOSYNC or not.
Note the specified behavior for flushing out "dirty data"
unless MAP_NOSYNC is in use. (I note another alternative
later.)
As I understand it, FreeBSD uses the swapping/paging code to do the flush activity: part of the swap/page space is mapped onto the file in question, and the flushing is a form of swapping/paging out pages.
[Note: top does not keep track of changes in swap space; for example, a "swapon" done after top has started displaying things will not show an increased swap total, but the usage shown can be larger than the shown total. Flushing out to a mapped file might be an example of this, for all I know.]
. . . The fsync(2) system call will flush all dirty data and metadata associated with a file, including dirty NOSYNC VM data, to physical media. The sync(8) command and sync(2) system call generally do not flush dirty NOSYNC VM data. The msync(2) system call is usually not needed since BSD implements a coherent file system buffer cache. However, it may be used to associate dirty VM pages with file system buffers and thus cause them to be flushed to physical media sooner rather than later.
As for munmap, its description reads:
The munmap () system call deletes the mappings and guards for the speci-
fied address range, and causes further references to addresses within the
range to generate invalid memory references.
That last is not equivalent to the address range being "free", in that the range still counts against the process address space. (So being precise about RAM availability vs. address-space usage/availability is important in order to avoid confusion.)
It would appear that forcing invalid memory references involves keeping page descriptions around, but they would be inactive rather than active. This is true whether or not RAM is still associated. (So this could potentially lead to a form of extra counting of RAM use, sort of like in my original note.) See later below for another means of control . . .
Remember: "Dirty data will be flushed automatically when all mappings of
an object are removed and all descriptors referencing the object are
closed". So without MAP_NOSYNC the flushing is expected. But see below
for another means of control . . .
There is another call, madvise(2), that has an option tied to enabling freeing pages and avoiding the flushing:
MADV_FREE Gives the VM system the freedom to free pages, and tells
the system that information in the specified page range
is no longer important. This is an efficient way of
allowing malloc(3) to free pages anywhere in the address
space, while keeping the address space valid. The next
time that the page is referenced, the page might be
demand zeroed, or might contain the data that was there
before the MADV_FREE call. References made to that
address space range will not make the VM system page the
information back in from backing store until the page is
modified again.
This is a way to let the system free page ranges while allowing later use of the address range in the process's address space. There are no page ranges that need to generate invalid memory references, so no need for such "inactive pages".
The MADV_FREE documentation makes clear that it must be used explicitly to get the behavior you appear to be looking for. At least that is the way I read the documentation's meaning. MAP_NOSYNC does not seem sufficient on its own to match the behavioral properties you are looking for --but it appears possibly necessary up to the point where MADV_FREE can be used.
Post by Robert
3. At some point, when the sum of "Active" and "Inactive" memory exceeds some upper memory limit, the OS starts to push "Inactive" memory into "Laundry" and "Swap". This happens very quickly and unexpectedly.
This is the flushing activity documented above, as far as I can tell.
Post by Robert
Now why doesn't the OS reuse huge amounts of "Inactive" memory when calling mmap?
Without use of MADV_FREE the system does not have "the freedom to free pages". Without MAP_NOSYNC as well, it is expected to flush out some pages at various times as things go along.
Post by Robert
Or is my assumption about the availability of "Inactive" memory wrong? Which one is free for allocations then?
Pages that are inactive and dirty normally have to be flushed out before the RAM for the page can be freed for other uses. MADV_FREE is for indicating when this is not the case and the usage of the RAM has reached a stage where the RAM can be more directly freed (no longer tied to the process).
At least that is my understanding.
Mark Johnston had already written about MADV_FREE but not
with such quoting of related documentation. If he and I
seem to contradict each other anywhere, believe Mark J.
I'm no FreeBSD expert. I'm just trying to reference and
understand the documentation.
Post by Robert
Thanks.
Post by Mark Millard via freebsd-hackers
No screen shot made it through the list back out to those that get messages from the freebsd-hackers at freebsd.org address referenced in the CC. The list limits itself to text, as I understand it.
Post by Robert
Mem: 1701M Active, 20G Inact, 6225M Laundry, 2625M Wired, 280M Free
ARC: 116M Total, 6907K MFU, 53M MRU, 544K Anon, 711K Header, 55M Other
6207K Compressed, 54M Uncompressed, 8.96:1 Ratio
Swap: 32G Total, 15G Used, 17G Free, 46% Inuse
Relative to my limited point: I do not know whether the ZFS ARC and Mem categorizations are mutually exclusive or not.
Unfortunately, as I understand things, it is questionable whether adding -S to the top command gives you swap information that can point to what makes up the 15G swapped out by totaling the sizes listed. But you might at least be able to infer which processes became swapped out, even if you cannot get a good size for the swap space use of each.
Using -ores does seem to put the top users of resident memory at the top of top's process list.
Sufficient Active RAM use by processes that stay active will
tend to cause inactive processes to be swapped out. FreeBSD
does not swap out processes that stay active: it pages those
as needed instead of swapping.
So using top -Sores might allow watching what active
process(es) grow and stay active and what inactive processes
are swapped out at the time of the activity.
I do infer that the 15G Used for Swap is tied to processes
that were not active when swapped out.
Post by Robert
I'm OK with a low "Free" memory if the OS can effectively allocate from "Inactive", but I'm worried about a sudden move of a huge piece of memory into "Swap" without any relevant mmap calls.
My question is: what else (besides mmap) may reduce "Free" memory and increase "Laundry"/"Swap" in the system?
Thanks.
Post by Mark Millard via freebsd-hackers
Post by Rozhuk Ivan
On Wed, 24 Oct 2018 10:19:20 -0700
Robert
Post by Robert
So the issue is still happening. Please check attached screenshot.
The green area is "inactive + cached + free".
. . .
+1
Mem: 845M Active, 19G Inact, 4322M Laundry, 6996M Wired, 1569M Buf, 617M Free
Swap: 112G Total, 19M Used, 112G Free
Just a limited point based on my understanding of "Buf" in
top's display . . .
If "cached" means "Buf" in top's output, my understanding of Buf
is that it is not a distinct memory area. Instead it totals the
buffer space that is spread across multiple states: Active,
Inactive, Laundry, and possibly Wired(?).
In other words: TotalMemory = Active+Inact+Laundry+Wired+Free. If Buf is added to that, then there is double counting of everything included in Buf, and the total will be larger than TotalMemory.
Also Inact+Buf+Free may double count some of the Inact space,
the space that happens to be inactive buffer space.
I may be wrong, but that is my understanding.