PostgreSQL vs super pages

Post by Thomas Munro
shm_open("/PostgreSQL.1721888107",O_RDWR|O_CREAT|O_EXCL,0600) = 46 (0x2e)
ftruncate(46,0x400000) = 0 (0x0)

Try writing zeroes instead of truncating.
This should activate the fast path in the fault handler, and if the
pages allocated as backing store for the shm object came from a reservation,
you should get a superpage mapping on the first fault, without promotion.

Post by Thomas Munro
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_SHARED|MAP_HASSEMAPHORE|MAP_NOSYNC|MAP_ALIGNED_SUPER,46,0x0)
= 35081158656 (0x82b000000)
close(46) = 0 (0x0)

Thomas Munro

2018-10-11 01:01:20 UTC

If you just write() to a newly shm_open()'d fd you get a return code
of 0 so I assume that doesn't work. If you ftruncate() to the desired
size first, then loop writing 8192 bytes of zeroes at a time, it
works. But still no super pages. I tried also with a write buffer of
2MB of zeroes, but still no super pages. I tried abandoning
shm_open() and instead using a mapped file, and still no super pages.

Konstantin Belousov

2018-10-13 23:50:21 UTC

I did a not-quite-scientific experiment, so you will need to find
the differences between what I did and what you observe. Below is a
naive test program that directly implements my suggestion, and the
output of procstat -v for it after everything was set up.

/* $Id: shm_super.c,v 1.1 2018/10/13 23:49:37 kostik Exp kostik $ */

#include <sys/param.h>
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	M(x)	((x) * 1024 * 1024)
#define	SZ	M(4)

int
main(void)
{
	char buf[128];
	void *ptr;
	off_t cnt;
	int error, shmfd;

	/* Anonymous shared memory object, sized to 4M. */
	shmfd = shm_open(SHM_ANON, O_CREAT | O_RDWR, 0600);
	if (shmfd == -1)
		err(1, "shm_open");
	error = ftruncate(shmfd, SZ);
	if (error == -1)
		err(1, "truncate");

	/* Fill the object with zeroes via write() before mapping it. */
	memset(buf, 0, sizeof(buf));
	for (cnt = 0; cnt < SZ; cnt += sizeof(buf)) {
		error = write(shmfd, buf, sizeof(buf));
		if (error == -1)
			err(1, "write");
		else if ((size_t)error != sizeof(buf))
			errx(1, "short write %d", (int)error);
	}

	/* Map with superpage alignment and touch one byte in every page. */
	ptr = mmap(NULL, SZ, PROT_READ | PROT_WRITE, MAP_SHARED |
	    MAP_ALIGNED_SUPER, shmfd, 0);
	if (ptr == MAP_FAILED)
		err(1, "mmap");
	for (cnt = 0; cnt < SZ; cnt += PAGE_SIZE)
		*((char *)ptr + cnt) = 0;

	/* Show the resulting address space layout. */
	printf("ptr %p\n", ptr);
	snprintf(buf, sizeof(buf), "procstat -v %d", getpid());
	system(buf);
	return (0);
}

$ ./shm_super
ptr 0x800e00000
  PID          START            END PRT  RES PRES REF SHD FLAG TP PATH
98579       0x400000       0x401000 r-x    1    3   1   0 CN-- vn /usr/home/kostik/work/build/bsd/DEV/stuff/tests/shm_super
98579       0x600000       0x601000 rw-    1    1   1   0 ---- df
98579    0x800600000    0x800620000 r-x   32   34 146  72 CN-- vn /libexec/ld-elf.so.1
98579    0x800620000    0x800644000 rw-   24   24   1   0 ---- df
98579    0x80081f000    0x800820000 rw-    1    0   1   0 C--- vn /libexec/ld-elf.so.1
98579    0x800820000    0x800821000 rw-    1    1   1   0 ---- df
98579    0x800821000    0x8009b3000 r-x  402  440 146  72 CN-- vn /lib/libc.so.7
98579    0x8009b3000    0x800bb3000 ---    0    0   0   0 CN-- --
98579    0x800bb3000    0x800bbf000 rw-   12    0   1   0 C--- vn /lib/libc.so.7
98579    0x800bbf000    0x800bd9000 rw-    5   14   2   0 ---- df
98579    0x800c00000    0x800e00000 rw-    9   14   2   0 ---- df
98579    0x800e00000    0x801200000 rw- 1024 1030   3   0 --S- df
98579    0x801200000    0x801400000 rw-    6 1030   3   0 ---- df
98579 0x7fffdffff000 0x7ffffffdf000 ---    0    0   0   0 ---- --
98579 0x7ffffffdf000 0x7ffffffff000 rw-    4    4   1   0 ---D df
98579 0x7ffffffff000 0x800000000000 r-x    1    1  81   0 ---- ph
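
As a reading aid for the procstat output, the row carrying the S flag (0x800e00000 to 0x801200000) can be checked by hand: it spans 4M, both ends sit on a 2M boundary, and its RES count of 1024 equals the number of 4K pages in the range. The sketch below does that arithmetic. It assumes 4K base pages and 2M superpages (the amd64 sizes discussed in this thread), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMALL_PAGE	UINT64_C(4096)		/* assumed base page size */
#define SUPER_PAGE	UINT64_C(0x200000)	/* assumed 2M superpage size */

/* Number of base pages covered by the mapping [start, end). */
uint64_t
base_pages(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return ((end - start) / SMALL_PAGE);
}

/* True if both ends of [start, end) fall on a superpage boundary. */
bool
superpage_aligned(uint64_t start, uint64_t end)
{
	return (start % SUPER_PAGE == 0 && end % SUPER_PAGE == 0);
}

/* Number of whole superpages that fit in an aligned mapping. */
uint64_t
superpages(uint64_t start, uint64_t end)
{
	assert(superpage_aligned(start, end));
	return ((end - start) / SUPER_PAGE);
}
```

Called with the START and END values of the S row, base_pages() gives 1024 and superpages() gives 2, while the 0x800bbf000 to 0x800bd9000 row fails the alignment check.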

Thomas Munro

2018-10-14 09:58:08 UTC

Post by Konstantin Belousov
98579 0x800e00000 0x801200000 rw- 1024 1030 3 0 --S- df

Huh. Your program doesn't result in an S mapping on my laptop, but I
tried on an EC2 t2.2xlarge machine and there it promotes to S, even if
I comment out the write() loop (the loop that assigns to one byte in
every page is enough). The difference might be the amount of memory on the
system: on my 4GB laptop, it is very reluctant to use super pages (but
I have seen it do it, so I know it can). On a 32GB system, it does it
immediately, and it works nicely for PostgreSQL too. So perhaps my
problem is testing on a small RAM system, though I don't understand
why.

Konstantin Belousov

2018-10-14 11:45:44 UTC

How much free memory does your system have? I mean Free as reported by
top. If free memory is low and fragmented, which I suppose it is on a 4G
laptop that you use with X, a browser, and other memory-consuming
applications, the system would have trouble filling the reserve, i.e.
reserving 2M of contiguous, 2M-aligned physical pages.

You can try the test programs right after booting into single-user mode.
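
To illustrate why fragmentation matters more than the raw amount of free memory, here is a toy model. It is not the kernel's reservation code; the array layout and the function name are invented for the example. It treats physical memory as an array of 4K page frames and counts how many aligned runs of 512 frames (2M, assuming those page sizes) are completely free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_SUPER	512	/* 2M / 4K, assumed sizes */

/*
 * Toy model: page_free[i] says whether physical page frame i is free.
 * Count the aligned runs of PAGES_PER_SUPER frames that are entirely free.
 */
size_t
free_superpage_runs(const bool *page_free, size_t npages)
{
	size_t base, i, runs;

	assert(page_free != NULL);
	runs = 0;
	for (base = 0; base + PAGES_PER_SUPER <= npages;
	    base += PAGES_PER_SUPER) {
		/* Scan one aligned run; stop at the first frame in use. */
		for (i = 0; i < PAGES_PER_SUPER; i++)
			if (!page_free[base + i])
				break;
		if (i == PAGES_PER_SUPER)
			runs++;
	}
	return (runs);
}
```

In this model, 1024 free frames hold two candidate runs, but a single busy frame inside each run leaves none, even though 1022 of the 1024 frames are still free.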

Thomas Munro

2018-10-14 22:42:15 UTC