Elena Mihailescu
2018-07-30 11:56:27 UTC
Hello,
I'll start by giving some context before asking my questions.
Currently, we are trying to implement a live migration feature for
bhyve. In order to do that, we want to mark the guest memory
copy-on-write. As we previously discussed on the
freebsd-virtualization list [1], the vm_entry structure that contains
the guest memory needs to have the MAP_ENTRY_COW and
MAP_ENTRY_NEEDS_COPY flags set. The migration mechanism will work in
"rounds", as described below (it is a pre-copy live migration
technique):
- in the first round, after starting the procedure and checking the
compatibility of the two systems, the entire guest memory will be sent to
the destination
- in the second round, after the first transfer is completed, only the
differences (pages that were written/dirtied since the first round
started) will be sent to the destination
...
- in the n-th round, only the differences between this round and
round (n - 1) will be sent.
Before the last round, the guest will be stopped, and the remaining memory
will be sent along with the CPU state.
We need the COW mechanism to determine changes between rounds. As for
the number of rounds, there will be at most 10 (this value was chosen,
without running any benchmarks, to limit potential overhead). The final
number of rounds will be decided later based on the test results. Since
the number of rounds is limited, the number of new objects created on
the source will also not be very high. After the migration process is
completed, the guest on the source system will be destroyed and the
newly created objects will no longer be needed (they will be discarded).
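The round structure above can be sketched as a small user-space simulation. It is only a model under stated assumptions: `struct migration`, `NPAGES`, the per-page `dirty` array, the early-stop `threshold` and the `shrinking_guest` workload are all hypothetical names, and in the actual design the dirty pages would be found through the COW mechanism rather than an explicit array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES     64  /* hypothetical guest size, in pages */
#define MAX_ROUNDS 10  /* upper bound on the total number of rounds */

/* Hypothetical migration state. In the real design, "dirty" would be
 * derived from the pages copied by COW faults, not kept in an array. */
struct migration {
    bool dirty[NPAGES];
    size_t pages_sent;  /* total pages transferred so far */
};

/* Called after each live round to simulate pages written by the guest. */
typedef void (*guest_writes_fn)(struct migration *m, int round);

size_t count_dirty(const struct migration *m)
{
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += m->dirty[i];
    return n;
}

/* "Transfers" every dirty page (here it only counts it) and clears its mark. */
void send_dirty(struct migration *m)
{
    for (size_t i = 0; i < NPAGES; i++) {
        if (m->dirty[i]) {
            m->pages_sent++;
            m->dirty[i] = false;
        }
    }
}

/* Example workload: during round r the guest dirties NPAGES >> r pages. */
void shrinking_guest(struct migration *m, int round)
{
    size_t n = (size_t)NPAGES >> round;
    for (size_t i = 0; i < n; i++)
        m->dirty[i] = true;
}

/* Runs the pre-copy rounds and returns the total number of rounds used. */
int precopy_migrate(struct migration *m, guest_writes_fn guest, size_t threshold)
{
    int round = 0;

    memset(m, 0, sizeof(*m));
    for (size_t i = 0; i < NPAGES; i++)
        m->dirty[i] = true;  /* round 1 sends the entire guest memory */

    /* Live rounds: the guest keeps running and dirtying pages. */
    while (round < MAX_ROUNDS - 1) {
        round++;
        send_dirty(m);
        guest(m, round);
        if (count_dirty(m) <= threshold)
            break;
    }

    /* Last round: the guest is stopped and the remaining pages are sent
     * (the CPU state would be sent here as well). */
    round++;
    send_dirty(m);
    assert(count_dirty(m) == 0);
    return round;
}
```

For example, `precopy_migrate(&m, shrinking_guest, 4)` runs four live rounds and then the stop-and-copy round, five in total. Bounding the live loop by `MAX_ROUNDS - 1` keeps the total at 10 even if the guest dirties memory faster than it can be sent.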
----
From inspecting and debugging the bhyve code, we found that the object
that describes the guest memory (we are currently using a 512MB bhyve
guest; with more physical memory assigned to the virtual machine, there
could be more objects) is pointed to by two vm_entry structures from
two different vmspace structures:
- the first one is the vmspace that describes the host's virtual memory
- the second one is a separate vmspace structure created by the
hypervisor when creating the virtual machine.
I have several questions about FreeBSD's memory management and
bhyve's internals that I haven't been able to answer by myself yet.
The first one is whether anyone knows if the object that describes
the guest memory is referenced only by these two vm_entry structures,
or by other entries as well. We could not determine whether this is
the case.
As far as I can tell, COW can be set only on vm_entry structures. Is
there a way to mark just certain pages as copy-on-write, or maybe just
the object and not its vm_entry structure? I want to know if there is
a finer-grained mechanism to mark only parts of the memory as COW.
We need finer granularity when marking pages copy-on-write because we
encountered some issues:
- virtio works by having a shared memory region between host and
guest, and while transferring the guest memory state, the pages that
are involved in the virtio communication do not need to be marked COW.
- if we try to mark the vm_entry that contains the guest memory as COW
in the host vmspace, the virtio devices eventually crash the guest
(some assertions about IOVs and operation types fail). We know that we
should not mark that memory as copy-on-write because it is not the way
the guest sees its physical memory, but
- if we mark the vm_entry that contains the object with guest memory
as COW in the dedicated vmspace created for the guest, the virtio
devices no longer fail assertions, but after some time the guest
filesystem appears to be corrupted. Usually, we can start the guest
normally after entering single-user mode and running fsck. Sometimes,
we need to reinstall the virtual machine.
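A minimal sketch of the page ranges we would like to mark COW: the guest memory minus the regions shared with virtio. It assumes, hypothetically, that the virtio-shared pages are known up front as a sorted list of page-aligned, non-overlapping ranges, and that the entry could then be split so that only the resulting ranges get MAP_ENTRY_COW/MAP_ENTRY_NEEDS_COPY. In practice virtqueue buffers may be spread across guest memory. `cow_ranges`, `struct range` and `PAGE_SZ` are illustrative names, not bhyve or kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumed page size */

/* Half-open address range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Stores in out[] the parts of 'mem' not covered by any range in 'excl'
 * and returns how many were stored. 'excl' must be sorted by start and
 * non-overlapping; out[] needs room for nexcl + 1 entries.
 */
size_t cow_ranges(struct range mem, const struct range *excl, size_t nexcl,
                  struct range *out)
{
    size_t n = 0;
    uint64_t cur = mem.start;  /* first address not yet classified */

    assert(mem.start % PAGE_SZ == 0 && mem.end % PAGE_SZ == 0);
    for (size_t i = 0; i < nexcl; i++) {
        assert(excl[i].start % PAGE_SZ == 0 && excl[i].end % PAGE_SZ == 0);
        /* Clamp the excluded range to the guest memory range. */
        uint64_t s = excl[i].start > mem.start ? excl[i].start : mem.start;
        uint64_t e = excl[i].end < mem.end ? excl[i].end : mem.end;
        if (s >= e)
            continue;  /* does not intersect guest memory */
        if (s > cur) {  /* gap before this exclusion: mark it COW */
            out[n].start = cur;
            out[n].end = s;
            n++;
        }
        if (e > cur)
            cur = e;
    }
    if (cur < mem.end) {  /* tail after the last exclusion */
        out[n].start = cur;
        out[n].end = mem.end;
        n++;
    }
    return n;
}
```

For a 512MB guest with two excluded regions in the middle, the function returns three ranges: the memory before, between, and after the virtio regions.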
Also, after marking the vm_entry in the guest's dedicated vmspace as
COW, the two vm_entry structures have different views of the guest
memory:
- the vm_entry in the guest's dedicated vmspace points to a new
object (after the first write access, of course)
- the vm_entry in the host vmspace that contains the guest memory
points to the old object, which now has the newly created object as
its backing object.
Another question is whether it is OK to "change" the object in the
host's vm_entry so that it points to the backing object. In that case,
the two entries would again point to the same object. This might
require remapping/redoing the references held by the old object so
that they point to the new object.
[1] http://freebsd.1045724.x6.nabble.com/Inspect-pages-created-after-a-vm-object-is-marked-as-copy-on-write-td6266552.html
Thank you,
Elena