Commit b9473bf

doc(virtio-mem): add memory hotplug documentation
This adds documentation for the new memory hotplug feature.

Signed-off-by: Riccardo Mancini <mancio@amazon.com>
1 parent 2b90129 commit b9473bf

1 file changed

+331
-0
lines changed

docs/memory-hotplug.md

Lines changed: 331 additions & 0 deletions
@@ -0,0 +1,331 @@
1+
# Memory Hotplugging with virtio-mem
2+
3+
## What is virtio-mem
4+
5+
`virtio-mem` is a para-virtualized memory device that enables dynamic memory
6+
resizing for virtual machines. Unlike traditional memory hotplug mechanisms,
7+
`virtio-mem` provides a flexible and efficient solution that works across
8+
different architectures.
9+
10+
The `virtio-mem` device manages a contiguous memory region that is divided into
11+
fixed-size blocks. The host can request the guest to plug (make available) or
12+
unplug (release) memory by changing the device's target size, and the guest
13+
driver responds by allocating or freeing memory blocks accordingly. This
14+
approach provides fine-grained control over guest memory with minimal overhead.
15+
16+
Firecracker further adds the concept of slots, which are a set of contiguous
17+
blocks (usually 128MiB) that can be fully protected from guest accesses to
18+
prevent malicious guests from accessing the hotpluggable memory range when not
19+
allowed by the host.
20+
21+
## Prerequisites
22+
23+
To support memory hotplugging via `virtio-mem`, you must use a guest kernel with
24+
the appropriate version and configuration options enabled as follows:
25+
26+
#### Kernel Version Requirements
27+
28+
- `x86_64`: minimum kernel version is 5.16
29+
- Earlier versions of the kernel don't support
30+
`VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE`
31+
- `aarch64`: minimum kernel version is 5.18
32+
33+
For more information about officially supported guest kernels, refer to the
34+
[kernel policy documentation](kernel-policy.md).
35+
36+
#### Kernel Config
37+
38+
`CONFIG_VIRTIO_MEM` needs to be enabled in the guest kernel in order to use
39+
`virtio-mem`.
40+
41+
## Adding hotpluggable memory
42+
43+
The `virtio-mem` device must be configured during VM setup with the total amount
44+
of memory that can be hotplugged, before starting the virtual machine. This can
45+
be done through a `PUT` request on `/hotplug/memory` or by including the
46+
configuration in the JSON configuration file. In both cases, when the VM is
47+
started, the hotpluggable region will be completely unplugged.
48+
49+
> [!Note] Memory configured through `/hotplug/memory` is a separate pool of
50+
> memory from the usual "boot memory". Only memory configured through the
51+
> hotplug endpoint can be plugged or unplugged dynamically.
52+
53+
### Configuration Parameters
54+
55+
- `total_size_mib` (required): The maximum size of hotpluggable memory in MiB.
56+
This defines the upper bound of memory that can be added to the VM. Must be a
57+
multiple of `slot_size_mib`.
58+
59+
- `block_size_mib` (optional, default: 2): The size of individual memory blocks
60+
in MiB. Must be at least 2 MiB and a power of 2. Larger block sizes provide
61+
better performance but less granularity (harder for the guest to unplug).
62+
63+
- `slot_size_mib` (optional, default: 128): The size of KVM memory slots in MiB.
64+
Must be at least `block_size_mib` and a power of 2. Larger slot sizes improve
65+
performance for large memory operations but reduce unplugging protection
66+
efficiency.
67+
68+
It is recommended to leave these values at their defaults unless strict memory
69+
protection is required, in which case `block_size_mib` should be equal to
70+
`slot_size_mib`. Note that this will make it harder for the guest kernel to find
71+
contiguous memory to hot-unplug. Refer to the
72+
[Memory Protection](#memory-protection) section below for more details.
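
The constraints on the three parameters can be checked on the client side before sending the request. The following is a minimal sketch in Python, not Firecracker's own validation code: the helper name `validate_hotplug_config` is hypothetical, it only mirrors the rules stated in this section, and requiring a positive `total_size_mib` is an assumption.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def validate_hotplug_config(total_size_mib: int,
                            block_size_mib: int = 2,
                            slot_size_mib: int = 128) -> list[str]:
    """Return the list of violated rules (empty if the config looks valid).

    Hypothetical helper: it mirrors the documented constraints only and may
    not match every check that Firecracker performs.
    """
    errors = []
    if block_size_mib < 2 or not is_power_of_two(block_size_mib):
        errors.append("block_size_mib must be a power of 2 and at least 2")
    if slot_size_mib < block_size_mib or not is_power_of_two(slot_size_mib):
        errors.append("slot_size_mib must be a power of 2 and at least block_size_mib")
    # Positivity of total_size_mib is an assumption; the short-circuit on
    # slot_size_mib avoids a division by zero in the modulo below.
    if slot_size_mib <= 0 or total_size_mib <= 0 or total_size_mib % slot_size_mib != 0:
        errors.append("total_size_mib must be a positive multiple of slot_size_mib")
    return errors
```

For instance, `validate_hotplug_config(1024)` returns an empty list, while `validate_hotplug_config(1000)` reports that 1000 is not a multiple of the default 128 MiB slot size.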
73+
74+
### API Configuration
75+
76+
Here is an example of how to configure the `virtio-mem` device via the API. In
77+
this example, the hotpluggable memory is configured with a maximum of 1 GiB in
78+
size and default block and slot sizes.
79+
80+
```console
81+
socket_location=/run/firecracker.socket
82+
83+
curl --unix-socket $socket_location -i \
84+
-X PUT 'http://localhost/hotplug/memory' \
85+
-H 'Accept: application/json' \
86+
-H 'Content-Type: application/json' \
87+
-d "{
88+
\"total_size_mib\": 1024,
89+
\"block_size_mib\": 2,
90+
\"slot_size_mib\": 128
91+
}"
92+
```
93+
94+
> [!Note] This is only allowed before the `InstanceStart` action and not on
95+
> snapshot-restored VMs (which will use the configuration saved in the
96+
> snapshot).
97+
98+
### JSON Configuration
99+
100+
To configure via JSON, add the following to your VM configuration file. In this
101+
example, the hotpluggable memory is configured with a maximum of 1 GiB in size
102+
and default block and slot sizes.
103+
104+
```json
105+
{
106+
"memory-hotplug": {
107+
"total_size_mib": 1024,
108+
"block_size_mib": 2,
109+
"slot_size_mib": 128
110+
}
111+
}
112+
```
113+
114+
### Checking Device Status
115+
116+
After configuration, you can query the device status at any time:
117+
118+
```console
119+
socket_location=/run/firecracker.socket
120+
121+
curl --unix-socket $socket_location -i \
122+
-X GET 'http://localhost/hotplug/memory' \
123+
-H 'Accept: application/json'
124+
```
125+
126+
This returns information about the current device state, including:
127+
128+
- `total_size_mib`: Maximum hotpluggable memory size
129+
- `block_size_mib`: Block size used by the device
130+
- `slot_size_mib`: Slot size used by Firecracker (granularity of memory
131+
protection)
132+
- `plugged_size_mib`: Memory currently plugged by (and available to) the guest
133+
- `requested_size_mib`: Target memory size set by the host
134+
135+
## Operating the virtio-mem device
136+
137+
Once configured and the VM is running, you can dynamically adjust the amount of
138+
memory available to the guest by updating the requested size, which is the
139+
target that the guest should reach by requesting to plug or unplug memory
140+
blocks. The initial value of the requested size is 0 MiB, meaning that no
141+
hotpluggable memory blocks are plugged on VM boot.
142+
143+
### Hotplugging Memory
144+
145+
To add memory to a running VM, request a greater size from the `virtio-mem`
146+
device:
147+
148+
```console
149+
socket_location=/run/firecracker.socket
150+
151+
curl --unix-socket $socket_location -i \
152+
-X PATCH 'http://localhost/hotplug/memory' \
153+
-H 'Accept: application/json' \
154+
-H 'Content-Type: application/json' \
155+
-d "{
156+
\"requested_size_mib\": 512
157+
}"
158+
```
159+
160+
Setting a higher `requested_size_mib` value causes the guest driver to allocate
161+
memory blocks to reach the requested size. The process is asynchronous -- the
162+
guest will incrementally plug memory until it reaches the target. It is
163+
recommended to use the `GET` API to monitor the current state of the hotplugging
164+
by the driver. The operation is complete when `plugged_size_mib` is equal to
165+
`requested_size_mib`.
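
Monitoring through the `GET` API can be automated with a small polling loop. The sketch below uses only the Python standard library and the field names listed under "Checking Device Status". It is an illustration, not an official client: the class and function names are hypothetical, the timeout and polling interval are arbitrary, and treating HTTP 200 as the success status of the `GET` request is an assumption.

```python
import http.client
import json
import socket
import time


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over a Unix domain socket (standard library only)."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self._socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self._socket_path)
        self.sock = sock


def get_hotplug_status(socket_path: str) -> dict:
    """GET /hotplug/memory and return the decoded JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/hotplug/memory",
                     headers={"Accept": "application/json"})
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:  # assumed success status
            raise RuntimeError(f"unexpected status {response.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()


def wait_until_plugged(fetch_status, timeout_s: float = 30.0,
                       interval_s: float = 0.5) -> dict:
    """Poll until plugged_size_mib equals requested_size_mib or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status["plugged_size_mib"] == status["requested_size_mib"]:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"guest did not reach the target: {status}")
        time.sleep(interval_s)
```

A caller would typically pass a closure over the socket path, for example `wait_until_plugged(lambda: get_hotplug_status("/run/firecracker.socket"))`. Taking the fetch function as a parameter keeps the loop testable without a running VM.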
166+
167+
### Hot-removing Memory
168+
169+
To remove memory from a running VM, request a lower size:
170+
171+
```console
172+
socket_location=/run/firecracker.socket
173+
174+
curl --unix-socket $socket_location -i \
175+
-X PATCH 'http://localhost/hotplug/memory' \
176+
-H 'Accept: application/json' \
177+
-H 'Content-Type: application/json' \
178+
-d "{
179+
\"requested_size_mib\": 256
180+
}"
181+
```
182+
183+
Setting a lower `requested_size_mib` value causes the guest driver to free
184+
memory blocks. Once the guest reports a block to be unplugged, the unplugged
185+
memory is immediately freed from the host process. If all blocks in a memory
186+
slot are unplugged, then Firecracker will also protect the memory slot, removing
187+
access from the guest.
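
The relationship between blocks and slots explains why a partially unplugged slot stays accessible. The following Python sketch is a simplified model, not Firecracker's implementation: the function name `protectable_slots` and the representation of plugged blocks as a set of block indices are assumptions made for illustration.

```python
def protectable_slots(plugged_blocks: set[int], total_size_mib: int,
                      block_size_mib: int = 2,
                      slot_size_mib: int = 128) -> list[int]:
    """Return the indices of slots in which no block is currently plugged.

    Only these slots could be protected; a single plugged block keeps its
    whole slot accessible to the guest.
    """
    blocks_per_slot = slot_size_mib // block_size_mib
    slot_count = total_size_mib // slot_size_mib
    # A slot is occupied as soon as one of its blocks is plugged.
    occupied = {block // blocks_per_slot for block in plugged_blocks}
    return [slot for slot in range(slot_count) if slot not in occupied]
```

With the default sizes each slot holds 64 blocks, so for a 256 MiB region a guest that keeps only block 0 plugged leaves slot 1 protectable but not slot 0. Setting `block_size_mib` equal to `slot_size_mib` makes every unplugged block a protectable slot.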
188+
189+
To remove all hotplugged memory, set `requested_size_mib` to 0:
190+
191+
```console
192+
curl --unix-socket $socket_location -i \
193+
-X PATCH 'http://localhost/hotplug/memory' \
194+
-H 'Accept: application/json' \
195+
-H 'Content-Type: application/json' \
196+
-d '{"requested_size_mib": 0}'
197+
```
198+
199+
> [!Note] Unplugging requires the guest to cooperate and actually be able to
200+
> find and report memory blocks that can be moved or freed by the host. As in
201+
> the hotplugging case, it is recommended to monitor the operation through the
202+
> `GET` API.
203+
204+
## Configuring the guest driver
205+
206+
The guest kernel must be configured with specific boot or runtime module
207+
parameters to ensure optimal behavior of the `virtio-mem` driver and memory
208+
hotplug module.
209+
210+
In short:
211+
212+
- pass `memhp_default_state=online_movable` if hot-removal is required and there
213+
is enough free boot memory for allocating the memory map of the hotplugged
214+
memory (64B per 4KiB page).
215+
- pass `memory_hotplug.memmap_on_memory=1 memhp_default_state=online` if
216+
hot-removal is not required and the hotpluggable memory area can be much
217+
bigger than the normal memory.
218+
219+
#### `memhp_default_state`
220+
221+
This parameter controls how newly hotplugged memory is onlined by the kernel.
222+
This parameter is required for automatically onlining new memory pages. It is
223+
recommended to set it to `online_movable` as below for reliable memory
224+
hot-removal.
225+
226+
```
227+
memhp_default_state=online_movable
228+
```
229+
230+
The `online_movable` setting ensures that:
231+
232+
- Hotplugged memory is placed in the MOVABLE zone
233+
- The kernel can migrate pages when unplugging is requested
234+
- Memory can be successfully freed back to the host
235+
236+
Other possible values (not recommended for hot-removal):
237+
238+
- `online`: Lets the kernel choose between the NORMAL and MOVABLE zones (may
239+
prevent hot-remove)
240+
- `online_kernel`: Places memory in NORMAL zone (may prevent hot-remove)
241+
- `offline` (default): Memory requires manual onlining
242+
243+
#### `memory_hotplug.memmap_on_memory` (optional)
244+
245+
This parameter controls whether the kernel allocates the memory map (the `struct page` array)
246+
for hotplugged memory from the hotplugged memory itself, rather than from boot
247+
memory. Without this parameter, the kernel needs 64 B of boot memory for every
248+
4 KiB page of hotplugged memory. For example, it would need 256 MiB of free "boot" memory to hotplug
249+
16 GiB of memory. This parameter only works if the memory is not entirely
250+
hotplugged as MOVABLE.
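
The boot memory needed for the memory map follows directly from the 64 B per 4 KiB page figure. The short Python calculation below is a sketch for sizing purposes; the function name is hypothetical and it assumes 4 KiB base pages.

```python
STRUCT_PAGE_BYTES = 64      # per-page metadata size quoted in this section
PAGE_SIZE_BYTES = 4 * 1024  # assumes 4 KiB base pages
MIB = 1024 * 1024


def memmap_overhead_mib(hotplug_size_mib: int) -> float:
    """Boot memory (in MiB) needed for the memory map of the hotplugged region."""
    pages = hotplug_size_mib * MIB // PAGE_SIZE_BYTES
    return pages * STRUCT_PAGE_BYTES / MIB


print(memmap_overhead_mib(16 * 1024))  # 16 GiB of hotpluggable memory -> 256.0
```

For 16 GiB this is 4,194,304 pages at 64 B each, that is 268,435,456 bytes or 256 MiB, which is 1/64 of the hotplugged size.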
251+
252+
```
253+
memory_hotplug.memmap_on_memory=1 memhp_default_state=online
254+
```
255+
256+
This configuration is recommended in case hot-removal is not a priority, and the
257+
hotpluggable memory area is very large.
258+
259+
#### Additional Resources
260+
261+
For more detailed and up-to-date information about memory hotplug in the Linux
262+
kernel, refer to the official kernel documentation:
263+
https://docs.kernel.org/admin-guide/mm/memory-hotplug.html
264+
265+
## Security Considerations
266+
267+
**The `virtio-mem` device is a paravirtualized device requiring cooperation from
268+
a driver in the guest.**
269+
270+
### Memory Protection
271+
272+
Firecracker provides the following guarantees about unplugged memory:
273+
274+
- **Memory that is never plugged is protected**: Memory that has never been
275+
plugged before is protected from the guest by not making it available to the
276+
guest via a KVM slot and by using `mprotect` to prevent access from device
277+
emulation. Any attempt by the guest to access unplugged memory will result in
278+
a fault and may crash the Firecracker process.
279+
- **Unplugged memory slots are protected**: Memory slots that have been
280+
unplugged are removed from KVM and `mprotect`-ed. This requires the guest to
281+
report contiguous blocks to be freed for the memory slot to be actually
282+
protected.
283+
- **Unplugged memory blocks are freed**: When a memory block is unplugged, the
284+
backing pages are freed, for example using `madvise(MADV_DONTNEED)` for anon
285+
memory, returning memory to the host at block granularity.
286+
287+
### Trust Model
288+
289+
While Firecracker enforces memory isolation at the host level, a compromised
290+
guest driver could:
291+
292+
- Fail to plug or unplug memory as requested by the device
293+
- Attempt to access unplugged memory (will result in a fault and crash of
294+
Firecracker)
295+
296+
Users should:
297+
298+
- Be prepared to handle cases where the guest doesn't cooperate with memory
299+
operations by monitoring the `GET` API.
300+
- Implement host-level memory limits and monitoring, e.g. through `cgroup`.
301+
302+
## Compatibility with Other Features
303+
304+
`virtio-mem` is compatible with all Firecracker features. Below are some
305+
specific changes in the other features when using memory hotplugging.
306+
307+
### Snapshots
308+
309+
Full and diff snapshots will include the unplugged areas as sparse "holes" in
310+
the memory snapshot file. Sparse file support is recommended to efficiently
311+
handle the memory snapshot files.
312+
313+
### Userfaultfd
314+
315+
The userfaultfd (uffd) handler[^uffd] will need to handle the entire
316+
hotpluggable memory range even if unplugged. The uffd handler may decide to
317+
unregister unplugged memory ranges (holes in the memory file). The uffd handler
318+
will also need to handle `UFFD_EVENT_REMOVE` events for hot-removed blocks,
319+
either unregistering the range or storing the information and returning an empty
320+
page on the next access.
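
The second option, storing the information and returning an empty page later, amounts to simple bookkeeping in the handler. The Python sketch below models only that bookkeeping and does not talk to userfaultfd: the class name, the method names, and the string return values are hypothetical, and a 4 KiB page size with page-aligned ranges is assumed. A real handler might resolve the "zero" case with the `UFFDIO_ZEROPAGE` ioctl.

```python
PAGE_SIZE = 4096  # assumed base page size


class RemovedPageTracker:
    """Remembers hot-removed pages so that later faults on them get zeros."""

    def __init__(self):
        self._removed = set()

    def on_remove(self, start: int, end: int) -> None:
        """Record a remove event covering the half-open range [start, end)."""
        for page in range(start, end, PAGE_SIZE):
            self._removed.add(page)

    def on_fault(self, address: int) -> str:
        """Decide how to resolve a fault at the given address."""
        page = address - (address % PAGE_SIZE)
        # Snapshot contents are stale for any page removed after restore,
        # so such pages must be served as empty pages from then on.
        if page in self._removed:
            return "zero"
        return "copy_from_snapshot"
```

Pages stay marked as removed on purpose: once a block has been unplugged, the snapshot file no longer describes its contents, even if the guest plugs it again later.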
321+
322+
### Vhost-user
323+
324+
`vhost-user`[^vhost-user] is fully supported, but Firecracker cannot guarantee
325+
protection of unplugged memory from a `vhost-user` backend. A malicious guest
326+
driver may be able to trick the backend into accessing unplugged memory. This is not
327+
possible in Firecracker itself as unplugged memory slots are `mprotect`-ed.
328+
329+
[^uffd]: snapshotting/handling-page-faults-on-snapshot-resume.md#userfaultfd
330+
331+
[^vhost-user]: api_requests/block-vhost-user.md
