Memory hotplug support in PowerKVM

October 9, 2015

Introduction
Prerequisites
Basic hotplug operation
More options
Driving via libvirt
Debugging aids
Internal details
Future

Introduction

Memory hotplug is a feature that lets the amount of physical RAM available to a system be increased or decreased dynamically. For dynamically added memory to become available to applications, memory hotplug must be supported at multiple layers, such as the firmware and the operating system. This blog post looks mainly at the emerging support for memory hotplug in KVM virtualization for PowerPC sPAPR virtual machines (pseries guests). For virtual machines, memory hotplug is typically used to scale the guest’s physical memory up or down at runtime based on requirements. This feature is expected to be useful for vertical scaling of PowerPC guests in KVM Cloud environments.

In KVM virtualization, an alternative way to dynamically increase or decrease memory of the guest is to use memory ballooning. While memory ballooning requires cooperation between the guest and the host, memory hotplug is a deterministic way to grow and reduce the memory of the guest.

Prerequisites

Memory hotplug support for PowerPC sPAPR guests is now part of QEMU upstream and is expected to be available starting from QEMU-2.5 release. This implies that memory hotplug support is available for pseries machine types starting from pseries-2.5.

Memory hotplug support for the QEMU/KVM driver was added in libvirt 1.2.14. Support in libvirt is mostly architecture neutral, but some of the memory alignment requirements for PowerPC memory hotplug are enforced only from libvirt-1.2.20, which is the recommended version for using memory hotplug on PowerPC.

In addition to support in QEMU, libvirt and the guest kernel (where it has existed for a long time), some changes were made to the PowerPC RAS tools to support memory hotplug. The minimum versions of these packages needed in the guest are listed below:

Package          Minimum required version
powerpc-utils    1.2.26
ppc64_diag       2.6.8
librtas          1.3.9

The rtas_errd daemon, which is provided by the ppc64_diag package, needs to be running in the guest for memory hotplug to function correctly.

Basic hotplug operation

This section describes the steps to be followed by a QEMU user for memory hotplug operation.

* Start the guest with the command line options required for memory hotplug.

qemu-system-ppc64 … -m 4G,slots=32,maxmem=32G

-m 4G starts the guest with an initial RAM size of 4G.
maxmem=32G specifies that this guest’s RAM can grow up to 32G via memory hotplug operations.
slots=32 specifies the number of DIMM slots available to this guest for memory hotplug. As in a physical system, each memory hotplug operation populates a DIMM slot in the guest. PowerPC supports a maximum of 32 DIMM slots, of which only 31 are available for hotplug.

* Ensure that rtas_errd daemon is running inside the guest.

# ps aux | grep rtas
root      3685  0.7  0.0   5568  3712 ?        Ss   16:49   0:00 rtas_errd

# grep Mem /proc/meminfo
MemTotal:        4146560 kB
MemFree:         2908544 kB

* Connect to QEMU monitor from the host and issue memory hotplug commands

(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0
(qemu) info memory-devices
Memory device [dimm]: "dimm0"
addr: 0x100000000
slot: 0
node: 0
size: 1073741824
memdev: /objects/ram0
hotplugged: true
hotpluggable: true

Hotplugging memory from the QEMU monitor is a two-step operation. In the first step, we create a memory backend object, which is memory-backend-ram (ram0) in the above example. Next, a pc-dimm device is added with ram0 as its backing memory object.
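
The two monitor commands always come as a pair, so they can be generated together. The following is a minimal sketch in Python that only formats the command strings shown above; the helper name hotplug_commands and the ram<N>/dimm<N> naming scheme are assumptions made for illustration, and the optional node= parameter is the one described under More options.

```python
def hotplug_commands(index, size, node=None):
    """Return the two monitor commands that hotplug one DIMM.

    Hypothetical helper: it only builds strings and does not talk to QEMU.
    The ram<N>/dimm<N> ids mirror the ram0/dimm0 example in the text.
    """
    # Step 1: create the memory backend object.
    backend = f"object_add memory-backend-ram,id=ram{index},size={size}"
    # Step 2: add a pc-dimm device backed by that object.
    dimm = f"device_add pc-dimm,id=dimm{index},memdev=ram{index}"
    if node is not None:
        dimm += f",node={node}"
    return [backend, dimm]


for command in hotplug_commands(0, "1G"):
    print(command)
```

The printed lines can be pasted at the (qemu) prompt in the order shown, backend object first.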

* Check that the RAM size has grown in the guest

# grep Mem /proc/meminfo
MemTotal:        5195136 kB
MemFree:         3020160 kB
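
As a quick sanity check, the difference between the two MemTotal readings should match the size of the hotplugged DIMM. Here is a small Python sketch of that arithmetic, using the /proc/meminfo output from this post as sample input; the helper name memtotal_kb is an assumption made for illustration.

```python
def memtotal_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


# Sample values copied from the guest output before and after the hotplug.
before = "MemTotal:        4146560 kB\nMemFree:         2908544 kB\n"
after = "MemTotal:        5195136 kB\nMemFree:         3020160 kB\n"

added_kb = memtotal_kb(after) - memtotal_kb(before)
print(added_kb, "kB =", added_kb // 1024, "MiB")
```

The difference is 1048576 kB, that is 1024 MiB, which matches the 1G DIMM added above.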

More options

In this section a few more options and other possibilities with memory hotplug are explored.

* NUMA guest – If the guest has NUMA topology, it is possible to do hotplug to a particular NUMA node of the guest.

qemu-system-ppc64 … -m 4G,slots=32,maxmem=32G -numa node,nodeid=0,mem=2G,cpus=0-7 -numa node,nodeid=1,mem=2G,cpus=8-15

Here the guest has 4G RAM divided between 2 NUMA nodes as can be seen by the below command in the guest.

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2020 MB
node 0 free: 1105 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 2028 MB
node 1 free: 1674 MB
node distances:
node   0   1
0:  10  40
1:  40  10

node= can be specified explicitly with device_add command to hotplug to a given NUMA node.

(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0,node=1
(qemu) info memory-devices
Memory device [dimm]: "dimm0"
addr: 0x100000000
slot: 0
node: 1
size: 1073741824
memdev: /objects/ram0
hotplugged: true
hotpluggable: true

Verify that the memory was added to NUMA node 1 in the guest

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2020 MB
node 0 free: 971 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 3052 MB
node 1 free: 2610 MB
node distances:
node   0   1
0:  10  40
1:  40  10

* Hugetlbfs backed guest

If the guest’s RAM is backed by hugetlbfs, then we can use memory-backend-file to add more memory via hotplug. Assume a guest is started with 16M hugepages like this:

qemu-system-ppc64 … -m 4G,slots=32,maxmem=32G -mem-path /dev/hugepages/hugetlbfs-16M

The hotplug is performed by using memory-backend-file object like this:

(qemu) object_add memory-backend-file,id=ram0,size=1G,mem-path=/dev/hugepages/hugetlbfs-16M
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0,node=0
(qemu) info memory-devices
Memory device [dimm]: "dimm0"
addr: 0x100000000
slot: 0
node: 0
size: 1073741824
memdev: /objects/ram0
hotplugged: true
hotpluggable: true

* Migration

If a guest that has undergone memory hotplug operations needs to be migrated to another host, the memory backend objects and pc-dimm objects should be specified explicitly on the target side using -object and -device options respectively.

If the following hotplug operation is done at the source,

(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0

then at the target host, the guest should be started with the following options:

qemu-system-ppc64 … -object memory-backend-ram,id=ram0,size=1G -device pc-dimm,id=dimm0,memdev=ram0 -incoming …
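
When several DIMMs have been hotplugged at the source, the matching target options can be generated from a record of what was added. The sketch below is a hypothetical Python helper that only formats the -object and -device arguments; the list-of-tuples bookkeeping format and the name incoming_args are assumptions, and only memory-backend-ram backends are handled.

```python
def incoming_args(dimms):
    """Build -object/-device arguments for the migration target.

    `dimms` is a list of (backend_id, dimm_id, size) tuples describing what
    was hotplugged at the source. This record format is an assumption made
    for illustration; QEMU does not produce it.
    """
    args = []
    for backend_id, dimm_id, size in dimms:
        args += ["-object", f"memory-backend-ram,id={backend_id},size={size}"]
        args += ["-device", f"pc-dimm,id={dimm_id},memdev={backend_id}"]
    return args


print(" ".join(incoming_args([("ram0", "dimm0", "1G")])))
```

For the single DIMM in the example, this prints the same -object and -device options as the target command line above; the rest of the command line, including -incoming, still has to be supplied separately.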

Driving via libvirt

This section describes the steps to perform memory hotplug for a guest that is managed by libvirt.

The guest XML needs to have the following bits:

<maxMemory slots='32' unit='KiB'>33554432</maxMemory>
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
<cpu>
<numa>
<cell id='0' cpus='0-127' memory='8388608' unit='KiB'/>
</numa>
</cpu>

This describes a single NUMA node guest with 4G memory, 32 slots and hotpluggable memory up to 32G.
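
Since the XML expresses all sizes in KiB, it helps to convert them back to GiB when reading or writing such a definition. A minimal Python sketch of the conversion, using the values from the XML above (the helper name kib_to_gib is an assumption):

```python
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert a size in KiB, as used in the libvirt XML, to GiB."""
    return kib / KIB_PER_GIB


# Values copied from the guest XML above.
for name, kib in [("maxMemory", 33554432),
                  ("memory", 8388608),
                  ("currentMemory", 4194304)]:
    print(f"{name}: {kib} KiB = {kib_to_gib(kib):g} GiB")
```

This gives 32 GiB for maxMemory, 8 GiB for memory and 4 GiB for currentMemory.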

Hotplug is done by using virsh.

# cat mem-2g.xml
<memory model='dimm'>
<target>
<size unit=’KiB’>2097152</size>
<node>0</node>
</target>
</memory>

# virsh attach-device <domain> mem-2g.xml
Device attached successfully
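
The device XML is small enough to generate programmatically when DIMMs of different sizes need to be attached. The following Python sketch builds the same structure as mem-2g.xml with the standard library; the function name dimm_xml is an assumption, and the output is a single unindented line, unlike the hand-written file.

```python
import xml.etree.ElementTree as ET


def dimm_xml(size_kib, node):
    """Return libvirt device XML for a DIMM of `size_kib` KiB on NUMA `node`."""
    memory = ET.Element("memory", model="dimm")
    target = ET.SubElement(memory, "target")
    size = ET.SubElement(target, "size", unit="KiB")
    size.text = str(size_kib)
    ET.SubElement(target, "node").text = str(node)
    return ET.tostring(memory, encoding="unicode")


# 2 GiB expressed in KiB, targeting NUMA node 0 as in mem-2g.xml.
print(dimm_xml(2 * 1024 * 1024, 0))
```

The resulting string can be written to a file and passed to virsh attach-device in the same way as mem-2g.xml.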

More information about other memory hotplug related options supported by libvirt is available here.

Debugging aids

Here are some nice-to-know details about memory hotplug that could come in handy when facing problems.

* Minimum hotplug granularity – The minimum DIMM size that can be hotplugged into an sPAPR PowerPC guest is 256MB.
* Memory alignment – With the introduction of memory hotplug support, memory alignment requirements for pseries guests have become stricter. The initial RAM size, the maxmem size and the memory size of individual NUMA nodes must now be aligned to 256MB, failing which QEMU will refuse to start the guest. The DIMM/memory size that gets hotplugged must also be aligned to 256MB.
* Hotplugging to a memory-less NUMA node is not allowed.
* With memory hotplug support, pseries guests with maxmem beyond 1TB might not work. This is due to the limited buffer size that gets passed on from SLOF (guest firmware) to QEMU during the ibm,client-architecture-support call that the guest issues early during boot.
* sPAPR PowerPC guests need a data structure called the HTAB (hash table) that stores the virtual to physical page mappings for the guest. The HTAB for a guest is allocated by the host from the host’s contiguous memory area (CMA), which is a limited resource (by default 5% of host RAM is the CMA region). All guests running on the host get their HTAB allocated from this CMA region. The HTAB size depends on the maxmem size, and specifying huge values of maxmem for a guest could result in failures like the one below:

qemu-system-ppc64 … -m 4G,slots=32,maxmem=1T
qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem
Aborted

In such cases lowering the maxmem is recommended.

* Typically, the rtas_errd daemon is expected to be running in the guest before any memory hotplug operation is attempted. If rtas_errd isn’t running, the memory hotplug operation is still reported as successful at the QEMU monitor or by virsh, but the added memory isn’t reflected in the guest. Starting rtas_errd makes all the previously added memory appear in the guest. Rebooting the guest also makes such memory appear after the reboot.
* libvirt managed guests need to be NUMA aware (at least 1 NUMA node should be defined in the XML) for supporting memory hotplug. This limitation is likely to be relaxed soon.
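
Several of the points above (256MB alignment and the slot limit) can be checked before QEMU is started. Here is a minimal Python sketch of such a pre-flight check. It is not part of QEMU or libvirt: the function names are hypothetical, sizes are assumed to use the M/G/T suffixes from the examples in this post with binary multiples, and per-NUMA-node sizes are not checked.

```python
ALIGN = 256 * 1024 * 1024   # 256MB granularity from the notes above, in bytes
MAX_SLOTS = 32              # maximum number of DIMM slots mentioned earlier
UNITS = {"M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Parse a size such as '4G' or '256M' into bytes (binary multiples assumed)."""
    return int(text[:-1]) * UNITS[text[-1].upper()]


def check_config(ram, maxmem, slots, dimm_sizes=()):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    sizes = [("initial RAM", ram), ("maxmem", maxmem)]
    sizes += [(f"DIMM {i}", size) for i, size in enumerate(dimm_sizes)]
    for label, value in sizes:
        if parse_size(value) % ALIGN != 0:
            problems.append(f"{label} ({value}) is not a multiple of 256MB")
    if slots > MAX_SLOTS:
        problems.append(f"slots={slots} exceeds the maximum of {MAX_SLOTS}")
    return problems


print(check_config("4G", "32G", 32, ["1G"]))
print(check_config("4G", "32G", 32, ["100M"]))
```

The first call reports no problems, while the second flags the 100M DIMM as misaligned.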

Internal details

TODO

Future

TODO

* libvirt NUMA node relaxation
* In-kernel hotplug
* Memory hot removal or unplug.


Trek to Devkara falls

October 2, 2015

Devkara is a small village in the border area of Yellapura and Karwar. It is now mostly abandoned, thanks to relocation after the Kadra and Kodasalli dam projects. But this village hides a natural treasure, a spectacular waterfalls which the locals call Devkara Vajra. The Devkara stream, falling from a height of approximately 200-300m near Devkara village, forms this waterfalls. The stream eventually joins the Kali river. The falls can be approached from the Kadra side as well as from the Yellapura side. Here is my story of multiple attempts to reach this waterfalls from the Yellapura side.

1st attempt

In May 2014, my brother-in-law and I rode a bike from Sonda, Sirsi, to Yellapura and then on to ಈರಾಪುರ village. We hardly had any information about the falls then and unfortunately we couldn’t get anywhere near the waterfalls. All we could get was this distant view of the Kodasalli backwaters.

Kodasalli backwaters

However we did establish a local contact who agreed to take us to the falls next time.

2nd attempt

I was at Sonda, Sirsi in the first week of Oct 2014 and took that opportunity to revisit Devkara falls. This time we reached the local contact’s place at ಈರಾಪುರ village and started trekking along the forest route at 10AM. Along with the guide, our local contact accompanied us with his school-going son. The guide took us on a circuitous route and we first came very close to the Kodasalli reservoir at 11.30AM.

Kodasalli dam

We were walking on a mountain range overlooking a valley in which the Kali river was flowing. On the opposite side of the river was another range where the waterfalls was present. After walking through the forest for an hour, we finally emerged on top of the mountain range at 12.30PM. This place was called ಹಬ್ಬು ಕೋಟೆ/ಕಟ್ಟೆ and it provided a good view of the distant waterfalls. Even from that distance the falls looked so big that we wondered how gigantic it would appear from the base. Unfortunately we hadn’t planned for a day-long trek and were carrying just a few raw cucumbers and buttermilk, which we completely finished at ಹಬ್ಬು ಕೋಟೆ.

Distant view of Devkara falls

Since reaching the base of the falls from here was out of the question, our guide offered to take us around a bit and show us a few places of interest. So we proceeded ahead on the same mountain range and reached a place called ದೇವಿಮನೆ. This is a sacred place in the hills where the villagers come and offer prayers once a year in November. From this place we ventured ahead a bit to get a clear view of the Kadra reservoir. Instead of returning via the same route, our guide suggested that we could do a full circle by getting down to Devkara village and then climbing up ಬೆಂಡೆಘಟ್ಟ to get back to ಈರಾಪುರ village. We didn’t really know how much time and effort that would take, but just agreed.

After a steep descent we were at Devkara village at 3PM. The village is mostly deserted with a few houses still remaining. There is a Ramalingeshwara temple in the village where Pooja is done once a week. A priest comes from a far off distance every Monday for this purpose.

Devkara village

Ramalingeshwara temple, Devkara

We were now walking beside the Kali river. A trail exists from Devkara to ಬೆಂಡೆಘಟ್ಟ, but we were dead tired since we had hardly had any solid food since morning. The journey seemed endless and we finally reached the foothills of ಬೆಂಡೆಘಟ್ಟ at 4.15PM. The climb up is very steep and it took quite a bit of effort and time to reach the top at around 5.30PM. Our enterprising guide found some tender coconuts in an abandoned house and that came as a big relief to us. But the relief was short lived as it started raining. By the time we reached our local contact’s house, we were completely drenched. Here we ate the food that we had planned to have for lunch and started back on the bike towards Sonda at around 6.30PM. The next 60km through the winding forest roads was mostly treacherous with non-stop hard rain. Our adventure finally ended when we reached home at 9PM.

3rd attempt

Though we had seen the waterfalls, that was hardly satisfying since we hadn’t been able to reach the base of the falls. So last week we made another attempt to reach the falls. This time I took Naren with me to my in-laws’ place. My brother-in-law found a person from Devkara village itself who had relocated to Sonda. He was ready to guide us, and we thought our 3rd attempt should be a success since the guide was born and brought up in Devkara village and should be able to take us to the base of the falls.

Four of us started on two bikes at 8AM from Sonda and reached ಈರಾಪುರ village at 10AM. Thanks to the two-wheelers, we were able to cover some of the trail distance on the bikes too. At 10.30AM we were at the starting point of ಬೆಂಡೆಘಟ್ಟ (470m) from where we had to descend. At 11AM we reached 130m and touched a flowing stream locally called ಈರಾಪುರ stream, which eventually becomes the Kali river. We walked towards Devkara village alongside the stream and at one point the falls became visible to our left.

At 11.45AM, we reached 70m and crossed the stream, which was at most knee deep. Next we had to cross another stream that flows from Devkara falls and joins the stream that we had just crossed. This stream was flowing with good speed and we had to find a suitable place to cross over. Thanks to our guide, we did find a reasonably safe place to cross, where the water was thigh deep at places. We were able to cross it with reasonable ease using sticks for support. We were now at the periphery of Devkara village and were walking past a few abandoned houses and paddy fields.

Next came some hide and seek with the waterfalls: the approach is covered with such dense forest that the falls is not visible from every point on the path. There was no well defined path to the falls, so we had to make one by clearing the forest growth and following the general direction of the waterfalls.

Devkara falls

At 1PM we reached a rocky clearing from where the falls was visible fairly clearly. Based on last year’s experience, we weren’t taking chances with food and hence were carrying a sufficient amount of Pulao, home grown cucumbers, buttermilk and ಚಕ್ಕುಲಿ (chakkuli). We finished lunch on these rocks. We had still not reached the exact base of the falls and hence ventured into the forest a bit more to check if a better view of the falls could be had. At 2PM we reached another rocky clearing from where we had a decent view of the falls. We decided to end the quest here since the path ahead to the ultimate base of the falls was difficult and it was already well past midday.

On the way back, it took us two hours to reach the base of ಬೆಂಡೆಘಟ್ಟ and after another hour we were back at ಈರಾಪುರ village. Thus on the third attempt, we finally had satisfying views of Devkara falls! It was not just about the falls; this also turned out to be a good trek worth remembering after my previous trek in the same area.