From metajiji at gmail.com  Mon Nov  9 17:40:01 2020
From: metajiji at gmail.com (Denis Kadyshev)
Date: Tue, 10 Nov 2020 00:40:01 +0700
Subject: [openstack-community] kolla node_libvirt and broken cgroups
Message-ID: <CALgfwkfWCeBUNaMO86bh2hkk9JaGf3puE=+BwirDBsfoZ0P+VA@mail.gmail.com>

I have an openstack ocata release deployed via kolla.

Libvirtd running inside docker container nova_ libvirt and volumes
/sys/fs/cgroup, /run privileged mode enabled.

Some guest vms cannot provide cpu-stats

Symptoms are:

> $ docker exec -ti nova_libvirt virsh cpu-stats instance-000004cb
> error: Failed to retrieve CPU statistics for domain 'instance-000004cb'
> error: Requested operation is not valid: cgroup CPUACCT controller is not
> mounted


To check cgroups looking for all related pid

> $ ps fax | grep instance-000004cb
> 8275 ? Sl 4073:40 /usr/libexec/qemu-kvm -name guest=instance-000004cb
> $ ps fax | grep 8275
> 8346 ? S 76:04 \_ [vhost-8275]
> 8367 ? S 0:00 \_ [kvm-pit/8275]
> 8275 ? Sl 4073:42 /usr/libexec/qemu-kvm


See cgroup
for qemu-kvm

> $ cat /proc/8275/cgroup
> 11:blkio:/user.slice
> 10:devices:/user.slice
>
> 9:hugetlb:/docker/e5bef89178c1c3ae34fd2b4a9b86b299a6145c0b9f608a06e83f6f4ca4d897bd
> 8:cpuacct,cpu:/user.slice
>
> 7:perf_event:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>
> 6:net_prio,net_cls:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
> 5:freezer:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
> 4:memory:/user.slice
> 3:pids:/user.slice
>
> 2:cpuset:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope/emulator
> 1:name=systemd:/user.slice/user-0.slice/session-c1068.scope


for vhost-8275

> $ cat /proc/8346/cgroup
> 11:blkio:/user.slice
> 10:devices:/user.slice
>
> 9:hugetlb:/docker/e5bef89178c1c3ae34fd2b4a9b86b299a6145c0b9f608a06e83f6f4ca4d897bd
> 8:cpuacct,cpu:/user.slice
>
> 7:perf_event:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>
> 6:net_prio,net_cls:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
> 5:freezer:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
> 4:memory:/user.slice
> 3:pids:/user.slice
>
> 2:cpuset:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope/emulator
> 1:name=systemd:/user.slice/user-0.slice/session-c1068.scope


for kvm-pit

> $ cat /proc/8275/cgroup
> 11:blkio:/user.slice
> 10:devices:/user.slice
> 9:hugetlb:/
> 8:cpuacct,cpu:/user.slice
> 7:perf_event:/
> 6:net_prio,net_cls:/
> 5:freezer:/
> 4:memory:/user.slice
> 3:pids:/user.slice
> 2:cpuset:/
> 1:name=systemd:/user.slice/user-0.slice/session-c4807.scope


I tried to fix the groups with a this script

> get_broken_vms() {
>     docker exec nova_libvirt bash -c 'for vm in $(virsh list --name); do
> virsh cpu-stats $vm > /dev/null 2>&1 || echo $vm; done'
> }
>
> attach_vm_to_cgroup() {
>     # Attach processes and their threads pid to correct cgroup
>     local vm_pid=$1; shift
>     local vm_cgname=$1; shift
>
>     echo Fix cgroup for pid $vm_pid in cgroup $vm_cgname
>
>     for tpid in $(find /proc/$vm_pid/task/ -maxdepth 1 -mindepth 1 -type d
> -printf '%f\n'); do
>         echo $tpid | tee
> /sys/fs/cgroup/{blkio,devices,perf_event,net_prio,net_cls,freezer,memory,pids,systemd}/machine.slice/$vm_cgname/tasks
> 1>/dev/null &
>         echo $tpid | tee
> /sys/fs/cgroup/{cpu,\cpuacct,cpuset}/machine.slice/$vm_cgname/emulator/tasks
> 1>/dev/null &
>     done
> }
>
> for vm in $(get_broken_vms); do
>     vm_pid=$(pgrep -f $vm)
>     vm_vhost_pids=$(pgrep -x vhost-$vm_pid)
>     vm_cgname=$(find /sys/fs/cgroup/systemd/machine.slice -maxdepth 1
> -mindepth 1 -type d -name "machine-qemu\\\x2d*\\\x2d${vm/-/\\\\x2d}.scope"
> -printf '%f\n')
>
>     echo Working on vm: $vm pid: $vm_pid vhost_pid: $vm_vhost_pids
> cgroup_name: $vm_cgname
>     [ -z "$vm_pid" -a -z "$vm_cgname" ] || attach_vm_to_cgroup $vm_pid
> $vm_cgname
>
>     # Fix vhost-NNNN kernel threads
>     for vpid in $vm_vhost_pids; do
>         [ -z "$vm_cgname" ] || attach_vm_to_cgroup $vpid $vm_cgname
>     done
> done


After fixing all vms successfully provided cpu-stats and other metrics, but
after some hours cgroups broke again.

Problems and symptoms:

- cgoup broken not at all VMs
- to find out what leads to this effect failed
- if restart a problem VM then as expected cgroups has been fixed but after
some hours cgroup broken again
- if cgroups has been fixed by hand cpu-stats is works, but after some
hours cgroup broken again

Now i check:
- logrotate - nothing
- cron - nothing

Add audit logs for cgrups

> auditctl -w '/sys/fs/cgroup/cpu,cpuacct/machine.slice' -p rwxa

And found only libvirtd processes write cgroups.

Any suggestions?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/community/attachments/20201110/feb8c788/attachment.html>

From khodayard at gmail.com  Tue Nov 10 09:27:20 2020
From: khodayard at gmail.com (Khodayar Doustar)
Date: Tue, 10 Nov 2020 10:27:20 +0100
Subject: [openstack-community] kolla node_libvirt and broken cgroups
In-Reply-To: <CALgfwkfWCeBUNaMO86bh2hkk9JaGf3puE=+BwirDBsfoZ0P+VA@mail.gmail.com>
References: <CALgfwkfWCeBUNaMO86bh2hkk9JaGf3puE=+BwirDBsfoZ0P+VA@mail.gmail.com>
Message-ID: <CAMNr=A-mvHD94m1tkiaEO2=uciyyMu1=9zKue3B+WVZD+m41Og@mail.gmail.com>

Hi Denis,

- Does this happen only on one server or is it a general problem among all
compute nodes?
- Have you checked the load average, disk free and response time of your
server? Sometimes these weird and intermittent problems happen when server
does not have enough disk space, process or memory resources.
- Have you tried putting this fabiolous script of yours into a cron job to
be run i.e. each 4 hours. This may seem like a funny workaround but it can
save a lot of time.

Good luck,
Khodayar

On Mon, Nov 9, 2020 at 6:41 PM Denis Kadyshev <metajiji at gmail.com> wrote:

> I have an openstack ocata release deployed via kolla.
>
> Libvirtd running inside docker container nova_ libvirt and volumes
> /sys/fs/cgroup, /run privileged mode enabled.
>
> Some guest vms cannot provide cpu-stats
>
> Symptoms are:
>
>> $ docker exec -ti nova_libvirt virsh cpu-stats instance-000004cb
>> error: Failed to retrieve CPU statistics for domain 'instance-000004cb'
>> error: Requested operation is not valid: cgroup CPUACCT controller is not
>> mounted
>
>
> To check cgroups looking for all related pid
>
>> $ ps fax | grep instance-000004cb
>> 8275 ? Sl 4073:40 /usr/libexec/qemu-kvm -name guest=instance-000004cb
>> $ ps fax | grep 8275
>> 8346 ? S 76:04 \_ [vhost-8275]
>> 8367 ? S 0:00 \_ [kvm-pit/8275]
>> 8275 ? Sl 4073:42 /usr/libexec/qemu-kvm
>
>
> See cgroup
> for qemu-kvm
>
>> $ cat /proc/8275/cgroup
>> 11:blkio:/user.slice
>> 10:devices:/user.slice
>>
>> 9:hugetlb:/docker/e5bef89178c1c3ae34fd2b4a9b86b299a6145c0b9f608a06e83f6f4ca4d897bd
>> 8:cpuacct,cpu:/user.slice
>>
>> 7:perf_event:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>
>> 6:net_prio,net_cls:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>> 5:freezer:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>> 4:memory:/user.slice
>> 3:pids:/user.slice
>>
>> 2:cpuset:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope/emulator
>> 1:name=systemd:/user.slice/user-0.slice/session-c1068.scope
>
>
> for vhost-8275
>
>> $ cat /proc/8346/cgroup
>> 11:blkio:/user.slice
>> 10:devices:/user.slice
>>
>> 9:hugetlb:/docker/e5bef89178c1c3ae34fd2b4a9b86b299a6145c0b9f608a06e83f6f4ca4d897bd
>> 8:cpuacct,cpu:/user.slice
>>
>> 7:perf_event:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>
>> 6:net_prio,net_cls:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>> 5:freezer:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>> 4:memory:/user.slice
>> 3:pids:/user.slice
>>
>> 2:cpuset:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope/emulator
>> 1:name=systemd:/user.slice/user-0.slice/session-c1068.scope
>
>
> for kvm-pit
>
>> $ cat /proc/8275/cgroup
>> 11:blkio:/user.slice
>> 10:devices:/user.slice
>> 9:hugetlb:/
>> 8:cpuacct,cpu:/user.slice
>> 7:perf_event:/
>> 6:net_prio,net_cls:/
>> 5:freezer:/
>> 4:memory:/user.slice
>> 3:pids:/user.slice
>> 2:cpuset:/
>> 1:name=systemd:/user.slice/user-0.slice/session-c4807.scope
>
>
> I tried to fix the groups with a this script
>
>> get_broken_vms() {
>>     docker exec nova_libvirt bash -c 'for vm in $(virsh list --name); do
>> virsh cpu-stats $vm > /dev/null 2>&1 || echo $vm; done'
>> }
>>
>> attach_vm_to_cgroup() {
>>     # Attach processes and their threads pid to correct cgroup
>>     local vm_pid=$1; shift
>>     local vm_cgname=$1; shift
>>
>>     echo Fix cgroup for pid $vm_pid in cgroup $vm_cgname
>>
>>     for tpid in $(find /proc/$vm_pid/task/ -maxdepth 1 -mindepth 1 -type
>> d -printf '%f\n'); do
>>         echo $tpid | tee
>> /sys/fs/cgroup/{blkio,devices,perf_event,net_prio,net_cls,freezer,memory,pids,systemd}/machine.slice/$vm_cgname/tasks
>> 1>/dev/null &
>>         echo $tpid | tee
>> /sys/fs/cgroup/{cpu,\cpuacct,cpuset}/machine.slice/$vm_cgname/emulator/tasks
>> 1>/dev/null &
>>     done
>> }
>>
>> for vm in $(get_broken_vms); do
>>     vm_pid=$(pgrep -f $vm)
>>     vm_vhost_pids=$(pgrep -x vhost-$vm_pid)
>>     vm_cgname=$(find /sys/fs/cgroup/systemd/machine.slice -maxdepth 1
>> -mindepth 1 -type d -name "machine-qemu\\\x2d*\\\x2d${vm/-/\\\\x2d}.scope"
>> -printf '%f\n')
>>
>>     echo Working on vm: $vm pid: $vm_pid vhost_pid: $vm_vhost_pids
>> cgroup_name: $vm_cgname
>>     [ -z "$vm_pid" -a -z "$vm_cgname" ] || attach_vm_to_cgroup $vm_pid
>> $vm_cgname
>>
>>     # Fix vhost-NNNN kernel threads
>>     for vpid in $vm_vhost_pids; do
>>         [ -z "$vm_cgname" ] || attach_vm_to_cgroup $vpid $vm_cgname
>>     done
>> done
>
>
> After fixing all vms successfully provided cpu-stats and other metrics,
> but after some hours cgroups broke again.
>
> Problems and symptoms:
>
> - cgoup broken not at all VMs
> - to find out what leads to this effect failed
> - if restart a problem VM then as expected cgroups has been fixed but
> after some hours cgroup broken again
> - if cgroups has been fixed by hand cpu-stats is works, but after some
> hours cgroup broken again
>
> Now i check:
> - logrotate - nothing
> - cron - nothing
>
> Add audit logs for cgrups
>
>> auditctl -w '/sys/fs/cgroup/cpu,cpuacct/machine.slice' -p rwxa
>
> And found only libvirtd processes write cgroups.
>
> Any suggestions?
> _______________________________________________
> Community mailing list
> Community at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/community
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/community/attachments/20201110/94f7e37e/attachment-0001.html>

From metajiji at gmail.com  Wed Nov 11 03:01:15 2020
From: metajiji at gmail.com (Denis Kadyshev)
Date: Wed, 11 Nov 2020 10:01:15 +0700
Subject: [openstack-community] kolla node_libvirt and broken cgroups
In-Reply-To: <CAMNr=A-mvHD94m1tkiaEO2=uciyyMu1=9zKue3B+WVZD+m41Og@mail.gmail.com>
References: <CALgfwkfWCeBUNaMO86bh2hkk9JaGf3puE=+BwirDBsfoZ0P+VA@mail.gmail.com>
 <CAMNr=A-mvHD94m1tkiaEO2=uciyyMu1=9zKue3B+WVZD+m41Og@mail.gmail.com>
Message-ID: <CALgfwkcQcMGMCWzEXKgPz10Tcj569EiBDWbWh+L01Cej_3iFUg@mail.gmail.com>

- This is a general problem and i found it in different clusters (we use
ocata release everywhere)
- Yes i checked this point, load average and other resources is ok
- Lol no :))) i need really fix this problem

By the way i found similar bugfix in libvirt
https://libvirt.org/news.html#v5-4-0-2019-06-03

> Setting the scheduler for QEMU's main thread before QEMU had a chance to
> start up other threads was misleading as it would affect other threads
> (vCPU and I/O) as well. In some particular situations this could also lead
> to an error when the thread for vCPU #0 was being moved to its cpu,cpuacct
> cgroup. This was fixed so that the scheduler for the main thread is set
> after QEMU starts.

I checked openstack releases starting from ocata and up and found ussuri
use centos 8 and libvirt 6.0.0, all other releases use centos 7 and libvirt
4.5.0.
I plan to try to update the container with libvirt and see if the problem
is fixed.

вт, 10 нояб. 2020 г. в 16:27, Khodayar Doustar <khodayard at gmail.com>:

> Hi Denis,
>
> - Does this happen only on one server or is it a general problem among all
> compute nodes?
> - Have you checked the load average, disk free and response time of your
> server? Sometimes these weird and intermittent problems happen when server
> does not have enough disk space, process or memory resources.
> - Have you tried putting this fabiolous script of yours into a cron job to
> be run i.e. each 4 hours. This may seem like a funny workaround but it can
> save a lot of time.
>
> Good luck,
> Khodayar
>
> On Mon, Nov 9, 2020 at 6:41 PM Denis Kadyshev <metajiji at gmail.com> wrote:
>
>> I have an openstack ocata release deployed via kolla.
>>
>> Libvirtd running inside docker container nova_ libvirt and volumes
>> /sys/fs/cgroup, /run privileged mode enabled.
>>
>> Some guest vms cannot provide cpu-stats
>>
>> Symptoms are:
>>
>>> $ docker exec -ti nova_libvirt virsh cpu-stats instance-000004cb
>>> error: Failed to retrieve CPU statistics for domain 'instance-000004cb'
>>> error: Requested operation is not valid: cgroup CPUACCT controller is
>>> not mounted
>>
>>
>> To check cgroups looking for all related pid
>>
>>> $ ps fax | grep instance-000004cb
>>> 8275 ? Sl 4073:40 /usr/libexec/qemu-kvm -name guest=instance-000004cb
>>> $ ps fax | grep 8275
>>> 8346 ? S 76:04 \_ [vhost-8275]
>>> 8367 ? S 0:00 \_ [kvm-pit/8275]
>>> 8275 ? Sl 4073:42 /usr/libexec/qemu-kvm
>>
>>
>> See cgroup
>> for qemu-kvm
>>
>>> $ cat /proc/8275/cgroup
>>> 11:blkio:/user.slice
>>> 10:devices:/user.slice
>>>
>>> 9:hugetlb:/docker/e5bef89178c1c3ae34fd2b4a9b86b299a6145c0b9f608a06e83f6f4ca4d897bd
>>> 8:cpuacct,cpu:/user.slice
>>>
>>> 7:perf_event:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>>
>>> 6:net_prio,net_cls:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>> 5:freezer:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>> 4:memory:/user.slice
>>> 3:pids:/user.slice
>>>
>>> 2:cpuset:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope/emulator
>>> 1:name=systemd:/user.slice/user-0.slice/session-c1068.scope
>>
>>
>> for vhost-8275
>>
>>> $ cat /proc/8346/cgroup
>>> 11:blkio:/user.slice
>>> 10:devices:/user.slice
>>>
>>> 9:hugetlb:/docker/e5bef89178c1c3ae34fd2b4a9b86b299a6145c0b9f608a06e83f6f4ca4d897bd
>>> 8:cpuacct,cpu:/user.slice
>>>
>>> 7:perf_event:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>>
>>> 6:net_prio,net_cls:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>> 5:freezer:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope
>>> 4:memory:/user.slice
>>> 3:pids:/user.slice
>>>
>>> 2:cpuset:/machine.slice/machine-qemu\x2d25\x2dinstance\x2d000004cb.scope/emulator
>>> 1:name=systemd:/user.slice/user-0.slice/session-c1068.scope
>>
>>
>> for kvm-pit
>>
>>> $ cat /proc/8275/cgroup
>>> 11:blkio:/user.slice
>>> 10:devices:/user.slice
>>> 9:hugetlb:/
>>> 8:cpuacct,cpu:/user.slice
>>> 7:perf_event:/
>>> 6:net_prio,net_cls:/
>>> 5:freezer:/
>>> 4:memory:/user.slice
>>> 3:pids:/user.slice
>>> 2:cpuset:/
>>> 1:name=systemd:/user.slice/user-0.slice/session-c4807.scope
>>
>>
>> I tried to fix the groups with a this script
>>
>>> get_broken_vms() {
>>>     docker exec nova_libvirt bash -c 'for vm in $(virsh list --name); do
>>> virsh cpu-stats $vm > /dev/null 2>&1 || echo $vm; done'
>>> }
>>>
>>> attach_vm_to_cgroup() {
>>>     # Attach processes and their threads pid to correct cgroup
>>>     local vm_pid=$1; shift
>>>     local vm_cgname=$1; shift
>>>
>>>     echo Fix cgroup for pid $vm_pid in cgroup $vm_cgname
>>>
>>>     for tpid in $(find /proc/$vm_pid/task/ -maxdepth 1 -mindepth 1 -type
>>> d -printf '%f\n'); do
>>>         echo $tpid | tee
>>> /sys/fs/cgroup/{blkio,devices,perf_event,net_prio,net_cls,freezer,memory,pids,systemd}/machine.slice/$vm_cgname/tasks
>>> 1>/dev/null &
>>>         echo $tpid | tee
>>> /sys/fs/cgroup/{cpu,\cpuacct,cpuset}/machine.slice/$vm_cgname/emulator/tasks
>>> 1>/dev/null &
>>>     done
>>> }
>>>
>>> for vm in $(get_broken_vms); do
>>>     vm_pid=$(pgrep -f $vm)
>>>     vm_vhost_pids=$(pgrep -x vhost-$vm_pid)
>>>     vm_cgname=$(find /sys/fs/cgroup/systemd/machine.slice -maxdepth 1
>>> -mindepth 1 -type d -name "machine-qemu\\\x2d*\\\x2d${vm/-/\\\\x2d}.scope"
>>> -printf '%f\n')
>>>
>>>     echo Working on vm: $vm pid: $vm_pid vhost_pid: $vm_vhost_pids
>>> cgroup_name: $vm_cgname
>>>     [ -z "$vm_pid" -a -z "$vm_cgname" ] || attach_vm_to_cgroup $vm_pid
>>> $vm_cgname
>>>
>>>     # Fix vhost-NNNN kernel threads
>>>     for vpid in $vm_vhost_pids; do
>>>         [ -z "$vm_cgname" ] || attach_vm_to_cgroup $vpid $vm_cgname
>>>     done
>>> done
>>
>>
>> After fixing all vms successfully provided cpu-stats and other metrics,
>> but after some hours cgroups broke again.
>>
>> Problems and symptoms:
>>
>> - cgoup broken not at all VMs
>> - to find out what leads to this effect failed
>> - if restart a problem VM then as expected cgroups has been fixed but
>> after some hours cgroup broken again
>> - if cgroups has been fixed by hand cpu-stats is works, but after some
>> hours cgroup broken again
>>
>> Now i check:
>> - logrotate - nothing
>> - cron - nothing
>>
>> Add audit logs for cgrups
>>
>>> auditctl -w '/sys/fs/cgroup/cpu,cpuacct/machine.slice' -p rwxa
>>
>> And found only libvirtd processes write cgroups.
>>
>> Any suggestions?
>> _______________________________________________
>> Community mailing list
>> Community at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/community
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/community/attachments/20201111/a63a090c/attachment.html>