Last updated on October 21, 2015
A customer has one Domino cluster with two members in a virtualized environment (VMware). It is not a big environment (1,600 users), but after three months of working well something happened and the system stopped performing properly.
During a period of 4 hours (9 am to 1 pm) the CPU of one cluster member went to 100%, and the cluster did not send users to the other member.
I checked everything I could (this server holds 800 GB of mailboxes) and ran some administrative commands. I opened a PMR and sent NSDs.
IBM told me that the Windows kernel was consuming a lot of CPU and that Domino was not the cause, but this machine only runs Domino.
We talked a lot about the problem, and after some checks at the Windows 2012 level we saw a high disk queue length (between 8 and 12).
The solution: the VMware administrator put each disk on a different LUN (we have three: OS, Data and Translog).
So far the machine is working well and the disk queue length is below 1.2.
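For anyone who wants to keep an eye on the same counter, here is a minimal sketch of how the samples could be summarized. It assumes the counter was logged to CSV with the Windows typeperf tool, for example with something like typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" -si 5 -sc 60 -f CSV -o queue.csv. The function names, the threshold of 2.0 and the sample values are my own assumptions for illustration. They are not the customer's data and not a recommendation from IBM.

```python
import csv
import io

# Assumed alert level for illustration only; pick a value that fits your storage.
THRESHOLD = 2.0

# Made-up sample in typeperf CSV layout (not the customer's data).
SAMPLE = r'''"(PDH-CSV 4.0)","\\HOST\PhysicalDisk(_Total)\Avg. Disk Queue Length"
"10/21/2015 09:00:05.000","9.5"
"10/21/2015 09:00:10.000","11.0"
"10/21/2015 09:00:15.000","0.8"
'''


def parse_queue_samples(text):
    """Return the numeric values from the second CSV column.

    Rows that do not hold a number there (the header line, blank
    samples, status messages) are skipped.
    """
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        try:
            samples.append(float(row[1]))
        except ValueError:
            continue
    return samples


def summarize(samples, threshold=THRESHOLD):
    """Return the peak, the mean and the number of samples above threshold."""
    if not samples:
        return {"max": None, "mean": None, "over": 0}
    return {
        "max": max(samples),
        "mean": sum(samples) / len(samples),
        "over": sum(1 for s in samples if s > threshold),
    }


if __name__ == "__main__":
    print(summarize(parse_queue_samples(SAMPLE)))
```

To use it on a real log, read the CSV file into a string and pass it to parse_queue_samples instead of SAMPLE. A queue length that stays high for hours, as in our case, shows up as a large "over" count.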