[ID Server] Unstable node on one of our servers for VPS customers - 14 Oct 2015

  • October 14, 2015 by Tech #1

Dear Customers,

Today at around 5PM, an unusual issue brought down one of our Indonesia VPS nodes. During the incident, the load suddenly rose until the server became unresponsive. Since the node runs Xen and each VPS runs on its own CPU core, no individual VPS should be able to drive the load up like this.

We rebooted the server and everything is back to normal, but we are uncertain of the cause. We have submitted the server logs to the principal vendors for investigation. Our initial suspicion is a malfunctioning hard disk (its "elements in grown defect list" count is shown as non-zero), and we are awaiting the vendors' feedback on whether this is the culprit.

The issue affects only one of our Indonesia server nodes. It does not affect existing reseller/shared hosting customers.

If you are affected, you will receive a separate email in your inbox.


We apologize for the inconvenience caused.

Kind Regards,


SOSYS.NET

========

14 Oct 2015 - 5.47PM - Notification sent out and logs sent to principal vendor

14 Oct 2015 - 10PM - Action plan:

1. Upgrade to the latest kernel (completed)

2. Replace one of the disks in the RAID array (done on 20 Oct - 3PM)

3. Upgrade BIOS and IPMI for the machine (pending)

4. Replacement of CPU hardware component

5. Replacement of backplane

6. Build a new server with Intel SSD S3500 and move the clients to it

16 Oct 2015 - 1.05PM - Hard disk replacement scheduled for Tuesday, 20th Oct 2015 - 4PM (completed)

24 Oct 2015 - 10PM - Server has been monitored for 10 days and has remained stable during peak and off-peak hours.

29 Oct 2015 - 6AM - The same error has occurred again. We are currently waiting for the vendor to provision the new server. A reboot has been performed and all client VPSes are up.

29 Oct 2015 - 9PM - Agreement reached with the vendor for new hardware: Supermicro SuperServer 5018D-MTF with Intel S3510 SSD, E3-1271v3 and 32GB RAM. This server will not be overloaded.

31 Oct 2015 - 10AM - Announcement sent out to the affected customers

2 Dec 2015 - 10PM - News from the vendor: the replacement server will be available on 28th Dec 2015

9 Dec 2015 - 7PM - Migration is currently scheduled for 15th January 2016 after 8PM

9 Jan 2016 - 1AM - Confirmation that the server is ready for delivery and mounting

15 Jan 2016 - 7PM - Server tested and configured; burn-in testing done

16 Jan 2016 - 12AM - Migration started

16 Jan 2016 - 4AM - Migration completed