6/11/10 - cpanel74 RAID Failure

  • AndrewT
    Administrator
    • Mar 2004
    • 3655

    #1

    Earlier this afternoon, the RAID controller on cpanel74 suddenly began experiencing severe problems. We've been working to recover the RAID array but have not been successful. At this time, we've replaced the drives and RAID controller and are reloading the OS onto the new array. When this is finished, we will need to re-install cPanel/WHM and get the server reconfigured per our specs. Once this has been completed, we will begin restoring all accounts from off-site system backups, which were generated on June 5th. I will continue to update this forum thread as we make progress, and I do sincerely apologize for the inconvenience this has caused.
  • AndrewT
    Administrator
    • Mar 2004
    • 3655

    #2
    I apologize for the late update. This process has taken significantly longer than expected. Just about everything that could go wrong has gone wrong. All WHM users have been restored, so you may now log in to WHM and see which of your accounts have been restored thus far under "List Accounts". All accounts are being restored alphabetically by username.

    Please do not create any accounts via WHM at this time. Doing so will prevent these accounts from being restored.


    • AndrewT
      Administrator
      • Mar 2004
      • 3655

      #3
      Over approximately the past 4.5 hours, half of the accounts on the server have been restored. If the second half is anything like the first, it should take roughly another 5 hours before all accounts are restored, at which point we just have to make some final changes and corrections.
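
      As a rough illustration of the estimate above, here is a minimal sketch of a linear extrapolation from restore progress. It assumes the remaining accounts restore at the same average rate as the ones already done, which may not hold if account sizes differ. The function name `estimate_remaining_hours` is hypothetical and not part of cPanel/WHM.

```python
def estimate_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimate hours left, assuming a constant average restore rate.

    elapsed_hours: time spent restoring so far (must be positive)
    fraction_done: share of accounts already restored, in (0, 1]
    """
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    # Time per unit of progress so far, scaled by the progress still outstanding.
    return elapsed_hours * (1 - fraction_done) / fraction_done


# Figures from the post: half of the accounts restored in about 4.5 hours.
remaining = estimate_remaining_hours(4.5, 0.5)
print(f"Estimated time remaining: {remaining:.1f} hours")
```

      With the figures from this post, the straight-line estimate comes out to 4.5 hours, so "roughly another 5 hours" leaves a small margin.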


      • AndrewT
        Administrator
        • Mar 2004
        • 3655

        #4
        All accounts have been restored, dedicated IP addresses have been re-assigned to domains that had them previously (these have not changed), and SSL certificates have been re-installed. At this point we've completed the restore process along with all necessary post-restore changes.

        While events like this are rare, they will always remain a possibility. Even though these may be out of our control, we certainly do have control over how we respond to them. Ultimately some parts of this process took far longer than they should have and we'll be working on ways to improve.

        In a previous announcement I noted that we were working on implementing new redundant name servers and a new backup system in the coming months. The name servers have already been deployed and have been in use for several weeks; an announcement concerning this is scheduled for Monday. Once all custom name servers are changed over, this will help in recovery situations such as this one. If everything goes as planned, our new backup system will be capable of retaining weeks' worth of daily backups for each server and will have a bare-metal restore capability. This will allow us to restore much newer data, much more quickly.

        I do apologize once again for the inconvenience this has caused. If you experience any problems, please submit a ticket and we'll gladly take a look.
