By Josh Bartolomucci | May 02, 2017
By Josh Bartolomucci | December 30, 2016
If you've been watching our Twitter and status page, you'll have seen that we've had major server issues over the last two days. As a FoxyCart store, you deserve a full and candid explanation of the downtime, and we want to tell you what measures we've taken to ensure that your store stays up and running.
On the afternoon of Wednesday, 3 October 2012, our monitoring systems notified us that our primary application server in our Dallas datacenter went offline. After confirming that the server was not responding, we called our host company's support, and learned that an unexplained virtual disk configuration change had caused the issue, and that they were scrambling to repair it.
At this point, our DNS failover had already taken over (which happens automatically as soon as a major issue is detected), so any customers would have seen an "Our eCommerce functionality is currently down for maintenance" message. FoxyCart store admins were redirected to status.foxycart.com, where we posted updates about the issue.
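For readers curious how an automatic failover decision like this can work, here is a minimal sketch in Python. It is not our actual monitoring or DNS configuration. The function name `should_fail_over`, the constant `FAILURE_THRESHOLD`, and the rule of "three failed checks in a row" are all assumptions made up for illustration.

```python
from typing import Iterable

# Hypothetical threshold: consecutive failed health checks before failing over.
FAILURE_THRESHOLD = 3


def should_fail_over(check_results: Iterable[bool],
                     threshold: int = FAILURE_THRESHOLD) -> bool:
    """Return True once `threshold` consecutive health checks have failed.

    `check_results` is an ordered series of health-check outcomes:
    True for a healthy response, False for a failed one.
    """
    consecutive_failures = 0
    for healthy in check_results:
        if healthy:
            # Any successful check resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= threshold:
                return True
    return False
```

Requiring several failures in a row, rather than one, is a common way to avoid switching traffic because of a single dropped request. A real setup would then update DNS records or a load balancer, which this sketch leaves out.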
According to the hosting company, the server had automatically restarted after the initial error, but due to a configuration issue, it would not boot successfully.
At this point, we were very concerned about our ability to quickly get FoxyCart up and running again on the Dallas hardware. Brett, Luke, and I called an immediate conference to discuss our options. After confirmation from the hosting company that this issue could not be resolved as quickly as we needed, we were left with one option: fail over to our backup environment.
Given those two things, we were able to fail over to the new servers and get FoxyCart customers once again selling in less than an hour.
I am immensely proud that we were able to get our systems up and going again as quickly as we could, and without having a full and complete plan. We really came together in the heat of the moment. While we did have issues, we kept working on them and were able to quickly sort them out.
What we did right: We had a failover system in the first place. It wasn't 100%, but it was ready enough that we were able to get things moved and operational. Our hosting company brought Dallas back online about an hour after its initial failure, at which point we were already up and running. We have extensive monitoring on both our primary and failover environments, which helped us make sure the new systems were every bit as good as the old.
What we did wrong: We hadn't finished getting our failover system ready before the downtime. We hate downtime, and we knew that we had a single point of failure in our Dallas DC, but we were working on getting the backup environment ready in addition to getting other things done. What we should have done was focus solely on getting our failover system "DONE" and ready to handle a takeover.
As your ecommerce provider, we want to provide you with the best service, which means a service that stays up and handles problems quickly. While we did handle this downtime, we could have done better. To that end, we’re making these things our top infrastructure priorities:
We will let you know as we complete these items. It’s important to us that you know what’s happening “behind the scenes” so that you can know your store is in good hands.
As a result of moving to our newer Arizona servers, we were able to upgrade the RAM and CPU of our application and database servers. We had already planned to do this, and seized the opportunity to beef up our servers’ specs. Where we had seen some slowdowns on the old hardware, the new systems are amazingly fast, handling all of the same traffic without breaking a sweat. I’ve never seen a more responsive FoxyCart.
Thank you for being a FoxyCart customer. We appreciate you, and we’re glad to have you on board as we grow and make these exciting changes.
IT Director, FoxyCart
The views expressed in the above post are the author's own, and may not reflect those of FoxyCart.com LLC.