(gs) Grid-Service latency issues  

Incident Tracker status: RESOLVED

Backlogged email delivered

Tuesday, October 16th, 2007 at 2:16 pm

(mt) Engineers have confirmed that nearly all of the backlogged email in our queues has been delivered. New mail is delivering immediately and no other symptoms due to the latency issues are expected.

One remaining symptom that has been reported is that a few customers are receiving duplicate emails. Some messages come through with only one copy while others arrive with five or more copies. We encourage our customers to read the following KnowledgeBase article, which explains why this might be happening in the wake of this incident.

Why do I get duplicate emails?

All other services are performing as expected.  We sincerely apologize for any issues this may have caused.

Email delivering, latency gone

Tuesday, October 16th, 2007 at 10:44 am

After monitoring both clusters for the last 24 hours, we believe that the configuration changes we’ve made have mitigated the latency issues. The only remaining symptom is that not all of the backlogged email has been delivered. All other services, such as FTP, MySQL and Web, have been stable since Sunday night.

We will continue to keep an eye on this issue for a bit longer to make sure the latency does not return.

Email continuing to catch up

Monday, October 15th, 2007 at 5:14 pm

The new email processing configuration is working as expected, and the backed-up email queues have continued to process throughout the day. Customers can expect small email delays as the large queues continue to deliver backlogged messages.

We have received a few reports of 403 errors on customers’ websites and have taken the necessary actions to prevent them from recurring. We apologize for any inconvenience this issue has caused and will continue to monitor the situation through the night.

Email backlog processing

Monday, October 15th, 2007 at 1:02 pm

(mt) Engineers have continued to troubleshoot these latency issues. Last night’s maintenance drastically decreased the latency affecting Web, FTP and MySQL. A few configuration changes were made this morning that have removed what we believe was the last bottleneck in email processing and delivery.

As it stands right now, email processing is running better than it was prior to this incident. All of the symptoms have greatly improved and backlogged mail is being delivered. Unfortunately, due to the large number of emails in the backlog, it may take a day or two to deliver all of them. No configuration changes need to be made on our customers’ end. All messages will be delivered as soon as they are processed in the queue.

If you believe that your site is still affected by this issue, we encourage you to update or open a support request in the AccountCenter. We will update this thread when we have more information.

Scheduled Maintenance completed.

Monday, October 15th, 2007 at 9:20 am

Our systems engineers have completed the scheduled maintenance with a minimal amount of downtime.  Web and FTP performance are back to normal for both (gs) Grid.Cluster.1 and 2.  This maintenance is also allowing us to look deeper into the email issues on (gs) Grid.Cluster.2, which we are still investigating.  We hope to have more updates for you very soon.

Maintenance Scheduled Sun, Oct 14

Sunday, October 14th, 2007 at 6:26 pm

Our storage vendors have provided software and configuration updates which they believe will alleviate the remaining load issues for both Cluster.1 and Cluster.2. Tonight (mt) Engineers will be making the required changes. The window for this maintenance action is:

Sunday, October 14th 10:00 PM - Monday, October 15th 1:00 AM PDT

To see when this maintenance window will occur in a different timezone please visit this link.
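
The window above can also be converted locally. The following is a minimal Python sketch, assuming fixed UTC offsets (PDT is UTC-7) rather than a full timezone database. The helper name window_in is hypothetical and is not part of any (mt) tooling.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7; a fixed offset avoids needing a timezone database.
PDT = timezone(timedelta(hours=-7), "PDT")

# Maintenance window as announced: Sunday 10:00 PM to Monday 1:00 AM PDT.
WINDOW_START = datetime(2007, 10, 14, 22, 0, tzinfo=PDT)
WINDOW_END = datetime(2007, 10, 15, 1, 0, tzinfo=PDT)


def window_in(offset_hours, name):
    """Return the window (start, end) at a fixed UTC offset (hypothetical helper)."""
    target = timezone(timedelta(hours=offset_hours), name)
    return WINDOW_START.astimezone(target), WINDOW_END.astimezone(target)


if __name__ == "__main__":
    fmt = "%A, %B %d %I:%M %p %Z"
    start, end = window_in(0, "UTC")
    print(start.strftime(fmt), "-", end.strftime(fmt))
```

Passing a different offset, for example window_in(1, "BST"), gives the same window for a location one hour ahead of UTC. Offsets that change with daylight saving time have to be looked up for the date in question.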

As this maintenance will require us to reboot both clusters, a very short period of downtime may occur for Cluster.1 and Cluster.2. It is likely that only a small portion of the maintenance window will be required. Any email inbound during this time will not be rejected or lost.

We apologize for the extended nature of these latency issues and are working diligently to restore the level of service that our customers expect.

Will continue to investigate and make necessary changes

Friday, October 12th, 2007 at 8:59 pm

Although there have been performance improvements within the past few hours, our systems engineers will continue to work on all of the issues over the weekend in an effort to bring things back to normal performance levels. This is currently our company’s highest priority.

We hope to post more updates throughout the weekend as progress is made.

We are communicating with some of our vendors.

Friday, October 12th, 2007 at 1:01 pm

Our systems engineers have identified a possible cause of the latency issues on (gs) Grid.Cluster.1, and we are working with our vendors to resolve them. Services such as FTP, web and email are currently still affected on both (gs) Grid.Cluster.1 and (gs) Grid.Cluster.2. We will update this thread when we have more information.

Latency persists

Friday, October 12th, 2007 at 9:56 am

Last night’s maintenance significantly reduced storage-related load for both Cluster.1 and Cluster.2. However, we are still receiving reports of high latency causing delays in mail, web and FTP. Our engineers have been working through the night finding and eliminating latency bottlenecks, and resolving the remaining issues is our highest priority.

Maintenance Scheduled Thu, Oct 11

Thursday, October 11th, 2007 at 1:41 pm

To help alleviate load and reduce latency across all services, engineers will be performing hardware maintenance on portions of Grid.Cluster.1. The window for this maintenance action is:

Thursday, October 11th 2007 11:30 PM - Midnight PDT

To see when this maintenance window will occur in a different timezone please visit this link.

We will be reconfiguring portions of the storage system for Grid.Cluster.1. This action will help to redistribute the storage systems of both Cluster.1 and Cluster.2 to help reduce load and latency.

A very short period of downtime for both Cluster.1 and Cluster.2 may occur. Customers should prepare for a brief disruption of services such as web, email, and FTP. It is likely that only a small portion of the maintenance window will actually be needed. Any email inbound to your server during the maintenance window will not be rejected or lost.

We apologize for the late notice regarding this maintenance action. Should you have any questions regarding the maintenance, please open a support request inside the (mt) AccountCenter.

Thank you in advance for your understanding.