Problems with an AWS service for scaling capacity caused an outage at Amazon Web Services last week. Netflix, Disney+, and AWS's own management tools were unavailable to users around the world.
– The problems are affecting our monitoring and incident management services, the company stated on its status pages Wednesday night, and continued:
Scaling service problems
– This is due to a network impairment in the US-EAST-1 region. We are working on several solutions, and we have seen signs of improvement. Unfortunately, we do not have an estimated time for a fix, the company wrote. The lack of an estimate caused considerable frustration among users.
The problems caused AWS users around the world to lose access to the console and login page for their cloud services. Various application programming interfaces (APIs) were also affected by the downtime.
The company put in place a temporary workaround that allowed cloud administrators to log in and get an overview of their own services overnight.
Chain reaction
A fault arose in AWS's internal network, which handles real-time monitoring, DNS, and authentication, among other things. It resulted from problems with the automated scaling service, which set off a chain reaction among clients in the network connected to the cloud, according to a status update from the cloud giant.
This caused a sudden and dramatic increase in the number of requests directed at the internal network, which in turn led to significant delays in communication between the internal network and the network to which the cloud clients were connected.
The flood of requests also degraded the monitoring tools, to the detriment of the technicians, who therefore could not identify the fault when it occurred. In the end, they had to rely on log files to identify the issues.
The company itself has been hardest hit by the problems, according to CNBC. Employees in Amazon warehouses in the US were unable to access the logistics software, and therefore had problems delivering packages and merchandise to customers across the country.
Reintroduction
AWS wrote that the scaling services that caused everything to stop working have now been taken out of production.
The company does not intend to reintroduce the services until it has confirmed that they are operating properly. The company asserts, however, that this should not affect the cloud giant's performance.
Amazon aims to bring related services back into production within the next two weeks.
– The measures we are in the process of implementing will ensure that we do not see these problems again, the company wrote in the update.
Nowadays, a provider offering anything less than 99.999 percent uptime is not considered serious. In practical terms, this means that data centers can have about 26 seconds of downtime per month, or roughly five minutes per year.
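The downtime figures follow directly from the uptime percentage. The short sketch below shows the arithmetic. The function name `downtime_budget_seconds` is made up for illustration, and the 30-day month and 365-day year are simplifying assumptions, not terms from any AWS agreement.

```python
# Illustrative helper (not from AWS): downtime permitted by an uptime target.
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(availability_percent: float, period_days: float) -> float:
    """Return the seconds of downtime allowed over a period of `period_days`."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


# Assumption: a 30-day month and a 365-day year.
monthly = downtime_budget_seconds(99.999, 30)
yearly = downtime_budget_seconds(99.999, 365)

print(f"per month: {monthly:.1f} s")
print(f"per year:  {yearly / 60:.2f} min")
```

Under these assumptions the result is about 26 seconds per month and a little over five minutes per year, which matches the figures quoted above.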
Sorry
Wednesday night's outage exceeded the downtime that a 99.999 percent uptime target allows.
AWS wrote that while the fault did not directly affect cloud customers, the problems with core services were felt by many of them.
The company also wrote that while some cloud customers experienced significant problems, the services worked as usual for others.
– We understand that events like this are even more frustrating when information about what is happening is not available. We are now working to improve this information.
– Finally, we would like to apologize to all customers who have been affected by the problems. We know this has affected many of our customers in a significant way, and we are now doing everything we can to learn from this event and ensure that the availability of our services will be even better in the future.