AWS releases detailed report on its S3 outage
Have you heard about the recent Amazon Web Services outage? The failure proved to be a significant loss for the company. AWS has now released a detailed report explaining the main cause.
An employee doing routine work ran a command to remove servers from an S3 subsystem. By mistake, they entered a larger number than intended. The servers that were removed supported at least two other subsystems, which manage metadata and storage for the entire region. As a result, the service went down.
AWS says its systems are built to cope with the occasional failure. Recovering from the employee's error, however, meant restarting the affected subsystems. AWS admits that it had not restarted those subsystems in years, and S3 had grown considerably in the meantime. If you have ever restarted an old computer, you will have noticed it chugging on startup, so you can imagine how AWS felt while waiting for the system to come back.
AWS is putting safeguards in place to prevent this kind of error in the future. Taking proper measures needs to be the priority. After all, nobody wants to bring down their own conference mid-speech.
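The article does not say what the safeguards look like. As a rough illustration only, a tool that removes servers could refuse any request that would leave a subsystem below a minimum capacity, and could apply even valid requests in small batches. The sketch below assumes exactly that. The names `plan_removal`, `min_required`, and `max_per_step` are hypothetical and do not come from AWS tooling.

```python
class CapacityError(ValueError):
    """Raised when a removal request would under-provision a subsystem."""


def plan_removal(active, requested, min_required, max_per_step):
    """Return how many servers may be removed in this step.

    All parameters are hypothetical and for illustration only:
      active        -- servers currently in service
      requested     -- servers the operator asked to remove
      min_required  -- floor the subsystem must never drop below
      max_per_step  -- throttle on how many servers go at once
    """
    if requested < 0:
        raise CapacityError("requested removal must be non-negative")
    if active - requested < min_required:
        # A mistyped, oversized number is rejected outright.
        raise CapacityError(
            f"removing {requested} of {active} servers would drop below "
            f"the minimum of {min_required}"
        )
    # Even a valid request is applied slowly, one small batch at a time.
    return min(requested, max_per_step)
```

With these assumed limits, a request to remove 5 of 100 servers (minimum 80, at most 2 per step) would proceed 2 servers at a time, while a mistyped request to remove 50 would be rejected before anything was taken offline.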
You can read the full report on the Amazon Web Services website.