The problems were traced to a change Facebook made to one of its systems.
The change was made to a piece of data that was consulted whenever an error-checking routine found invalid data in Facebook's system. That piece of data was itself interpreted as invalid, which caused the system to try to replace the same piece of data, and so a feedback loop began.
The loop resulted in hundreds of thousands of queries per second being sent to Facebook's database cluster, overloading the system.
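The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Facebook's code: the names `is_valid` and `read_with_repair`, the split into a local copy and a database copy, and the rule that only non-negative integers count as valid are all hypothetical, chosen purely for illustration.

```python
# Toy model only: every name and rule here is hypothetical, not Facebook's code.

def is_valid(value):
    """Hypothetical validity check: only non-negative integers pass."""
    return isinstance(value, int) and value >= 0


def read_with_repair(local_copy, database, key, max_attempts):
    """Return (value, database_queries) for a client that repairs bad data.

    Whenever the local value fails validation, the client queries the
    database for a replacement. If the database copy is itself invalid,
    every repair attempt fails the same way and the client keeps querying.
    """
    queries = 0
    for _ in range(max_attempts):
        value = local_copy.get(key)
        if is_valid(value):
            return value, queries
        queries += 1                     # one database query per repair attempt
        local_copy[key] = database[key]  # replace the bad value from the database
    return None, queries                 # gave up: the repair never converged


# Healthy case: the database copy is valid, so a single query fixes the value.
print(read_with_repair({"limit": -1}, {"limit": 10}, "limit", max_attempts=1000))

# Failure case: the database copy is invalid, so every attempt costs a query.
print(read_with_repair({"limit": -1}, {"limit": -1}, "limit", max_attempts=1000))
```

In this toy model the healthy case costs one database query, while the failure case costs one query per attempt until the client gives up. Repeated across many clients with no attempt limit, a loop of this kind could plausibly generate a query volume like the one reported.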
For users, the result was an "absolute error": they could no longer access the site.
"The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site," wrote Robert Johnson, director of software engineering at Facebook, in a post on the site. "Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site."
The problem has not been completely fixed. Johnson said that Facebook had to shut down the automated system to get the website back up and running. But this system plays an integral role in protecting the site.
Facebook is now exploring new ways to handle the situation so that the system does not get into another feedback loop.
"We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously," he wrote.
It is the second day that Facebook has gone down for some users. On Wednesday, Facebook blamed a third-party network provider for making the site inaccessible for some.