Update: everything has been working again since about 09:45 GMT. The AWS status page shows all the issues as resolved, and our system isn’t experiencing any further problems either.
Original problem report below:
For the last few hours, several of the Amazon Web Services that our system depends on have been playing up. As far as we can tell the faults are intermittent, and they are slowing our system down and causing parts of it to fail some of the time.
If you’re using the API, you may notice that it is responding more slowly than usual, and you might also see an unusual number of ServiceTemporarilyDown failures. (These are to be expected from time to time, so it is a good idea for your code to retry automatically when it gets them. We know that many of our customers have not yet built retries into their data fetching, so unfortunately this could cause bigger problems for them, e.g. if scheduled tasks exit on the first ServiceTemporarilyDown failure.)
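For anyone who has not built retries in yet, here is a minimal sketch of the idea in Python. It is an illustration under assumptions rather than our client code: the `ServiceTemporarilyDown` exception class, the `fetch_with_retries` helper and its parameters are all hypothetical names, and you will need to adapt the failure check to however your own code detects a ServiceTemporarilyDown response.

```python
import random
import time


class ServiceTemporarilyDown(Exception):
    """Hypothetical stand-in for however your client reports this failure."""


def fetch_with_retries(fetch, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call fetch(), retrying on ServiceTemporarilyDown with growing delays.

    fetch is any zero-argument callable that returns data or raises
    ServiceTemporarilyDown. The defaults here are illustrative guesses,
    not recommended values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ServiceTemporarilyDown:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the failure.
                raise
            # Double the wait after each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter so many clients don't all retry at once.
            sleep(delay * random.uniform(0.5, 1.0))
```

A scheduled task would then wrap its existing call, for example `fetch_with_retries(lambda: get_data(...))` where `get_data` is whatever function you already use, so that a single temporary failure no longer makes the task exit.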
The website will be failing intermittently too.
We’re hosted across multiple availability zones, which appears to be helping to reduce the impact of these problems. Our system also appears to be coping reasonably well without some of the services it normally relies on. But ultimately we need AWS to get their system fully functional again before everything will work as it’s supposed to. The AWS status page has up-to-date information on these issues.
We’re very sorry if this is causing you problems. If you are getting errors, the best we can suggest is that you keep trying, as the faults should only be affecting some requests some of the time.