Intermittent freeCodeCamp forum outages

https://forum.freecodecamp.org has been mostly down for the last few hours. I have actually been seeing intermittent outages every other day or so for many weeks, but today's has been especially bad and long.

The most common error is 500 Internal Server Error; sometimes it’s a 504.

I tried to get help at discourse.org and was promptly blown off.

https://meta.discourse.org/t/freecodecamp-outage/69290


Just looking at forum activity tells you that “all day” is a bad description: there have been posts almost every hour. I haven’t had any issues these past few days either.

It’s pretty accurate. I think the forum has been mostly down, say 90% of the time, since early morning New York time. There have been intermittent brief periods, on the order of a minute or two, when you could actually read posts, and extremely short slivers, on the order of a few seconds, when you could post.

I think it has only been up continuously since 16:45 New York time.

Over the past few days I’ve also had quite a few errors posting messages or simply trying to view posts. Maybe it wasn’t “all day” today, but the forum did go down for about an hour or two and repeatedly stopped working every now and then.

It seems to have been stable for about two or three hours now, though.

What kind of monitoring and alerting system is there for forum health?
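For comparison, one minimal way to get outside-in visibility, independent of whatever the forum's hosting actually runs (not stated in this thread), is an external probe that polls the site and records the HTTP status and latency of each request. This is only a sketch under that assumption: `probe`, `availability` and `should_alert` are hypothetical helpers, and the 0.9 threshold is an arbitrary example value.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> tuple[Optional[int], float]:
    """Fetch `url` once; return (HTTP status or None if no response, seconds elapsed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status: Optional[int] = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry the status code.
        status = err.code
    except OSError:
        # URLError, timeouts and connection resets are all OSError subclasses.
        status = None
    return status, time.monotonic() - start


def availability(statuses: list[Optional[int]]) -> float:
    """Fraction of probes that got a non-error (< 400) HTTP response."""
    if not statuses:
        return 0.0
    ok = sum(1 for s in statuses if s is not None and s < 400)
    return ok / len(statuses)


def should_alert(statuses: list[Optional[int]], threshold: float = 0.9) -> bool:
    """Hypothetical alert rule: fire when recent availability drops below `threshold`."""
    return availability(statuses) < threshold
```

In use, you would call `probe("https://forum.freecodecamp.org/")` on a timer (say once a minute), keep the last ten statuses, and page someone when `should_alert` returns True; the elapsed time also catches the "slow but not failing" periods reported later in this thread.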

Of course perceptions of “all day” are going to be relative to where on the planet you are and when your waking hours are.

As to the problems with the forum, there have been intermittent problems for weeks now. Quincy and the FCC team have been working with people from Discourse to resolve the problems. I know it’s frustrating, but people are working on the problem.

Here is a post Quincy made on the subject a few weeks ago.

I know it’s frustrating, but I think we just have to be patient.


Thank you for the information. We need a sticky at the top of the forum, or something on the main page, if this is going to keep happening. I only check back infrequently. I got really excited reading about the new challenges, and 10 minutes later the forum was down. Living in BJ, I tried again with a VPN, waited an hour or two, and went to sleep. I wasn’t sure whether it was related to the problems also happening with the beta challenge map or to a more local cause like the GFW until I saw your link.

We had one for a while, but the problem appeared resolved. Now it sounds like that may not be the case.

Definitely not the case.

The forum is giving 502 Bad Gateway errors now.

I noticed a performance degradation about an hour ago: page loads were visibly slow, with the progress spinner showing for a few seconds.

A post attempt a few minutes ago failed, then succeeded after hanging for almost a minute.

What is known about the cause so far?

@ppc Are you still experiencing these? We were discussing these with the Discourse team.

Intermittently, yes. I’ve seen a couple of 502s and one 504, and a lot of slow page loads, on the order of seconds, are still happening.

This very post took 10 seconds to go through, and then this edit took 30 seconds to load.

Your nginx upstream is most likely crashing, which would explain the 502s. At other times it may be taking too long to process requests, resulting in a 504 from nginx.
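As general nginx reverse-proxy behavior (not a diagnosis of this forum's setup): a 502 Bad Gateway usually means nginx could not get a valid response from the upstream at all, for example because the app process crashed, is restarting, or refused or reset the connection, while a 504 Gateway Timeout means the upstream did not reply before nginx's proxy timeout (`proxy_read_timeout`, 60 seconds by default). A 500 normally comes from the application itself. The hypothetical helpers below just bucket observed status codes by that rough cause, which can help separate "crashing" from "too slow" in a log of failed requests.

```python
from collections import Counter
from typing import Iterable, Optional

# Rough mapping from status codes seen behind an nginx reverse proxy to their
# usual cause. General nginx semantics only; the real cause needs server logs.
LIKELY_CAUSE = {
    500: "app error",          # the upstream application itself failed
    502: "upstream down",      # crashed, restarting, or connection refused/reset
    504: "upstream too slow",  # no reply before nginx's proxy timeout
}


def likely_cause(status: Optional[int]) -> str:
    """Classify one observed status; None means no HTTP response was received."""
    if status is None:
        return "no response"
    if status < 400:
        return "ok"
    return LIKELY_CAUSE.get(status, "other error")


def tally(statuses: Iterable[Optional[int]]) -> Counter:
    """Count observations per likely cause."""
    return Counter(likely_cause(s) for s in statuses)
```

For example, `tally([200, 502, 502, 504, None])` counts "upstream down" twice and "ok", "upstream too slow" and "no response" once each; a log dominated by 502s points toward crashes, one dominated by 504s toward slow request handling.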

Are there docs on the architecture and design of the forum?

Thanks for confirming this. I will try to update to the latest version of Discourse tonight and see whether that helps.

And yes, everything’s open source. It’s built and maintained by Discourse. Here’s the repo.