Hey developers, I’m having an issue with every webpage that involves the learning curriculum. The learn, curriculum, and similar pages fully load in the browser, but nothing appears on the page other than the links at the top and bottom. Also, the top-right corner that usually has the menu is stuck in a perpetual loading animation with no icons to click. This error started right after I submitted a challenge. As far as I can tell, all the other links on the website (support, shop, opensource, etc.) are working. Thanks for all that you do, and hopefully this gets resolved.
Posted my own version of this but up-voting yours; having the exact same problem.
Hi @atmpepe, thanks for reporting this.
Could you give us more details about your setup? What browser are you using and what version is it? Also, are you running any ad blockers or other extensions that adjust the CORS settings of the page?
We may be having cache invalidation issues with each new release.
I’ll elaborate so everyone has context, because we like to be transparent about everything, at least with our incredible volunteers.
This issue is one of those seemingly innocent ones that affects everyone on the team as well as our users. This class of problem comes with operating at scale. We could test for such things ahead of time with some chaos engineering, but we are a small team with several competing priorities.
So, here is how it all works:
Each “step” (more commonly known as a challenge) is its own route/page in the client application. Think of it as a small building block of the application. Our tooling takes care of “chunking” and packaging all these bits into tiny bundles. When we rebuild and release a new version, we have to invalidate the older versions of these bundles. This has been working more or less OK, and all of it comes with the tooling we use, although we had to dial it in over time to balance getting releases out seamlessly with users actually receiving the updates.
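As a rough illustration of why each release produces a fresh set of bundles, here is a minimal TypeScript sketch of content-hashed chunk naming. It assumes the tooling names each route’s bundle after a hash of its contents, which is a common bundler convention; the helper names (`fnv1a`, `chunkFileName`, `buildManifest`) and the simple FNV-1a hash are hypothetical stand-ins, not the actual build pipeline.

```typescript
// Hypothetical: route path -> source text of that route's bundle.
type Sources = Record<string, string>;
// Hypothetical: route path -> hashed chunk file name the client will request.
type Manifest = Record<string, string>;

// 32-bit FNV-1a hash over UTF-16 code units; a stand-in for the
// stronger hash functions real bundlers use.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// "/learn/step-1" + contents -> "learn-step-1.<hash>.js"
function chunkFileName(route: string, source: string): string {
  const base = route.replace(/^\//, "").replace(/\//g, "-");
  return `${base}.${fnv1a(source)}.js`;
}

// One release = one manifest; a route whose contents changed gets a new file name.
function buildManifest(sources: Sources): Manifest {
  const manifest: Manifest = {};
  for (const [route, source] of Object.entries(sources)) {
    manifest[route] = chunkFileName(route, source);
  }
  return manifest;
}

const v1 = buildManifest({ "/learn/step-1": "render(step1)", "/learn/step-2": "render(step2)" });
const v2 = buildManifest({ "/learn/step-1": "render(step1)", "/learn/step-2": "render(step2_fixed)" });

// Unchanged source keeps its name; edited source gets a new one.
console.log(v1["/learn/step-1"] === v2["/learn/step-1"]); // true
console.log(v1["/learn/step-2"], "->", v2["/learn/step-2"]);
```

Because the file name changes whenever the contents change, the old name can be dropped on release, which is exactly the “invalidate the older versions” step described above.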
So why are we seeing this problem (re-emerge)?
Users no longer spend several minutes on each step/challenge as they did before. Because each step takes only a few tens of seconds on average, they need to load the next route/bundle much more often. We have thousands of people working on the curriculum at any given moment. When a new version goes live and an older version is invalidated while someone is in the middle of navigating, they hit the bug.
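The race described here can be simulated in a few lines. This TypeScript sketch is a toy model under stated assumptions: the `Origin` and `Client` classes, the manifest format, and the file names are hypothetical, and it assumes a release removes the previous release’s chunks outright (a real CDN may keep serving some of them for a while).

```typescript
// Hypothetical route -> hashed chunk file mapping, as produced by a build.
type Manifest = Record<string, string>;

// A toy origin/CDN that only serves the chunks of the current release.
class Origin {
  private files: Set<string> = new Set();
  private manifest: Manifest = {};

  // Publishing a release invalidates every chunk the previous one referenced.
  deploy(next: Manifest): void {
    this.manifest = { ...next };
    this.files = new Set(Object.values(next));
  }

  currentManifest(): Manifest {
    return { ...this.manifest };
  }

  fetchChunk(file: string): string {
    if (!this.files.has(file)) {
      throw new Error(`404 Not Found: ${file}`);
    }
    return `// code for ${file}`;
  }
}

// A learner's open tab: it keeps the manifest it loaded with until a full reload.
class Client {
  private readonly origin: Origin;
  private manifest: Manifest;

  constructor(origin: Origin) {
    this.origin = origin;
    this.manifest = origin.currentManifest();
  }

  navigate(route: string): string {
    const file = this.manifest[route];
    if (file === undefined) throw new Error(`Unknown route: ${route}`);
    return this.origin.fetchChunk(file);
  }

  reload(): void {
    this.manifest = this.origin.currentManifest();
  }
}

const origin = new Origin();
origin.deploy({ "/learn/step-1": "step-1.aaaa.js", "/learn/step-2": "step-2.aaaa.js" });
const tab = new Client(origin);
tab.navigate("/learn/step-1"); // works

// A new release goes out while the learner is part-way through the curriculum.
origin.deploy({ "/learn/step-1": "step-1.bbbb.js", "/learn/step-2": "step-2.bbbb.js" });
try {
  tab.navigate("/learn/step-2");
} catch (err) {
  console.log((err as Error).message); // 404 Not Found: step-2.aaaa.js
}

tab.reload();
tab.navigate("/learn/step-2"); // works again after a reload
```

The more often learners navigate, the more likely one of those navigations lands in the window between a deploy and their next full reload.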
Throw our CDN into the mix, and the issue compounds a bit because Cloudflare still takes a good few minutes to propagate the new versions across the world (to its edge nodes).
So what do we do?
I don’t think we can truly “fix” this quickly; the best we can do is respond to complaints and ask people to wait a few minutes and reload. The thing is, this falls into the “absence of error” fallacy, because not everyone complains.
Do we purge the cache on CF every time we do a new release, at extra cost in both time and money? Even that may not be a solution, because end-user browsers and ISPs are unruly and may not check back with CF for the purged, updated content for a while (we currently peg that at 1 hour).
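To make the “unruly browsers” point concrete, here is a small TypeScript sketch of how HTTP caching lets a browser keep its copy for the full cache lifetime. It assumes content is served with a `Cache-Control: max-age` of one hour, matching the figure above (the actual header is not stated in this thread), and the helper names are hypothetical. A CDN purge does not reach copies a browser already holds; the browser only asks again once `max-age` runs out.

```typescript
// Assumption: responses carry a Cache-Control max-age equal to the
// one-hour figure mentioned above; the header value is illustrative.
const CACHE_CONTROL = `public, max-age=${60 * 60}`;

// Extract the max-age directive (in seconds) from a Cache-Control header.
function parseMaxAge(header: string): number | undefined {
  for (const directive of header.split(",")) {
    const [name = "", value] = directive.trim().split("=");
    if (name.toLowerCase() === "max-age" && value !== undefined) {
      const seconds = Number.parseInt(value, 10);
      return Number.isNaN(seconds) ? undefined : seconds;
    }
  }
  return undefined;
}

// How long a browser that already holds a copy may keep using it without
// asking the CDN again, regardless of when the CDN itself was purged.
function secondsUntilRevalidation(ageSeconds: number, maxAgeSeconds: number): number {
  return Math.max(0, maxAgeSeconds - ageSeconds);
}

const maxAge = parseMaxAge(CACHE_CONTROL) ?? 0;
// A copy cached 10 minutes before the purge is still served from the
// browser cache for another 50 minutes.
console.log(secondsUntilRevalidation(10 * 60, maxAge)); // 3000
```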
Handle client updates async? This option could mean a feature where we tag every release (with SemVer) and have the client app check the latest tag; if it is not running the latest release, it forces a reload. Again, this requires R&D and implementation.
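A minimal TypeScript sketch of that idea, under these assumptions: each release publishes its SemVer tag somewhere the client can fetch (the `/version.json` URL below is hypothetical), the running bundle knows its own version, and the client reloads when the published version is newer. `parseSemVer`, `isNewer`, and `checkForUpdate` are illustrative names, and pre-release/build metadata is ignored for brevity. The fetch and reload steps are passed in as functions so the logic can run outside a browser.

```typescript
// Hypothetical shape of a version file published with each release.
interface ReleaseInfo {
  version: string;
}

// Parse "MAJOR.MINOR.PATCH" (optionally prefixed with "v"); suffixes are ignored.
function parseSemVer(v: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!match) throw new Error(`Not a SemVer string: ${v}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `latest` is strictly newer than `current`.
function isNewer(latest: string, current: string): boolean {
  const a = parseSemVer(latest);
  const b = parseSemVer(current);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return false;
}

// Fetch the published release tag and force a reload if this client is behind.
async function checkForUpdate(
  currentVersion: string,
  fetchLatest: () => Promise<ReleaseInfo>,
  reload: () => void,
): Promise<boolean> {
  const latest = await fetchLatest();
  if (isNewer(latest.version, currentVersion)) {
    reload();
    return true;
  }
  return false;
}

// Browser wiring (hypothetical URL and build-time constant), e.g. before
// loading the next step's route:
// await checkForUpdate(
//   BUILD_VERSION,
//   () => fetch("/version.json").then((r) => r.json()),
//   () => window.location.reload(),
// );
```

Running the check right before a navigation, rather than on a timer, would catch a stale client at the moment it is about to request a chunk that may no longer exist, at the cost of one small extra request per step.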
So, Option 2 could be our workaround for a while. Purging the whole cache affects everyone, because it also purges data for the applications that share infrastructure with learn, like news (about 80% of all our platforms), which is not ideal in terms of cost.
Purging the cache also means a slower site overall.
We are already doing some R&D to manage performance at scale better, and our stack will be re-written sooner or later.
Rest assured, we are doing some hard thinking about this and request your patience. I am open to thoughts and ideas, but more importantly, I wanted to share the context with our mods & contributors.
Cheers & happy coding.
I’m using Chrome version 101.0.4951.67. The site loaded fine yesterday afternoon, but it’s having the same problem again this morning. I’m also not running any ad blockers or anything like that.
@Sboonny Thanks for replying on the other thread. Actually, these two issues are unrelated. The API has nothing to do with caching; it’s a separate, independent service.
My bad, and thanks for removing it.
Thanks for confirming that @atmpepe.
It sounds like the issue you described is related to the outage we’re currently experiencing.
Things have gotten better within the last few hours, but we’re continuing to monitor the situation and make improvements wherever we can.
Please check out this thread for more information and updates: Announcement: Learn Outage
This has been resolved now, thanks to investigative work by @ojeytonwilliams