Key takeaways from the post-mortem meeting are:
We will start implementing the changes as soon as possible.
Timeframe [CEST]:
Summary
Users were unable to log in. Users who were already logged in could not do anything in the app because communication between the interface and the servers was blocked. While optimizing the deployment process, we mistakenly removed a part of the workflow that is required to authorize users.
Details
The problem that we introduced yesterday affected the authentication and authorization part of the application. Users could not log in or access any internal endpoints of the system. All secured communication between the interface and the servers was down because we could not authorize users to access the API, even if they were already logged in.
Fallback
The part that communicates with the telecommunication network was working correctly, which is why the fallback workstation path was not activated. We will take a closer look at how we can activate the fallback mechanism in such cases in the future.
We will analyze the details during the post-mortem review next week and keep you updated on our plan to prevent this from happening again.