K8S cluster migration - problems

Planning

  • Lack of knowledge of the dependencies between payment areas and cluster components

  • Lack of a specific, detailed redeployment plan from the development team

  • Lack of comprehensive verification by the development team that systems and applications still work after the switchover - e.g., a week after the migration we find out that data from some NFS shares should have been moved, even though such information should have been part of the basic action plan

Network traffic

  • Tangled, spaghetti-like traffic routing - a huge number of HAProxy entries scattered across various load balancers, many of which haven't been updated for a long time and have no renewed Puppet certificates

  • Last-minute additions to HAProxy - for example, the front-end was left on the old cluster while the entire domain context was moved to the new one; this too was due to a lack of planning

  • The need to perform rollbacks - many things go wrong during the switchover itself, when something stops working

Blockers

  • Deployment roadblocks - constant rescheduling driven by the business / customers

  • Internal blockers from other teams, e.g. OPS, SRE

Technological aspect

  • Additional work related to exposing Services (SVCs) that act as proxies and let components communicate across clusters - it often turns out at the last moment that system X needs to talk to system Y (see the Service sketch after this list)

  • Additional work to align changes on the new cluster's branch - despite the agreed process, commits keep being pushed only to the old cluster

  • Requirement to preserve all data and messages held on AMQ queues throughout the migration

  • FluxCD often fails to reconcile on the old cluster, which prolongs the work (see the Flux sketch after this list)

  • Switching traffic over is time-consuming due to the large number of load balancers

  • PSD2 payment channels shut down - 'kills' not issued beforehand

  • Variables and configuration embedded in strange, non-standard places or hardcoded (see the ConfigMap sketch below)
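
One way such cross-cluster SVC proxies are commonly implemented is with a Kubernetes Service of type ExternalName, which gives workloads on the new cluster a stable local DNS name that resolves to a system still running on the old cluster. A minimal sketch; system-y, payments, and system-y.old-cluster.example.com are placeholders, not our actual names or hosts:

```yaml
# Hypothetical example: expose "system Y" (still on the old cluster)
# under a local Service name on the new cluster, so pods can call
# http://system-y.payments.svc.cluster.local as if it were local.
apiVersion: v1
kind: Service
metadata:
  name: system-y          # local name used by workloads on the new cluster
  namespace: payments     # placeholder namespace
spec:
  type: ExternalName
  # DNS name under which system Y is reachable on the old cluster,
  # e.g. via its external load balancer - placeholder value
  externalName: system-y.old-cluster.example.com
```

Note that ExternalName only creates a DNS CNAME: ports are not remapped, and any TLS certificate on the target must cover the name the client actually connects to.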
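
When Flux misbehaves on the old cluster during a migration window, one common workaround is to suspend reconciliation there so manual fixes aren't reverted by the controller. A hedged sketch, assuming the stack is managed by a Flux Kustomization; payments-stack and platform-config are invented names:

```yaml
# Hypothetical Flux Kustomization on the OLD cluster with reconciliation
# suspended for the switchover window; resume by removing "suspend".
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-stack       # placeholder name
  namespace: flux-system
spec:
  suspend: true              # stop Flux from reconciling during the switchover
  interval: 10m
  path: ./clusters/old/payments   # placeholder path in the config repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform-config        # placeholder Git source
```

The same effect is available ad hoc via `flux suspend kustomization payments-stack`, and a stuck reconciliation can often be nudged with `flux reconcile kustomization payments-stack --with-source`.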
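
For the hardcoded-configuration problem, the usual remedy is to pull such values into a ConfigMap and inject them as environment variables, so the same image runs unchanged on either cluster. A minimal sketch; payment-gw, the keys, and all addresses are invented placeholders:

```yaml
# Hypothetical example: configuration extracted from the image into a
# ConfigMap, injected into the container as environment variables.
apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-gw-config
data:
  AMQ_HOST: amq.new-cluster.example.com   # placeholder broker address
  NFS_EXPORT: /exports/payments           # placeholder NFS path
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-gw
spec:
  replicas: 1
  selector:
    matchLabels: { app: payment-gw }
  template:
    metadata:
      labels: { app: payment-gw }
    spec:
      containers:
        - name: payment-gw
          image: registry.example.com/payment-gw:1.0   # placeholder image
          envFrom:
            - configMapRef:
                name: payment-gw-config   # every key becomes an env var
```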
