Deployer does not restart unstable service

Expected Behavior

Service gets restarted if the services it depends on are stable.

Actual Behavior

Service does not get restarted if it is unstable.

Steps to Reproduce

Scale a service to more instances than fit in the environment. (This also happens if any of the instances are unhealthy, but that is more difficult to reproduce at will.)

Environment

None

Description

digitransit-deployer has logic that checks whether a service and the services it depends on are healthy before doing a restart. However, it should only check whether the services it depends on are stable.
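A minimal sketch of the difference between the current and the expected check. The `Service` class and both function names are hypothetical and are not taken from the digitransit-deployer code base; the sketch only illustrates the condition described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a deployed service and its dependencies."""
    name: str
    stable: bool
    depends_on: list[Service] = field(default_factory=list)


def can_restart_current(service: Service) -> bool:
    # Behavior described in this report: the service itself must also be
    # healthy, so an unstable service is never restarted.
    return service.stable and all(dep.stable for dep in service.depends_on)


def can_restart_expected(service: Service) -> bool:
    # Expected behavior: only the services it depends on need to be stable.
    return all(dep.stable for dep in service.depends_on)


if __name__ == "__main__":
    db = Service("db", stable=True)
    api = Service("api", stable=False, depends_on=[db])
    print(can_restart_current(api), can_restart_expected(api))
```

With an unstable service whose dependencies are stable, the first check blocks the restart while the second allows it.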

Activity

Joel Lappalainen
January 9, 2020, 12:43 PM

Joel Lappalainen
January 10, 2020, 11:30 AM

Noticed that raildigitraffic2gtfsrt was in a restart loop. The deployer checks the age of the oldest pod (which in this case is from a previous deployment) and decides that a restart is needed. This causes the new pod to be replaced in the deployment, and the loop continues forever. We should either change the logic for counting the age of a deployment (it can be complex to take into account only the pods from the latest deployment) or configure a wait time since the last deployment (this could be done by checking the timestamp from the label added in the previous deployment, if one exists).

Joel Lappalainen
January 10, 2020, 2:05 PM

Fixed it by using the timestamp of the previous deployment made by the deployer as the first choice and, if it is missing, the oldest pod’s start time.
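The fallback order described in this comment could look roughly like the sketch below. The function names, the label format (epoch milliseconds stored as a string), and the `max_age` threshold are assumptions made for illustration; the actual deployer code and label format may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def reference_time(
    last_deploy_label: Optional[str],
    pod_start_times: Sequence[datetime],
) -> Optional[datetime]:
    """Pick the time from which the age of a deployment is counted."""
    if last_deploy_label:
        # First choice: timestamp written by the deployer on its previous
        # deployment (assumed here to be epoch milliseconds).
        return datetime.fromtimestamp(int(last_deploy_label) / 1000, tz=timezone.utc)
    if pod_start_times:
        # Fallback: start time of the oldest pod.
        return min(pod_start_times)
    return None


def needs_restart(reference: Optional[datetime], now: datetime, max_age: timedelta) -> bool:
    # Restart only when a reference time is known and it is old enough.
    return reference is not None and now - reference >= max_age


if __name__ == "__main__":
    now = datetime(2020, 1, 10, 14, 0, tzinfo=timezone.utc)
    old_pod = datetime(2020, 1, 1, tzinfo=timezone.utc)
    label = str(int(datetime(2020, 1, 10, 12, 0, tzinfo=timezone.utc).timestamp() * 1000))
    print(needs_restart(reference_time(label, [old_pod]), now, timedelta(days=1)))
    print(needs_restart(reference_time(None, [old_pod]), now, timedelta(days=1)))
```

Preferring the deployment timestamp means a leftover pod from an earlier deployment no longer makes a fresh deployment look old, which is what kept the loop going.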

Assignee

Unassigned

Reporter

Joel Lappalainen

More details from

Joel

Priority

Medium

Recurrence

At will

User Agent

None

URL

None

Components

Story Points

None

Labels

None