bug: possible deployment race condition can cause pod to fail deployment #959

@mcdurdin

per @tim-eves:

We think we've worked it out: there is a hair-fine race condition when two re-deploy events occur in quick succession (seconds apart).

First event: pod start-up reaches app container initialisation just as the second event causes git-sync to come in and delete the file. The first pod's container completes initialisation, which includes creating a directory as the mount point because it cannot see a file. This pod then terminates before the app inside runs. Meanwhile, the second pod tries to create the file and finds a directory claiming that name.
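
The end state described above can be reproduced on a local filesystem. This is only a rough analogue, not the actual pod or git-sync behaviour: the path name `vendored-file` and the function name are hypothetical, and the `os.mkdir` call stands in for the first pod creating a directory mount point.

```python
import os
import tempfile


def demonstrate_collision() -> str:
    """Show what the second pod sees when a directory occupies the file's path."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical path; the real target file is not named in this issue.
        target = os.path.join(root, "vendored-file")

        # Stand-in for pod 1: the file is briefly absent, so a directory
        # is created at that path to serve as the mount point.
        os.mkdir(target)

        # Stand-in for pod 2: try to create the file at the same path.
        try:
            with open(target, "w") as fh:
                fh.write("content")
        except OSError:
            # The open fails because a directory already claims the name.
            return "directory" if os.path.isdir(target) else "other error"
        return "file"
```

Calling `demonstrate_collision()` returns `"directory"`: the write fails and the path is still a directory, which matches the failure reported for the second pod.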

Potential solutions:

  • Have the api container queue its responses to events.
  • Rate-limit event generation from GitHub (not sure if that's even possible).
  • In init-site-vendoring: check whether the target is a regular file and delete it if it is not. This is still vulnerable to a ToCToU (time-of-check to time-of-use) race, though with an even narrower window.
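
The third option could look roughly like the sketch below. It is an illustration under assumptions, not the init-site-vendoring implementation: `ensure_file_slot` and `write_target` are hypothetical names, and writing through a temporary file followed by `os.replace` is an extra choice not proposed in this issue.

```python
import os
import shutil
import tempfile


def ensure_file_slot(target: str) -> None:
    """Remove whatever sits at `target` unless it is a regular file."""
    if os.path.isdir(target) and not os.path.islink(target):
        # A real directory claims the name: remove it and its contents.
        shutil.rmtree(target)
    elif os.path.lexists(target) and not os.path.isfile(target):
        # Broken symlink, symlink to a directory, or another non-file entry.
        os.unlink(target)


def write_target(target: str, data: str) -> None:
    """Check the slot, then write the file via a temp file and rename."""
    ensure_file_slot(target)
    # ToCToU window: another process could recreate a directory at
    # `target` between the check above and the replace below.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target) or ".")
    with os.fdopen(fd, "w") as fh:
        fh.write(data)
    # Replaces an existing file; fails if a directory has reappeared.
    os.replace(tmp_path, target)
```

As the list item notes, this only narrows the window. If a directory reappears after the check, `os.replace` raises an error instead of silently leaving the directory in place, so the caller would still need to retry or fail the deployment.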

Originally posted by @mcdurdin in #958 (comment)
