Skip to content

[worker] fix starting pingAlive thread once#10428

Merged
SamuelHassine merged 6 commits intomasterfrom
oob/worker_fix_pingalive
Apr 12, 2025
Merged

[worker] fix starting pingAlive thread once#10428
SamuelHassine merged 6 commits intomasterfrom
oob/worker_fix_pingalive

Conversation

@sbocahu
Copy link
Copy Markdown
Member

@sbocahu sbocahu commented Mar 31, 2025

Proposed changes

  • Start pingAlive once in the main loop
  • Use consume() generator instead of callable + basic_consume to handle graceful stops
  • handle graceful stops for sigint (control-c) or sigterm
  • improved exception handling
  • rejection of faulty messages in AMQP listening mode (if message is not json)

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality
  • I wrote test cases for the relevant uses case (coverage and e2e)
  • I added/update the relevant documentation (either on github or on notion)
  • Where necessary I refactored code to improve the overall quality

Further comments

Before this change, a worker would create a pingAlive thread per rabbitmq queue, which is useless & consumes resources.
Also, on exceptions (eg. a message which is not valid json), the consuption would fail and restart, creating a new pingAlive thread. Exceptions during ingestion would then lead to an ever increasing number of these pingAlive threads.
A comment suggested graceful stop was handled but it was incomplete/incorrect.

@sbocahu sbocahu added the filigran team use to identify PR from the Filigran team label Mar 31, 2025
@sbocahu sbocahu marked this pull request as draft March 31, 2025 14:01
@sbocahu
Copy link
Copy Markdown
Member Author

sbocahu commented Mar 31, 2025

just got an exception that looks not great. converting to draft

Comment thread opencti-worker/src/worker.py Outdated
self.consumer_threads[push_queue].start()
# Listen for webhook message
if connector["config"].get("listen_callback_uri") is not None:
if "connector_user" in connector and connector["config"].get("listen_callback_uri") is not None:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did not test much this part but I ran into issues with external import connectors. looking at the code, it seems to be wrong to use listen_callback_uri as a condition to create an ApiConsumer because it is always set up by the connector helper (https://github.com/OpenCTI-Platform/client-python/blob/073609f05b968455e9ecb905612bdedb523c14dd/pycti/connector/opencti_connector_helper.py#L1105)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbocahu
Copy link
Copy Markdown
Member Author

sbocahu commented Mar 31, 2025

just got an exception that looks not great. converting to draft

the exception was due to a control-C/sigterm & introduced earlier with the addition of the threadpool.
I reworked the Consumer object to handle graceful stop, which was not the case previously (despite the comment saying so).

TODO:

  • rework apiConsumer for graceful stop
  • check the condition around launching an ApiConsumer
  • handle sigkill similarly to sigint

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.44%. Comparing base (6d44674) to head (aaa8957).
Report is 11 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10428   +/-   ##
=======================================
  Coverage   65.44%   65.44%           
=======================================
  Files         672      672           
  Lines       67031    67031           
  Branches     7347     7354    +7     
=======================================
  Hits        43870    43870           
  Misses      23161    23161           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sbocahu
Copy link
Copy Markdown
Member Author

sbocahu commented Apr 1, 2025

sibbling PR in pycti: OpenCTI-Platform/client-python#867

@sbocahu sbocahu force-pushed the oob/worker_fix_pingalive branch from 1b9b0d7 to 6f9da17 Compare April 1, 2025 13:15
@sbocahu sbocahu marked this pull request as ready for review April 1, 2025 13:18
@labo-flg labo-flg added multi-repository For contribution that requires PR in several repository and removed multi-repository For contribution that requires PR in several repository labels Apr 1, 2025
@aHenryJard
Copy link
Copy Markdown
Member

@sbocahu I was about to test locally but there is a conflict with current master, I'm not sure on how to resolve it. Could you rebase and manage the conflit ?

@sbocahu sbocahu force-pushed the oob/worker_fix_pingalive branch from 80d278c to 97b3c43 Compare April 9, 2025 14:43
@sbocahu
Copy link
Copy Markdown
Member Author

sbocahu commented Apr 9, 2025

done

Copy link
Copy Markdown
Member

@aHenryJard aHenryJard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, all good to me. I used some connectors, and also stopped opencti for a while to see if the worker is correctly taking back work when opencti is up again and it's all good.

@richard-julien
Copy link
Copy Markdown
Member

Please dont merge until my review

Copy link
Copy Markdown
Member

@richard-julien richard-julien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need review and change

@richard-julien richard-julien force-pushed the oob/worker_fix_pingalive branch from 97b3c43 to 81806b2 Compare April 11, 2025 14:01
@SamuelHassine SamuelHassine force-pushed the oob/worker_fix_pingalive branch from b151979 to 70dfcb6 Compare April 12, 2025 08:12
@SamuelHassine SamuelHassine merged commit e825189 into master Apr 12, 2025
5 of 7 checks passed
@SamuelHassine SamuelHassine deleted the oob/worker_fix_pingalive branch April 12, 2025 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

filigran team use to identify PR from the Filigran team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants