
Conversation

@shahdyousefak (Contributor) commented Jun 7, 2025:


  • Created a warmup script that is called at the end of imageup to reduce first-request latency for GPU services.
  • The warmup script sends a simple curl request to each service's /warmup endpoint. The relevant services whose models are preloaded into GPU memory are specified in a config file named warmup.env.
  • Added /warmup endpoints to all GPU-using services.
  • Each warmup endpoint performs a dummy inference, with PII logging added to capture error traces for debugging (a sketch of one such endpoint follows this list).
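
As a rough illustration, a /warmup endpoint in one of the Flask-based services might look like the sketch below. The route shape and the dummy-inference stand-in are hypothetical, not the actual service code; each real endpoint follows its own service's structure.

# Hypothetical sketch of a /warmup endpoint; the dummy inference is an
# illustrative stand-in, not the actual service code.
import logging

import numpy as np
from flask import Flask, jsonify

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)


def run_dummy_inference():
    # Stand-in: a real service would run its preloaded model on a blank
    # input here, e.g. model(np.zeros((224, 224, 3), dtype=np.uint8)),
    # which forces the weights onto the GPU.
    _ = np.zeros((224, 224, 3), dtype=np.uint8)


@app.route("/warmup", methods=["GET"])
def warmup():
    logging.info("[WARMUP] Warmup endpoint triggered.")
    try:
        run_dummy_inference()
    except Exception:
        # Captures the full error trace for debugging, as described above.
        logging.exception("[WARMUP] Dummy inference failed.")
        return jsonify({"warmed": False}), 500
    return jsonify({"warmed": True}), 200

Returning a non-200 status on failure gives the calling script something to check beyond the curl exit code.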

Below is the script in action:

shahdy@unicorn:~/IMAGE-server/scripts$ ./warmup
[Warmup] Sun Jun 8 09:03:33 PM EDT 2025 Starting warmup...
[Warmup] Waiting for image-server-semantic-segmentation-1 to be healthy...
[Warmup] image-server-semantic-segmentation-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-semantic-segmentation-1...
[Warmup] image-server-semantic-segmentation-1 warmed successfully.
[Warmup] Waiting for image-server-espnet-tts-fr-1 to be healthy...
[Warmup] image-server-espnet-tts-fr-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-espnet-tts-fr-1...
[Warmup] image-server-espnet-tts-fr-1 warmed successfully.
[Warmup] Waiting for image-server-depth-map-generator-1 to be healthy...
[Warmup] image-server-depth-map-generator-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-depth-map-generator-1...
[Warmup] image-server-depth-map-generator-1 warmed successfully.
[Warmup] Waiting for image-server-object-detection-1 to be healthy...
[Warmup] image-server-object-detection-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-object-detection-1...
[Warmup] image-server-object-detection-1 warmed successfully.
[Warmup] Waiting for image-server-multistage-diagram-segmentation-1 to be healthy...
[Warmup] image-server-multistage-diagram-segmentation-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Waiting for image-server-content-categoriser-1 to be healthy...
[Warmup] image-server-content-categoriser-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-content-categoriser-1...
[Warmup] image-server-content-categoriser-1 warmed successfully.
[Warmup] Waiting for image-server-text-followup-1 to be healthy...
[Warmup] image-server-text-followup-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-text-followup-1...
[Warmup] image-server-text-followup-1 warmed successfully.
[Warmup] Waiting for image-server-espnet-tts-1 to be healthy...
[Warmup] image-server-espnet-tts-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-espnet-tts-1...
[Warmup] image-server-espnet-tts-1 warmed successfully.
[Warmup] Waiting for image-server-graphic-caption-1 to be healthy...
[Warmup] image-server-graphic-caption-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-graphic-caption-1...
[Warmup] image-server-graphic-caption-1 warmed successfully.
[Warmup] Completed at Sun Jun 8 09:05:11 PM EDT 2025!
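
The transcript shows the per-service control flow: wait for the container's health check, pause 10 seconds, then hit /warmup. The actual script is a shell script configured via warmup.env; the Python sketch below only mirrors that logic, and the example service URL, polling interval, and timeout are assumptions.

# Python rendering of the warmup loop's logic (the real script is shell).
# The service/URL mapping, polling interval, and timeout are assumptions;
# the real list of services lives in warmup.env.
import subprocess
import time

import requests

SERVICES = {
    "image-server-content-categoriser-1": "http://localhost:5000/warmup",
}


def is_healthy(container: str) -> bool:
    # Reads the Docker health-check status for the container.
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True,
    )
    return out.stdout.strip() == "healthy"


for name, url in SERVICES.items():
    print(f"[Warmup] Waiting for {name} to be healthy...")
    while not is_healthy(name):
        time.sleep(5)
    print(f"[Warmup] {name} marked healthy. Waiting 10s before hitting warmup...")
    time.sleep(10)
    print(f"[Warmup] Hitting warmup endpoint on {name}...")
    if requests.get(url, timeout=300).ok:
        print(f"[Warmup] {name} warmed successfully.")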

Logs from the container (e.g., content-categoriser):

INFO:root:[WARMUP] Warmup endpoint triggered.
PII:root:[WARMUP] Posting to https://ollama.pegasus.cim.mcgill.ca/ollama/api/chat with model llama3.2-vision:latest
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ollama.pegasus.cim.mcgill.ca:443
DEBUG:urllib3.connectionpool:https://ollama.pegasus.cim.mcgill.ca:443 "POST /ollama/api/chat HTTP/1.1" 200 849
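
The PII: prefix on the second line is not a standard Python log level, which suggests a custom level registered with the logging module. Assuming that is how it is done here, a minimal reproduction would be:

# Minimal sketch of a custom "PII" log level; the numeric value (25,
# between INFO and WARNING) is an assumption.
import logging

PII = 25
logging.addLevelName(PII, "PII")
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s:%(name)s:%(message)s")

logging.log(PII, "[WARMUP] Posting to %s with model %s",
            "https://ollama.pegasus.cim.mcgill.ca/ollama/api/chat",
            "llama3.2-vision:latest")
# Prints: PII:root:[WARMUP] Posting to ... with model llama3.2-vision:latest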

Required Information

Coding/Commit Requirements

  • I followed applicable coding standards where appropriate (e.g., PEP8).
  • I have not committed any models or other large files.

New Component Checklist (mandatory for new microservices)

  • I added an entry to docker-compose.yml and build.yml.
  • I created a CI workflow under .github/workflows.
  • I have created a README.md file that describes what the component does and what it depends on (other microservices, ML models, etc.).

OR

  • I have not added a new component in this PR.

@shahdyousefak (Author) commented:

Pending:
preprocessors/multistage-diagram-segmentation (need to check Referrer-Policy)
services/multilang-support (need to check with Venissa)

@jeffbl (Member) left a comment:


Looks good, but I'd like to understand it a bit better before merging. Hopefully we'll get a chance to touch base today.

@shahdyousefak (Author) commented:

Update: preprocessors/multistage-diagram-segmentation (need to check Referrer-Policy) -- resolved

[2025-06-17 02:58:40 +0000] [8] [DEBUG] GET /warmup
INFO:root:Warming up Gemini and SAM...
INFO:google_genai.models:AFC is enabled with max remote calls: 10.
DEBUG:httpcore.connection:connect_tcp.started host='generativelanguage.googleapis.com' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7515b8d790d0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x7515e76335c0> server_hostname='generativelanguage.googleapis.com' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7515e750dd90>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Content-Type', b'application/json; charset=UTF-8'), (b'Vary', b'Origin'), (b'Vary', b'X-Origin'), (b'Vary', b'Referer'), (b'Content-Encoding', b'gzip'), (b'Date', b'Tue, 17 Jun 2025 02:58:45 GMT'), (b'Server', b'scaffolding on HTTPServer2'), (b'X-XSS-Protection', b'0'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'X-Content-Type-Options', b'nosniff'), (b'Server-Timing', b'gfet4t7; dur=5104'), (b'Alt-Svc', b'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'), (b'Transfer-Encoding', b'chunked')])
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-06-05:generateContent "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
INFO:google_genai.models:AFC remote call 1 is done.
DEBUG:root:Gemini response validation successful.

0: 1024x1024 1 0, 5190.3ms
Speed: 12.5ms preprocess, 5190.3ms inference, 2.4ms postprocess per image at shape (1, 3, 1024, 1024)
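
For context, the logs above suggest a two-stage warmup for this preprocessor: a small Gemini generateContent call, followed by a dummy local SAM inference (the 1024x1024 timing line matches ultralytics output). A hedged sketch using the google-genai and ultralytics packages, where the prompt, weights file, and blank input are assumptions:

# Sketch of the two-stage warmup implied by the logs; the prompt,
# weights file, and blank input are assumptions.
import numpy as np
from google import genai
from ultralytics import SAM

# 1. Tiny Gemini call; Client() reads the API key from the environment.
client = genai.Client()
client.models.generate_content(
    model="gemini-2.5-pro-preview-06-05",  # model name taken from the logs
    contents="ping",                       # hypothetical minimal prompt
)

# 2. Dummy local inference to pull the segmentation weights onto the GPU.
model = SAM("sam_b.pt")  # hypothetical weights choice
model(np.zeros((1024, 1024, 3), dtype=np.uint8))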

@jeffbl (Member) left a comment:

A few minor issues, no big problems. As we move toward fewer custom/local models and more ollama/cloud endpoints, the code repeated across services should probably be refactored into something that makes a single request (if all services use the same model), or that at least cuts down on repeated (and error-prone) boilerplate.
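
One possible shape for that shared helper, sketched against ollama's /api/chat endpoint. The function name, environment variable, default model, and timeout are assumptions:

# Hypothetical shared warmup helper for ollama-backed services; the
# OLLAMA_URL variable, default model, and timeout are assumptions.
import logging
import os

import requests

OLLAMA_CHAT_URL = os.environ.get(
    "OLLAMA_URL", "https://ollama.pegasus.cim.mcgill.ca/ollama") + "/api/chat"


def warm_ollama_model(model: str = "llama3.2-vision:latest") -> bool:
    # One minimal, non-streaming chat request is enough to make ollama
    # load the model into memory.
    try:
        resp = requests.post(
            OLLAMA_CHAT_URL,
            json={
                "model": model,
                "messages": [{"role": "user", "content": "ping"}],
                "stream": False,
            },
            timeout=300,
        )
        resp.raise_for_status()
        return True
    except requests.RequestException:
        logging.exception("[WARMUP] ollama warmup failed for %s", model)
        return False

Each service could then expose its /warmup endpoint as a one-line call to this helper.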

@shahdyousefak merged commit ad173af into main Jun 17, 2025
18 checks passed


Successfully merging this pull request may close these issues:

LLM model not loaded in memory until first request for content-categoriser (and others?) causes timeouts and slow initial queries