
Conversation

@shahdyousefak (Contributor) commented Jun 7, 2025:


  • Created a warmup script that is called at the end of imageup to reduce first-request latency for GPU services.
  • The warmup script sends a simple curl request to each service's /warmup endpoint. The relevant services whose models are preloaded into GPU memory are specified in a config file named warmup.env.
  • Added /warmup endpoints to all GPU-using services.
  • Each warmup endpoint performs a dummy inference, with PII logging added to capture error traces for debugging (a sketch of one such endpoint follows this list).
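
As a rough illustration, a /warmup endpoint in one of the Flask-based services might look like the sketch below. The route shape and the dummy-inference stand-in are hypothetical, not the actual service code; each real endpoint follows its own service's structure.

# Hypothetical sketch of a /warmup endpoint; the dummy inference is an
# illustrative stand-in, not the actual service code.
import logging

import numpy as np
from flask import Flask, jsonify

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)


def run_dummy_inference():
    # Stand-in: a real service would run its preloaded model on a blank
    # input here, e.g. model(np.zeros((224, 224, 3), dtype=np.uint8)),
    # which forces the weights onto the GPU.
    _ = np.zeros((224, 224, 3), dtype=np.uint8)


@app.route("/warmup", methods=["GET"])
def warmup():
    logging.info("[WARMUP] Warmup endpoint triggered.")
    try:
        run_dummy_inference()
    except Exception:
        # Captures the full error trace for debugging, as described above.
        logging.exception("[WARMUP] Dummy inference failed.")
        return jsonify({"warmed": False}), 500
    return jsonify({"warmed": True}), 200

Returning a non-200 status on failure gives the calling script something to check beyond the curl exit code.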

Below is the script in action:

shahdy@unicorn:~/IMAGE-server/scripts$ ./warmup
[Warmup] Sun Jun 8 09:03:33 PM EDT 2025 Starting warmup...
[Warmup] Waiting for image-server-semantic-segmentation-1 to be healthy...
[Warmup] image-server-semantic-segmentation-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-semantic-segmentation-1...
[Warmup] image-server-semantic-segmentation-1 warmed successfully.
[Warmup] Waiting for image-server-espnet-tts-fr-1 to be healthy...
[Warmup] image-server-espnet-tts-fr-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-espnet-tts-fr-1...
[Warmup] image-server-espnet-tts-fr-1 warmed successfully.
[Warmup] Waiting for image-server-depth-map-generator-1 to be healthy...
[Warmup] image-server-depth-map-generator-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-depth-map-generator-1...
[Warmup] image-server-depth-map-generator-1 warmed successfully.
[Warmup] Waiting for image-server-object-detection-1 to be healthy...
[Warmup] image-server-object-detection-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-object-detection-1...
[Warmup] image-server-object-detection-1 warmed successfully.
[Warmup] Waiting for image-server-multistage-diagram-segmentation-1 to be healthy...
[Warmup] image-server-multistage-diagram-segmentation-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Waiting for image-server-content-categoriser-1 to be healthy...
[Warmup] image-server-content-categoriser-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-content-categoriser-1...
[Warmup] image-server-content-categoriser-1 warmed successfully.
[Warmup] Waiting for image-server-text-followup-1 to be healthy...
[Warmup] image-server-text-followup-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-text-followup-1...
[Warmup] image-server-text-followup-1 warmed successfully.
[Warmup] Waiting for image-server-espnet-tts-1 to be healthy...
[Warmup] image-server-espnet-tts-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-espnet-tts-1...
[Warmup] image-server-espnet-tts-1 warmed successfully.
[Warmup] Waiting for image-server-graphic-caption-1 to be healthy...
[Warmup] image-server-graphic-caption-1 marked healthy. Waiting 10s before hitting warmup...
[Warmup] Hitting warmup endpoint on image-server-graphic-caption-1...
[Warmup] image-server-graphic-caption-1 warmed successfully.
[Warmup] Completed at Sun Jun 8 09:05:11 PM EDT 2025!
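
The transcript shows the per-service control flow: wait for the container's health check, pause 10 seconds, then hit /warmup. The actual script is a shell script configured via warmup.env; the Python sketch below only mirrors that logic, and the example service URL, polling interval, and timeout are assumptions.

# Python rendering of the warmup loop's logic (the real script is shell).
# The service/URL mapping, polling interval, and timeout are assumptions;
# the real list of services lives in warmup.env.
import subprocess
import time

import requests

SERVICES = {
    "image-server-content-categoriser-1": "http://localhost:5000/warmup",
}


def is_healthy(container: str) -> bool:
    # Reads the Docker health-check status for the container.
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True,
    )
    return out.stdout.strip() == "healthy"


for name, url in SERVICES.items():
    print(f"[Warmup] Waiting for {name} to be healthy...")
    while not is_healthy(name):
        time.sleep(5)
    print(f"[Warmup] {name} marked healthy. Waiting 10s before hitting warmup...")
    time.sleep(10)
    print(f"[Warmup] Hitting warmup endpoint on {name}...")
    if requests.get(url, timeout=300).ok:
        print(f"[Warmup] {name} warmed successfully.")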

Logs from the container (e.g., content-categoriser):

INFO:root:[WARMUP] Warmup endpoint triggered.
PII:root:[WARMUP] Posting to https://ollama.pegasus.cim.mcgill.ca/ollama/api/chat with model llama3.2-vision:latest
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ollama.pegasus.cim.mcgill.ca:443
DEBUG:urllib3.connectionpool:https://ollama.pegasus.cim.mcgill.ca:443 "POST /ollama/api/chat HTTP/1.1" 200 849
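
The PII: prefix on the second line is not a standard Python log level, which suggests a custom level registered with the logging module. Assuming that is how it is done here, a minimal reproduction would be:

# Minimal sketch of a custom "PII" log level; the numeric value (25,
# between INFO and WARNING) is an assumption.
import logging

PII = 25
logging.addLevelName(PII, "PII")
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s:%(name)s:%(message)s")

logging.log(PII, "[WARMUP] Posting to %s with model %s",
            "https://ollama.pegasus.cim.mcgill.ca/ollama/api/chat",
            "llama3.2-vision:latest")
# Prints: PII:root:[WARMUP] Posting to ... with model llama3.2-vision:latest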

Required Information

Coding/Commit Requirements

  • I followed applicable coding standards where appropriate (e.g., PEP8).
  • I have not committed any models or other large files.

New Component Checklist (mandatory for new microservices)

  • I added an entry to docker-compose.yml and build.yml.
  • I created a CI workflow under .github/workflows.
  • I have created a README.md file that describes what the component does and what it depends on (other microservices, ML models, etc.).

OR

  • I have not added a new component in this PR.

@shahdyousefak (Author) commented:

Pending:
preprocessors/multistage-diagram-segmentation (need to check Referrer-Policy)
services/multilang-support (need to check with Venissa)

@jeffbl (Member) left a comment:


Looks good, but I'd like to understand it a bit better before merging. Hopefully we'll get a chance to touch base today.

@shahdyousefak (Author) commented:

Update: preprocessors/multistage-diagram-segmentation (need to check Referrer-Policy) -- resolved

[2025-06-17 02:58:40 +0000] [8] [DEBUG] GET /warmup
INFO:root:Warming up Gemini and SAM...
INFO:google_genai.models:AFC is enabled with max remote calls: 10.
DEBUG:httpcore.connection:connect_tcp.started host='generativelanguage.googleapis.com' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7515b8d790d0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x7515e76335c0> server_hostname='generativelanguage.googleapis.com' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7515e750dd90>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Content-Type', b'application/json; charset=UTF-8'), (b'Vary', b'Origin'), (b'Vary', b'X-Origin'), (b'Vary', b'Referer'), (b'Content-Encoding', b'gzip'), (b'Date', b'Tue, 17 Jun 2025 02:58:45 GMT'), (b'Server', b'scaffolding on HTTPServer2'), (b'X-XSS-Protection', b'0'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'X-Content-Type-Options', b'nosniff'), (b'Server-Timing', b'gfet4t7; dur=5104'), (b'Alt-Svc', b'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'), (b'Transfer-Encoding', b'chunked')])
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-preview-06-05:generateContent "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
INFO:google_genai.models:AFC remote call 1 is done.
DEBUG:root:Gemini response validation successful.

0: 1024x1024 1 0, 5190.3ms
Speed: 12.5ms preprocess, 5190.3ms inference, 2.4ms postprocess per image at shape (1, 3, 1024, 1024)
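
For context, the logs above suggest a two-stage warmup for this preprocessor: a small Gemini generateContent call, followed by a dummy local SAM inference (the 1024x1024 timing line matches ultralytics output). A hedged sketch using the google-genai and ultralytics packages, where the prompt, weights file, and blank input are assumptions:

# Sketch of the two-stage warmup implied by the logs; the prompt,
# weights file, and blank input are assumptions.
import numpy as np
from google import genai
from ultralytics import SAM

# 1. Tiny Gemini call; Client() reads the API key from the environment.
client = genai.Client()
client.models.generate_content(
    model="gemini-2.5-pro-preview-06-05",  # model name taken from the logs
    contents="ping",                       # hypothetical minimal prompt
)

# 2. Dummy local inference to pull the segmentation weights onto the GPU.
model = SAM("sam_b.pt")  # hypothetical weights choice
model(np.zeros((1024, 1024, 3), dtype=np.uint8))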

@jeffbl (Member) left a comment:

A few minor issues, no big problems. As we move toward fewer custom/local models and more ollama/cloud endpoints, the code repeated across services should probably be refactored into something that makes a single request (if all services use the same model), or that at least cuts down on repeated (and error-prone) boilerplate.
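
One possible shape for that shared helper, sketched against ollama's /api/chat endpoint. The function name, environment variable, default model, and timeout are assumptions:

# Hypothetical shared warmup helper for ollama-backed services; the
# OLLAMA_URL variable, default model, and timeout are assumptions.
import logging
import os

import requests

OLLAMA_CHAT_URL = os.environ.get(
    "OLLAMA_URL", "https://ollama.pegasus.cim.mcgill.ca/ollama") + "/api/chat"


def warm_ollama_model(model: str = "llama3.2-vision:latest") -> bool:
    # One minimal, non-streaming chat request is enough to make ollama
    # load the model into memory.
    try:
        resp = requests.post(
            OLLAMA_CHAT_URL,
            json={
                "model": model,
                "messages": [{"role": "user", "content": "ping"}],
                "stream": False,
            },
            timeout=300,
        )
        resp.raise_for_status()
        return True
    except requests.RequestException:
        logging.exception("[WARMUP] ollama warmup failed for %s", model)
        return False

Each service could then expose its /warmup endpoint as a one-line call to this helper.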

@shahdyousefak merged commit ad173af into main Jun 17, 2025
18 checks passed


Successfully merging this pull request may close these issues:

LLM model not loaded in memory until first request for content-categoriser (and others?) causes timeouts and slow initial queries