Rework config to be more native / allow customization #135
[ Comment moved from https://github.com//issues/119#issuecomment-2902226353 ]
Use cases
Reasons that a user might want to customize the models:
- Using a different Granite autocomplete model. Right now, there are multiple reasonable options:
  - granite-3.3-8b-base: the best Granite model for autocomplete. If someone has a fast GPU, they probably want to use this.
  - granite-3.3-2b-base: worse, but faster. The best model for mid-range Macs.
  - granite-3.3-8b-instruct: quality similar to 2b-base, speed similar to 8b-base. If you have a GPU with limited memory and only have room for one Granite model, this can make sense.

  In the future, it's also possible there will be multiple reasonable choices for chat models.
- Using Ollama on a different port.
- Using hosted models. If you have access to an instance of vLLM running granite3.3:8b, use that instead of Ollama.
- Using old models. I don't care about this one. The Granite models have a track record of improving over time, and I don't want users torturing themselves trying to figure out whether `granite3.2:8b` is better than `granite3.3:2b`. Because model outputs are inherently random (even at temperature 0), you just can't tell based on a small number of prompts. If someone really wants to investigate, they can always configure the models themselves. (See below.)
- Using third-party models. Not our emphasis, but users (or at least Granite.Code developers...) will want to compare.
Proposed way it looks
A basic principle should be that selecting models and customizing models feels like an extension of the upstream UI rather than something alien to it.
To change your autocomplete model, you edit ~/.granite-code/models/autocomplete.yaml to change:

```diff
 name: Granite.Code autocomplete model
 version: 1.0.0
 schema: v1
 models:
-  - uses: granite.code/autocomplete@default
+  - uses: granite.code/granite-3.3:8b-base
```
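For reference, applying that edit leaves the whole file reading:

```yaml
name: Granite.Code autocomplete model
version: 1.0.0
schema: v1
models:
  - uses: granite.code/granite-3.3:8b-base
```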
To use a different Ollama port, you edit ~/.granite-code/models/{autocomplete,chat,embed}.yaml and add an override:

```yaml
name: Granite.Code chat model
version: 1.0.0
schema: v1
models:
  - uses: granite.code/autocomplete@default
    override:
      apiBase: ollama.local:11434
```
(OR we add a setting for this, OR we simply honor the OLLAMA_HOST environment variable; but this would be the general mechanism for overrides.)
To use a hosted model, you replace ~/.granite-code/models/{autocomplete,chat,embed}.yaml with your own content.
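A minimal sketch of what such a replacement could look like for a vLLM instance serving granite-3.3-8b-instruct (the endpoint URL, model identifier, and apiKey are placeholders, and the provider/apiBase/roles fields assume the upstream config.yaml model schema):

```yaml
name: My hosted chat model
version: 1.0.0
schema: v1
models:
  - name: hosted-granite-chat
    provider: openai                        # vLLM exposes an OpenAI-compatible API
    model: ibm-granite/granite-3.3-8b-instruct
    apiBase: https://vllm.example.com/v1    # placeholder endpoint
    apiKey: YOUR_API_KEY                    # placeholder
    roles:
      - chat
```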
To stop using a hosted model, you delete those files, and they will be recreated with the default content.
Notes:
- This requires a pretty simple code change to provide our own RegistryClient when unrolling the YAML file.
- An alternative would be `uses: ./default-models/granite-autocomplete.yaml`, which avoids the code change and lets people actually open that file and see what is in there. Might be better. (See the sketch after this list.)
- When you are using a hosted model, you want to completely replace, not `override:` to repoint; otherwise changes to the default model will break users' configurations.
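Under that alternative, ./default-models/granite-autocomplete.yaml could be a plain model definition that users open and read directly. A minimal sketch (the provider/model/roles fields assume the upstream config.yaml schema, and the Ollama tag is illustrative):

```yaml
name: Granite.Code autocomplete defaults
version: 1.0.0
schema: v1
models:
  - name: Granite autocomplete
    provider: ollama
    model: granite3.3:8b-base   # illustrative tag; whichever default we ship
    roles:
      - autocomplete
```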