6 changes: 3 additions & 3 deletions README.md
@@ -43,7 +43,7 @@ with structured, maintainable, robust, and efficient AI workflows.

You can get started with a local install, or by using Colab notebooks.

-### Getting Started with Local Infernece
+### Getting Started with Local Inference

<img src="https://github.com/generative-computing/mellea/raw/main/docs/GetStarted_py.png" style="max-width:800px">

@@ -113,7 +113,7 @@ uv run --with mellea docs/examples/tutorial/example.py

### `uv`-based installation from source

-Fork and clone the repositoy:
+Fork and clone the repository:

```bash
git clone ssh://git@github.com/<my-username>/mellea.git && cd mellea/
@@ -151,7 +151,7 @@ pre-commit install

### `conda`/`mamba`-based installation from source

-Fork and clone the repositoy:
+Fork and clone the repository:

```bash
git clone ssh://git@github.com/<my-username>/mellea.git && cd mellea/
8 changes: 4 additions & 4 deletions docs/dev/mellea_library.md
@@ -4,12 +4,12 @@ We should make it possible to use mellea as a library (as opposed to a framework

In the context of LLM applications, the library vs framework distinction really boils down to how you treat the backend.

-If a piece of software insists on having an exclusive handle on the backend, then that piece of software does nto compose with any other piece of software that also insists on an exclusive handle. They both want to be privileged with respect to the backend, so they cannot "play well" together. The `outlines` library is a good example of software that could've been a library but instead acts like a framework. Even `granite-io` takes on a framework-like role when it decides to actually call the backend, as opposed to operating over strings (or perhaps chat histories).
+If a piece of software insists on having an exclusive handle on the backend, then that piece of software does not compose with any other piece of software that also insists on an exclusive handle. They both want to be privileged with respect to the backend, so they cannot "play well" together. The `outlines` library is a good example of software that could've been a library but instead acts like a framework. Even `granite-io` takes on a framework-like role when it decides to actually call the backend, as opposed to operating over strings (or perhaps chat histories).

Writing LLM libraries is kind of difficult. There is a very strong instinct to try to grab control of the backend. Mellea is no exception. In the "intro path", mellea definitely behaves like a framework. We hide the actual backend objects (`PretrainedModel`, `openai.Client`, etc.) from the user.

-But should try to make it easy for certain parts of mellea to be used as a library. There are many ways in which we could allow mellea to compose with other librares:
+But we should try to make it easy for certain parts of mellea to be used as a library. There are many ways in which we could allow mellea to compose with other libraries:

1. We could have a `m.start_session_with_shared_backend(client:openai.Client)` and similarly for local ollama models and transformers models. Everything would work mostly the same after that, except we would have to make much weaker assumptions about the state of the backend (e.g., cache and LoRAs).
-2. We could strive to keep the `Formatter` logic completely separate from Backend-specific code, and the legacy model behavior should treat each Component like a standalone user message. This way people could use `mellea` components without using the `mellea` backend and context managemetn code.
-3. We could trive to keep the `Cache` strategies agnostic to the rest of the code base, and figure out what their interface should be with respect to various backend sdks (and transformers in particular)
+2. We could strive to keep the `Formatter` logic completely separate from Backend-specific code, and the legacy model behavior should treat each Component like a standalone user message. This way people could use `mellea` components without using the `mellea` backend and context management code.
+3. We could strive to keep the `Cache` strategies agnostic to the rest of the code base, and figure out what their interface should be with respect to various backend sdks (and transformers in particular)
2 changes: 1 addition & 1 deletion docs/dev/mify.md
@@ -61,7 +61,7 @@ mify(c)

meetings_summary = m.query(c, "Summarize the last three interactions with this customer.")

-email_body = ctx.instruct("Based upon the summary of notes from recent meetings, write an email body encouraging the customer to purchase three cases of self-sealing stembolts", grouning_context={"meetings_summary": meetings_summary})
+email_body = ctx.instruct("Based upon the summary of notes from recent meetings, write an email body encouraging the customer to purchase three cases of self-sealing stembolts", grounding_context={"meetings_summary": meetings_summary})

email_subject = ctx.instruct("Write a subject for this sales email.", grounding_context={"email_body": email_body})

16 changes: 8 additions & 8 deletions docs/tutorial.md
@@ -44,7 +44,7 @@ As the Mellea developers built this library for generative programming, we found

* **circumscribe LLM calls with requirement verifiers.** We will see variations on this principle throughout the tutorial.
* **Generative programs should use simple and composable prompting styles.** Mellea takes a middle-ground between the "framework chooses the prompt" and "client code chooses the prompt" paradigms. By keeping prompts small and self-contained, then chaining together many such prompts, we can usually get away with one of a few prompt styles. When a new prompt style is needed, that prompt should be co-designed with the software that will use the prompt. In Mellea, we encourage this by decomposing generative programs into *Components*; more on this in [Chapter 3](#chapter-3-overview-of-the-standard-library).
-* **Generative models and infererence-time programs should be co-designed.** Ideally, the style and domain of prompting used at inference time should match the style and domain of prompting using in pretraining, mid-training, and/or post-training. And, similarly, models should be built with runtime components and use-patterns in mind. We will see some early examples of this in [Chapter 6](#chapter-6-tuning-requirements-and-components).
+* **Generative models and inference-time programs should be co-designed.** Ideally, the style and domain of prompting used at inference time should match the style and domain of prompting using in pretraining, mid-training, and/or post-training. And, similarly, models should be built with runtime components and use-patterns in mind. We will see some early examples of this in [Chapter 6](#chapter-6-tuning-requirements-and-components).
* **Generative programs should carefully manage context.** Each Component manages context of a single call, as we see in Chapters [2](#chapter-2-getting-started-with-generative-programming-in-mellea), [3](#chapter-3-overview-of-the-standard-library), [4](#chapter-4-generative-slots), and [5](#chapter-5-mobjects). Additionally, Mellea provides some useful mechanisms for re-using context across multiple calls ([Chapter 7](#chapter-7-on-context-management)).

Although good generative programs can be written in any language and framework, getting it right is not trivial. Mellea is just one point in the design space of LLM libraries, but we think it is a good one. Our hope is that Mellea will help you write generative programs that are robust, performant, and fit-for-purpose.
@@ -208,7 +208,7 @@ Checks aim to avoid the "do not think about B" effect that often primes models (
to do the opposite and "think" about B.

> [!NOTE]
-> LLMaJ is not presumtively robust. Whenever possible, implement requirement validation using plain old Python code. When a model is necessary, it can often be a good idea to train a **calibrated** model specifically for your validation problem. [Chapter 6](#chapter-6-tuning-requirements-and-components) explains how to use Mellea's `m tune` subcommand to train your own LoRAs for requirement checking (and for other types of Mellea components as well).
+> LLMaJ is not presumptively robust. Whenever possible, implement requirement validation using plain old Python code. When a model is necessary, it can often be a good idea to train a **calibrated** model specifically for your validation problem. [Chapter 6](#chapter-6-tuning-requirements-and-components) explains how to use Mellea's `m tune` subcommand to train your own LoRAs for requirement checking (and for other types of Mellea components as well).


### Instruct - Validate - Repair
@@ -322,7 +322,7 @@ We have now worked up from a simple "Hello, World" example to our first generati

When LLMs work well, the software developer experiences the LLM as a sort of oracle that can handle most any input and produce a sufficiently desirable output. When LLMs do not work at all, the software developer experiences the LLM as a naive markov chain that produces junk. In both cases, the LLM is just sampling from a distribution.

-The crux of generative programming is that most applications find themselves somewhere in-between these two extremes -- the LLM mostly works, enough to demo a tantilizing MVP. But failure modes are common enough and severe enough that complete automation is beyond the developer's grasp.
+The crux of generative programming is that most applications find themselves somewhere in-between these two extremes -- the LLM mostly works, enough to demo a tantalizing MVP. But failure modes are common enough and severe enough that complete automation is beyond the developer's grasp.

Traditional software deals with failure modes by carefully describing what can go wrong and then providing precise error handling logic. When working with LLMs, however, this approach suffers a Sysiphean curse. There is always one more failure mode, one more special case, one more new feature request. In the next chapter, we will explore how to build generative programs that are compositional and that grow gracefully.

@@ -338,7 +338,7 @@ Components can also specify an expected output type along with a parse function

Backends are the engine that actually run the LLM. Backends consume Components, format the Component, pass the formatted input to an LLM, and return model outputs, which are then parsed back into CBlocks or Components.

-During the course of an interaction with an LLM, several Components and CBlocks may be created. Logic for handling this trace of interactions is provided by a `Context` object. Some book-keeping needs to be done in order for Contexts to approporiately handle a trace of Components and CBlocks. The `MelleaSession` class, which is created by `mellea.start_session()`, does this book-keeping a simple wrapper around Contexts and Backends.
+During the course of an interaction with an LLM, several Components and CBlocks may be created. Logic for handling this trace of interactions is provided by a `Context` object. Some book-keeping needs to be done in order for Contexts to appropriately handle a trace of Components and CBlocks. The `MelleaSession` class, which is created by `mellea.start_session()`, does this book-keeping a simple wrapper around Contexts and Backends.

When we call `m.instruct()`, the `MelleaSession.instruct` method creates a component called an `Instruction`. Instructions are part of the Mellea standard library.

@@ -411,7 +411,7 @@ Many more examples of generative slots are provided in the `docs/examples` direc

Instruct-validate-repair provides compositionality within a given module. As the examples listed above demonstrate, generative slots can do the same. But generative slots are not just about local validity; their real power comes from safe interoperability between independently designed systems.

-Consider the following two independently developed libraries: a **Summarizer** library that contains a set of functions for summarizing various types of documents, and a **Decision Aides** library that aides in decision making for particular situations.
+Consider the following two independently developed libraries: a **Summarizer** library that contains a set of functions for summarizing various types of documents, and a **Decision Aids** library that aids in decision making for particular situations.

```python
# file: https://github.com/generative-computing/mellea/blob/main/docs/examples/tutorial/compositionality_with_generative_slots.py#L1-L18
@@ -982,9 +982,9 @@ The core idea of ReACT is to alternate between reasoning ("Thought") and acting
# Pseudocode
while not done:
get the model's next thought
-    take an action based upon the though
-    choose arguments for the selection action
-    observe the toll output
+    take an action based upon the thought
+    choose arguments for the selected action
+    observe the tool output
check if a final answer can be obtained
return the final answer
```
2 changes: 1 addition & 1 deletion test/README.md
@@ -1,3 +1,3 @@


-Test files must be named as "test_*.py" so that pydocstyle ignore them
+Test files must be named as "test_*.py" so that pydocstyle ignores them