Merged
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@ repos:
- id: check-case-conflict
- id: check-toml
- id: check-yaml
  args: [--unsafe]
- id: check-ast
- id: debug-statements
- id: check-docstring-first
5 changes: 4 additions & 1 deletion .vscode/settings.json
@@ -5,5 +5,8 @@
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true
"python.testing.pytestEnabled": true,
"chat.tools.terminal.autoApprove": {
"make": true
}
}
106 changes: 45 additions & 61 deletions README.md
@@ -14,96 +14,80 @@

</p>

## Installation
**Guided Infilling Modeling Toolkit** — structured text generation and information extraction using language models.

Install GIMKit using pip:
Write a template with typed placeholders. The LLM fills them in. Get structured, named results back.

```bash
pip install gimkit
```
```python
from gimkit import guide as g

For vLLM support, install with the optional dependency:
query = f"""Extract from: "Hi, I'm John Smith, reach me at john@gmail.com"

```bash
pip install gimkit[vllm]
```
Name: {g.person_name(name="name")}
Email: {g.e_mail(name="email")}"""

## Quick Start
result = model(query, use_gim_prompt=True)
result.tags["name"].content # → "John Smith"
result.tags["email"].content # → "john@gmail.com"
```

Here's a simple example using the OpenAI backend:
## Installation

```python
from openai import OpenAI
from gimkit import from_openai, guide as g
```bash
pip install gimkit
```

# Initialize the client and model
client = OpenAI() # Uses OPENAI_API_KEY environment variable
model = from_openai(client, model_name="gpt-4")
For vLLM support:

# Create a query with masked tags
result = model(f"Hello, {g(desc='a single word')}!", use_gim_prompt=True)
print(result) # Output: Hello, world!
```bash
pip install gimkit[vllm]
```

## Usage

### Creating Masked Tags: Use the `guide` helper (imported as `g`) to create masked tags
## What Can You Do With GIMKit?

```python
from gimkit import guide as g
GIMKit is a **general-purpose information extraction framework**. Write a natural-language template with embedded tags, and the model extracts structured data from any text.

# Basic tag with description
tag = g(name="greeting", desc="A friendly greeting")
| Use Case | Example |
|----------|---------|
| **Contact extraction** | Parse names, emails, phones from free-form text |
| **Named entity recognition** | Extract orgs, people, locations, dates |
| **Text classification** | Categorize text, assign sentiment labels |
| **Event extraction** | Pull what/where/when/impact from event descriptions |
| **Relation extraction** | Find entities and the relationships between them |
| **Resume parsing** | Extract name, title, education, experience |
| **Review analysis** | Parse product, price, rating, pros/cons |

# Specialized tags
name_tag = g.person_name(name="user_name")
email_tag = g.e_mail(name="email")
phone_tag = g.phone_number(name="phone")
word_tag = g.single_word(name="word")
See the [Classic IE Use Cases](https://sculptai.github.io/GIMKit/use-cases/classic/), [Privacy and PII Use Cases](https://sculptai.github.io/GIMKit/use-cases/privacy-pii/), and [Other Use Cases](https://sculptai.github.io/GIMKit/use-cases/others/) pages for full examples.

# Selection from choices
choice_tag = g.select(name="color", choices=["red", "green", "blue"])
## Why GIMKit?

# Tag with regex constraint
custom_tag = g(name="code", desc="A 4-digit code", regex=r"\d{4}")
```
- **Template-driven** — describe what you want in natural language, not label lists
- **Format control** — regex constraints, enumerated choices, type-safe tags
- **Named access** — results are keyed by field name, not token positions
- **Small-model friendly** — works with compact open-source models (4B+)
- **Multiple backends** — OpenAI, vLLM (server and offline)

### Building Queries: Combine masked tags with text to build queries
## Quick Start

```python
from gimkit import from_openai, guide as g
from openai import OpenAI
from gimkit import from_openai, guide as g

client = OpenAI()
model = from_openai(client, model_name="gpt-4")

# Simple extraction
result = model(f"Hello, {g(desc='a single word')}!", use_gim_prompt=True)
print(result) # Hello, world!

# Structured form
query = f"""
Name: {g.person_name(name="name")}
Email: {g.e_mail(name="email")}
Favorite color: {g.select(name="color", choices=["red", "green", "blue"])}
"""

result = model(query, use_gim_prompt=True)
print(result)
```

### Accessing Results: Access filled tags from the result

```python
result = model(query, use_gim_prompt=True)

# Iterate over all tags
for tag in result.tags:
    print(f"{tag.name}: {tag.content}")

# Access by name
print(result.tags["name"].content)

# Modify tag content
result.tags["email"].content = "REDACTED"
print(result.tags["email"].content)
print(result.tags["color"].content)
```

## Design Philosophy

- Stable over feature
- Small open-source model first
35 changes: 35 additions & 0 deletions docs/api.zh.md
@@ -0,0 +1,35 @@
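# API Reference

This page is generated by `mkdocstrings` from the docstrings in the source code. The explanatory text on this page is written for the Chinese edition of the docs; the per-object documentation still comes from the original comments in the code.

## Package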

::: gimkit

## Core Modules

::: gimkit.guides

::: gimkit.schemas

::: gimkit.contexts

::: gimkit.dsls

::: gimkit.prompts

::: gimkit.log

::: gimkit.exceptions

## Model Backends

::: gimkit.models.base

::: gimkit.models.openai

::: gimkit.models.vllm

::: gimkit.models.vllm_offline

::: gimkit.models.utils
26 changes: 20 additions & 6 deletions docs/index.md
@@ -1,6 +1,6 @@
# GIMKit

**Guided Infilling Modeling Toolkit** — precise structured text generation using language models.
**Guided Infilling Modeling Toolkit** — structured text generation and information extraction using language models.

GIMKit lets you define placeholders (masked tags) in text and have a language model fill them in. It gives you fine-grained control over model outputs through a typed tag system with optional regex constraints.

@@ -10,15 +10,29 @@ GIMKit lets you define placeholders (masked tags) in text and have a language mo

---

## What Can You Do With GIMKit?

GIMKit is a **general-purpose information extraction framework**. Write a natural-language template with embedded typed placeholders, and the model extracts structured data from any unstructured text.

| Use Case | Description |
|----------|-------------|
| **Contact extraction** | Parse names, emails, phone numbers from free-form text |
| **Named entity recognition** | Extract organizations, people, locations, dates |
| **Text classification** | Categorize text into labels, assign sentiment |
| **Event extraction** | Pull structured event info (what/where/when/impact) |
| **Relation extraction** | Find entities and the relationships between them |
| **Resume / CV parsing** | Extract candidate name, title, education, experience |
| **Product review analysis** | Parse product, price, rating, pros and cons |
| **Privacy & PII protection** | Extract, classify, redact, and filter PII |

See the [Classic IE Use Cases](use-cases/classic.md), [Privacy and PII Use Cases](use-cases/privacy-pii.md), and [Other Use Cases](use-cases/others.md) pages for full code examples.

---

## Features

- **Masked tag system** — embed typed placeholders directly in f-strings.
- **Regex constraints** — restrict model output to specific patterns.
- **Named access** — retrieve results by tag name or index.
- **Multiple backends** — OpenAI, vLLM (server and offline).
- **Small-model friendly** — designed to work well with compact open-source models.
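
As a rough illustration of what a regex constraint means for a filled tag, here is a minimal sketch that uses only Python's standard `re` module. It does not call GIMKit: `satisfies_constraint` is a hypothetical helper written for this example, and the pattern `\d{4}` is borrowed from the 4-digit code tag shown in the README.

```python
import re


def satisfies_constraint(candidate: str, pattern: str) -> bool:
    """Return True only if the entire candidate string matches the pattern.

    Hypothetical helper for illustration; it is not part of GIMKit.
    """
    return re.fullmatch(pattern, candidate) is not None


# A constraint such as r"\d{4}" admits exactly four digits and nothing else.
for value in ["0420", "42", "04200", "four"]:
    print(value, satisfies_constraint(value, r"\d{4}"))
```

Using `re.fullmatch` rather than `re.search` is the key design choice here: a constraint is meant to describe the whole fill, not a substring of it.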

## Design Philosophy

- **Stable over feature** — reliability and correctness are prioritized above new features.
- **Small open-source model first** — designed to work well with small, freely available language models.
38 changes: 38 additions & 0 deletions docs/index.zh.md
@@ -0,0 +1,38 @@
# GIMKit

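**Guided Infilling Modeling Toolkit**: a language-model-based tool for structured text generation and information extraction.

GIMKit lets you define placeholders (masked tags) in text and have a language model fill them in. A typed tag system with optional regex constraints gives you fine-grained control over model output.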

[![PyPI Version](https://img.shields.io/pypi/v/gimkit?label=pypi%20package)](https://pypi.org/project/gimkit)
[![Python Versions](https://img.shields.io/pypi/pyversions/gimkit.svg)](https://pypi.org/project/gimkit)
[![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)](https://pypi.org/project/gimkit)

---

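## What Can GIMKit Do?

GIMKit is a **general-purpose information extraction framework**. Write a template in natural language, embed typed placeholders, and the model extracts structured data from any unstructured text.

| Use Case | Description |
|----------|-------------|
| **Contact extraction** | Parse names, emails, and phone numbers from free-form text |
| **Named entity recognition** | Extract organizations, people, locations, dates |
| **Text classification** | Classify text and assign sentiment labels |
| **Event extraction** | Extract structured event information (what/where/when/impact) |
| **Relation extraction** | Find entities and the relationships between them |
| **Resume parsing** | Extract candidate name, title, education, experience |
| **Review analysis** | Parse product name, price, rating, pros and cons |
| **Privacy and PII protection** | Extract, classify, redact, and filter personal information |

For complete code examples, see the [Classic IE Use Cases](use-cases/classic.zh.md), [Privacy and PII Use Cases](use-cases/privacy-pii.zh.md), and [Other Use Cases](use-cases/others.zh.md) pages.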

---

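## Features

- **Tag system**: embed typed placeholders directly in f-strings.
- **Regex constraints**: restrict model output to specific patterns.
- **Named access**: retrieve results by tag name or index.
- **Multi-backend support**: OpenAI, vLLM (server and offline modes).
- **Small-model friendly**: designed for small open-source models.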
25 changes: 25 additions & 0 deletions docs/installation.zh.md
@@ -0,0 +1,25 @@
# Installation

## Standard Installation

Install GIMKit using pip:

```bash
pip install gimkit
```

## vLLM Support

Install with the optional `vllm` dependency to enable the vLLM backend:

```bash
pip install gimkit[vllm]
```

!!! note
    vLLM supports Linux only. On Windows and macOS, omit the `[vllm]` extra.

## Requirements

- Python 3.10 or later
- Linux, macOS, or Windows
50 changes: 50 additions & 0 deletions docs/models/index.md
@@ -0,0 +1,50 @@
# Model Usage Overview

This page compares supported clients and explains when to use each mode.

## Client Comparison

| Client | Constructor | Best for |
|---|---|---|
| OpenAI | `from_openai(client, model_name=...)` | Hosted OpenAI-compatible APIs |
| vLLM (Server) | `from_vllm(client, model_name=...)` | OpenAI-compatible vLLM HTTP server |
| vLLM (Offline) | `from_vllm_offline(llm)` | Local offline inference with `vllm.LLM` |

## Support Matrix

| Capability | OpenAI | vLLM (Server) | vLLM (Offline) |
|---|---|---|---|
| `use_gim_prompt=True` | Recommended | Only for non-GIM models | Only for non-GIM models |
| `output_type=None` | Fallback when JSON is unsupported | Available but not recommended | Available but not recommended |
| `output_type="cfg"` | Not available | Recommended | Recommended |
| `output_type="json"` | Yes | Yes | Yes |

## Initialization Differences

- OpenAI and vLLM server mode both take an OpenAI-compatible client object.
- vLLM offline mode takes a `vllm.LLM` instance, not an OpenAI client.
- For vLLM server mode, create the client with `base_url` pointing to your server.

## Prompt Usage Recommendation

- Most local workflows use GIM-trained models, for which `use_gim_prompt=False` is preferred.
- For models that were not GIM-trained, enable `use_gim_prompt=True`.
- For the OpenAI backend, prefer `use_gim_prompt=True`.

## Output Type Guide

### OpenAI

- Prefer `output_type="json"`.
- If your OpenAI provider does not support JSON constraints, use `output_type=None`.

### vLLM (Server / Offline)

- Prefer `output_type="cfg"` for both GIM-trained and non-GIM models.
- `output_type="json"` is available when JSON output is specifically needed.
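
The prompt and output-type recommendations above can be condensed into a small lookup. The following is a sketch only: `recommended_options` and the backend labels `"openai"`, `"vllm"`, and `"vllm_offline"` are hypothetical names chosen for this example and are not part of GIMKit. It encodes the defaults from this page and leaves out the `output_type=None` fallback for OpenAI providers without JSON support.

```python
def recommended_options(backend: str, gim_trained: bool = False) -> dict:
    """Return the call options this page recommends for a backend.

    `backend` uses labels invented for this sketch:
    "openai", "vllm" (server mode), or "vllm_offline".
    """
    if backend == "openai":
        # OpenAI path: GIM prompt on, JSON-constrained output preferred.
        return {"use_gim_prompt": True, "output_type": "json"}
    if backend in ("vllm", "vllm_offline"):
        # vLLM paths: CFG output preferred; the GIM prompt is only
        # needed when the model was not GIM-trained.
        return {"use_gim_prompt": not gim_trained, "output_type": "cfg"}
    raise ValueError(f"unknown backend: {backend!r}")


print(recommended_options("openai"))
print(recommended_options("vllm", gim_trained=True))
```

Assuming the model callable accepts these as keyword arguments, as the support matrix suggests, the result could be unpacked into a call such as `model(query, **recommended_options("vllm", gim_trained=True))`.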

## Common Optional Flags

- `include_grammar=True`: include the grammar text in the query input.
- `backend`: choose the Outlines backend implementation.
- `**inference_kwargs`: pass generation parameters through to the underlying backend.