1 change: 1 addition & 0 deletions README.md
@@ -15,6 +15,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool
| `android-native-dev` | Android native application development with Material Design 3. Kotlin / Jetpack Compose, adaptive layouts, Gradle configuration, accessibility (WCAG), build troubleshooting, performance optimization, and motion system. | Official |
| `ios-application-dev` | iOS application development guide covering UIKit, SnapKit, and SwiftUI. Touch targets, safe areas, navigation patterns, Dynamic Type, Dark Mode, accessibility, collection views, and Apple HIG compliance. | Official |
| `flutter-dev` | Flutter cross-platform development covering widget patterns, Riverpod/Bloc state management, GoRouter navigation, performance optimization, and testing strategies. | Official |
| `excel-wps-table-diagnosis` | Diagnose table structure, plan cleanup workflows, and troubleshoot formulas in Excel/WPS spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Covers data quality auditing (blank rows, duplicates, mixed types, whitespace), lookup key preparation, formula troubleshooting (#REF!, #VALUE!, etc.), and approval-based cleanup workflows. Read-only diagnosis — pauses for user approval before any write operations. | Community |
| `react-native-dev` | React Native and Expo development guide covering components, styling, animations, navigation, state management, forms, networking, performance optimization, testing, native capabilities, and engineering (project structure, deployment, SDK upgrades, CI/CD). | Official |
| `shader-dev` | Comprehensive GLSL shader techniques for creating stunning visual effects — ray marching, SDF modeling, fluid simulation, particle systems, procedural generation, lighting, post-processing, and more. ShaderToy-compatible. | Official |
| `gif-sticker-maker` | Convert photos (people, pets, objects, logos) into 4 animated GIF stickers with captions. Funko Pop / Pop Mart style, powered by MiniMax Image & Video Generation API. | Official |
1 change: 1 addition & 0 deletions README_zh.md
@@ -15,6 +15,7 @@
| `android-native-dev` | 基于 Material Design 3 的 Android 原生应用开发。Kotlin / Jetpack Compose、自适应布局、Gradle 配置、无障碍(WCAG)、构建问题排查、性能优化与动效系统。 | Official |
| `ios-application-dev` | iOS 应用开发指南,涵盖 UIKit、SnapKit 和 SwiftUI。触控目标、安全区域、导航模式、Dynamic Type、深色模式、无障碍、集合视图,符合 Apple HIG 规范。 | Official |
| `flutter-dev` | Flutter 跨平台开发指南,涵盖 Widget 模式、Riverpod/Bloc 状态管理、GoRouter 导航、性能优化与测试策略。 | Official |
| `excel-wps-table-diagnosis` | 诊断电子表格文件(.xlsx、.xlsm、.csv、.tsv)的表格结构,规划数据清理工作流,并排查公式问题。涵盖数据质量审计(空白行、重复项、类型混乱、空格)、查找键准备、公式排障(#REF!、#VALUE! 等)以及基于审批的清理工作流。诊断为只读操作 — 在任何写入操作之前暂停并等待用户审批。 | Community |
| `react-native-dev` | React Native 与 Expo 开发指南,涵盖组件、样式、动画、导航、状态管理、表单、网络请求、性能优化、测试、原生能力及工程化(项目结构、部署、SDK 升级、CI/CD)。 | Official |
| `shader-dev` | 全面的 GLSL 着色器技术,用于创建惊艳的视觉效果 — 光线行进、SDF 建模、流体模拟、粒子系统、程序化生成、光照、后处理等。兼容 ShaderToy。 | Official |
| `gif-sticker-maker` | 将照片(人物、宠物、物品、Logo)转换为 4 张带字幕的动画 GIF 贴纸。Funko Pop / Pop Mart 盲盒风格,基于 MiniMax 图片与视频生成 API。 | Official |
177 changes: 177 additions & 0 deletions skills/excel-wps-table-diagnosis/SKILL.md
@@ -0,0 +1,177 @@
---
name: excel-wps-table-diagnosis
description: >-
Diagnose table structure, plan cleanup workflows, and troubleshoot formulas in Excel/WPS spreadsheet files (.xlsx, .xlsm, .csv, .tsv).
Use when the user asks to audit, inspect, check, diagnose, or analyze the quality and structure of a spreadsheet,
or when they want a cleanup plan before modifying data. Triggers on 'diagnose', 'audit', 'check quality',
'find problems', 'table structure', 'data cleanup', 'duplicate', 'blank row', 'inconsistent format',
or any request to understand issues in a spreadsheet before editing it. This skill does NOT modify files —
it produces a diagnosis report and pauses for user approval before any write operations.
license: MIT
metadata:
version: "1.0"
category: productivity
sources:
- ECMA-376 Office Open XML File Formats
- Microsoft Open XML SDK documentation
---

# Excel/WPS Table Diagnosis Skill

Diagnose spreadsheet problems and produce a cleanup plan. **This skill is read-only — it does not modify files.**
After diagnosis, present findings to the user and await approval before any write operations.

## Diagnosis Workflow

### Step 1 — Structure Discovery

Use `xlsx_reader.py` (from `minimax-xlsx/scripts/`) for initial structure and quality audit:

```bash
python3 SKILL_DIR/../minimax-xlsx/scripts/xlsx_reader.py input.xlsx --quality
python3 SKILL_DIR/../minimax-xlsx/scripts/xlsx_reader.py input.xlsx --json
```

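For CSV/TSV files, load directly with pandas (TSV needs an explicit tab separator):

```python
import pandas as pd

df = pd.read_csv("file.csv")
df_tsv = pd.read_csv("file.tsv", sep="\t")  # TSV: tab-separated values
```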

### Step 2 — Deep Diagnosis

After structure discovery, perform targeted analysis using pandas and XML inspection:

**Header and structure analysis:**
```python
import pandas as pd

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}
sheets = pd.read_excel("input.xlsx", sheet_name=None)
for name, sheet in sheets.items():
    print(f"Sheet: {name}")
    print(f"  Shape: {sheet.shape}")
    print(f"  Columns: {list(sheet.columns)}")
    print(f"  dtypes:\n{sheet.dtypes}")
    print(f"  Nulls: {sheet.isnull().sum().to_dict()}")
    print(f"  Duplicates: {sheet.duplicated().sum()}")
```

**Data quality issues to flag:**
| Issue | Detection | Impact |
|-------|-----------|--------|
| Blank rows | `sheet[sheet.isnull().all(axis=1)]` | Breaks pivot tables, VLOOKUP |
| Merged cells | Inspect `<mergeCells>` in the worksheet XML | Misaligns data when read |
| Duplicate headers | `pd.read_excel(path, header=None).iloc[0].duplicated()` (with a parsed header row, pandas silently renames duplicates to `Name.1`) | Ambiguous column references |
| Mixed types in column | `sheet[col].dropna().map(type).nunique() > 1` | Causes calculation errors |
| Leading/trailing spaces | `sheet.apply(lambda s: s.astype(str).str.strip().ne(s.astype(str)))` | VLOOKUP failures |
| Inconsistent date formats | `pd.to_datetime(sheet[col], errors='coerce')` yields `NaT` for some rows | Date calculation failures |
| Numeric stored as text | String cells that `pd.to_numeric(..., errors='coerce')` converts successfully | SUM/AVERAGE ignore text |
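Several of the checks in the table can be combined into one per-column summary. The following is a minimal pandas-free sketch, assuming the column has already been extracted as a plain Python list (for example via `sheet[col].tolist()`); the function name `audit_column` and the returned keys are illustrative, not part of any existing script.

```python
def audit_column(values):
    """Summarize type mix, numeric-looking text, and padded strings in one column."""
    # Treat None and empty strings as blank cells and ignore them
    non_blank = [v for v in values if v is not None and v != ""]
    type_names = sorted({type(v).__name__ for v in non_blank})
    strings = [v for v in non_blank if isinstance(v, str)]

    def looks_numeric(text):
        # Heuristic: text that parses as a float once thousands separators are removed
        try:
            float(text.strip().replace(",", ""))
            return True
        except ValueError:
            return False

    return {
        "types": type_names,
        "mixed_types": len(type_names) > 1,
        "numeric_as_text": sum(looks_numeric(s) for s in strings),
        "padded_strings": sum(s != s.strip() for s in strings),
    }


report = audit_column([1, "2", " 3 ", "abc", None, 4.5])
print(report)
```

The heuristic is deliberately loose: strings such as `"nan"` or `"inf"` also parse as floats, so a real audit may want to exclude them.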

**Formula auditing:**
```python
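import re
import zipfile

with zipfile.ZipFile("input.xlsx") as z:
    for name in z.namelist():
        if name.startswith("xl/worksheets/sheet"):
            content = z.read(name).decode("utf-8")
            # <f> may carry attributes (e.g. t="shared"), so allow them in the tag.
            # Self-closing shared-formula followers (<f .../>) are not counted.
            formulas = re.findall(r"<f(?:\s[^>]*)?>([^<]*)</f>", content)
            print(f"{name}: {len(formulas)} formulas")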
```
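Counting formulas does not show which ones are broken. In SpreadsheetML, a cell whose cached result is an error carries `t="e"` and stores the error text (such as `#REF!`) in its `<v>` element, so a rough health check can tally those cells. This is a sketch under that assumption; `count_error_cells` is a hypothetical helper, and it only sees cached values, so a file that was never recalculated by a spreadsheet application may report nothing.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Main SpreadsheetML namespace used by worksheet parts
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def count_error_cells(xlsx_file):
    """Count cached error values (#REF!, #N/A, ...) per worksheet part."""
    results = {}
    with zipfile.ZipFile(xlsx_file) as z:
        for name in z.namelist():
            if not name.startswith("xl/worksheets/sheet"):
                continue
            root = ET.fromstring(z.read(name))
            errors = Counter()
            for cell in root.iter(f"{NS}c"):
                if cell.get("t") == "e":  # cell type "e" marks an error value
                    value = cell.find(f"{NS}v")
                    if value is not None and value.text:
                        errors[value.text] += 1
            results[name] = dict(errors)
    return results
```

`count_error_cells("input.xlsx")` accepts a path or a file-like object, since both are valid inputs to `zipfile.ZipFile`.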

### Step 3 — Produce Diagnosis Report

Format findings as:

```
## Table Diagnosis Report: {filename}

### 1. File Overview
- Format: .xlsx / .xlsm / .csv / .tsv
- Sheets: {count}
- Dimensions: {rows} rows × {cols} columns

### 2. Data Quality Issues
| # | Sheet | Issue | Rows Affected | Severity |
|---|-------|-------|---------------|----------|
| 1 | Sheet1 | 12 blank rows | rows 15, 28, ... | High |
| 2 | Sheet1 | Column C: mixed text/number | rows 4, 9, 17 | Medium |

### 3. Cleanup Plan
1. Remove 12 blank rows (rows 15, 28, ...)
2. Standardize column C: convert text to number (rows 4, 9, 17)
3. Trim whitespace in column B

### 4. Lookup/Matching Workflow Recommendations
- Column A (ID) can serve as VLOOKUP key after deduplication
- Consider INDEX/MATCH instead of VLOOKUP for multi-sheet lookup

### 5. Formula Health Check
- 3 broken formulas detected (see details below)
- 0 circular references found

### 6. User Approval Required
**Proposed actions:**
- Remove 12 blank rows
- Standardize 3 columns (C, D, E) to consistent types
- Trim whitespace in 2 columns (B, F)
- Fix 3 broken formulas

> ⚠️ This report was generated automatically. Review each item before approving changes.
> Reply **APPROVE** to proceed with cleanup, or specify which items to skip.
```

## Lookup and Matching Reference

### VLOOKUP (when to use)
```
# VLOOKUP: find in leftmost column, return Nth column to the right
=VLOOKUP(lookup_value, table_range, col_index, FALSE)

# Common failure causes:
# 1. Lookup column is not the leftmost column of table_range
# 2. Table range shifts when copied → use absolute refs: $A$1:$D$100
# 3. Approximate match (TRUE) used instead of exact (FALSE)
```

### INDEX/MATCH (preferred for complex lookups)
```
=INDEX(return_range, MATCH(lookup_value, lookup_range, 0))
# Works regardless of column position, more flexible than VLOOKUP
```

### XLOOKUP (Excel 365 / WPS newer versions)
```
=XLOOKUP(lookup_value, lookup_array, return_array, if_not_found)
```

## Formula Troubleshooting Guide

**See `references/formula-troubleshooting.md` for detailed patterns.**

Common formula failures and fixes:

| Error | Cause | Fix |
|-------|-------|-----|
| `#REF!` | Deleted column/row referenced | Update cell references |
| `#VALUE!` | Wrong data type in formula | Convert with `VALUE()` or `TEXT()` |
| `#NAME?` | Unrecognized function or misspelled name | Check spelling and Excel vs WPS function names |
| `#DIV/0!` | Division by zero | Guard the divisor, or wrap with `IFERROR(..., 0)` |
| `#N/A` | Lookup value not found | Check the key for stray spaces or a type mismatch; handle misses with `IFERROR(VLOOKUP(...), "not found")` |
| `####` | Column too narrow | Auto-fit column width |
| Circular ref | Formula refers to its own cell, directly or indirectly | Use Trace Precedents on the Formulas tab |

## Approval-Based Execution Flow

1. **Diagnose** (this skill) → produces report → user reviews
2. **User replies APPROVE** or lists items to skip
3. **Edit skill activates** → uses `minimax-xlsx` EDIT workflow for modifications
4. **Deliver output file** → verify with `xlsx_reader.py`

**Approval message format:**
```
APPROVE # approve all proposed changes
APPROVE, skip items 2 and 3 # partial approval
SKIP cleanup, just fix formulas # partial approval
```
173 changes: 173 additions & 0 deletions skills/excel-wps-table-diagnosis/references/cleanup-planning.md
@@ -0,0 +1,173 @@
# Cleanup Planning Reference

> Guide for planning and executing spreadsheet cleanup operations safely.

## Cleanup Priority Order

1. **High severity first**: Blanks → merged cells → type inconsistencies → duplicates → whitespace
2. **Non-destructive first**: Add correction columns rather than overwriting originals
3. **Verify after each step**: Re-read file after each change to confirm expected state

## Blank Row Removal

**When to remove**: Blank rows between data break VLOOKUP, pivot tables, and AutoFilter.

**Safe removal approach**:
```python
import pandas as pd

df = pd.read_excel("input.xlsx", sheet_name="Sales")
# Keep only rows with at least one non-null value
df_clean = df.dropna(how='all')
# Or keep rows where key column is not null
df_clean = df[df['ID'].notna()] # adjust 'ID' to actual key column
```

**⚠️ Before removing blanks, check**:
- Are blanks intentional separators (section dividers)?
- Does any formula reference these row numbers?
- Will removing them shift downstream SUM ranges?

## Duplicate Handling

```python
# Find exact duplicates
dups = df[df.duplicated(keep=False)]
print(f"Duplicate rows: {len(dups)}")

# Keep first occurrence, remove rest
df_deduped = df.drop_duplicates(keep='first')

# Remove duplicates based on specific columns only
df_deduped = df.drop_duplicates(subset=['Name', 'Email'], keep='first')
```

## Type Standardization

### Text to Number
```python
def text_to_number(series):
    """Convert numeric-looking text to numbers; leave everything else unchanged."""
    def try_convert(val):
        if not isinstance(val, str):
            return val
        cleaned = val.strip()
        # Strip thousands separators and currency/percent symbols.
        # Note: '50%' becomes 50, not 0.5.
        for symbol in (',', '$', '¥', '€', '%'):
            cleaned = cleaned.replace(symbol, '')
        try:
            return float(cleaned) if '.' in cleaned else int(cleaned)
        except ValueError:
            return val  # not numeric: keep the original text
    return series.apply(try_convert)

df['Revenue'] = text_to_number(df['Revenue'])
```

### Number to Text (for IDs/codes)
```python
df['ProductCode'] = df['ProductCode'].astype(str).str.strip()
```

### Date Standardization
```python
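# Let pandas infer the format; unparseable values become NaT
df['OrderDate'] = pd.to_datetime(df['OrderDate'], errors='coerce')
# Alternative: if the dates are text in one known format, state it explicitly
# (values in any other format become NaT):
# df['OrderDate'] = pd.to_datetime(df['OrderDate'], format='%Y/%m/%d', errors='coerce')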
```
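When a column genuinely mixes several text formats, one explicit `format` string cannot cover all of them. A common workaround is to try a short list of candidate formats in order. The sketch below uses only the standard library; the `CANDIDATE_FORMATS` tuple is an assumption and should be replaced with the formats actually observed during diagnosis.

```python
from datetime import datetime

# Assumed candidate formats; order matters when formats are ambiguous
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def parse_date(text, formats=CANDIDATE_FORMATS):
    """Return a date parsed with the first matching format, or None if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


print(parse_date("2024/03/05"))
print(parse_date("05.03.2024"))
print(parse_date("n/a"))
```

Ambiguous inputs such as `03/04/2024` cannot be resolved by format matching alone; those rows are better flagged for the user than silently converted.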

## Whitespace Cleanup

```python
# Trim leading/trailing whitespace in all text columns
str_cols = df.select_dtypes(include='object').columns
df[str_cols] = df[str_cols].apply(lambda x: x.str.strip() if hasattr(x, 'str') else x)

# Normalize internal multiple spaces to single space
df[str_cols] = df[str_cols].apply(
    lambda x: x.str.replace(r'\s+', ' ', regex=True) if hasattr(x, 'str') else x
)
```

## Merged Cell Handling

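**Merged cells cause misalignment in pandas `read_excel`:** only the top-left cell of a merged range holds the value, and the remaining cells are read as NaN.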

```python
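# When reading a file with merged cells:
df = pd.read_excel("input.xlsx", header=None)  # raw read, no header parsing
df.columns = df.iloc[0]  # set the header from row 0
# Fill merged cell values down. This also fills genuine blanks,
# so restrict it to the merged columns where possible.
df = df.iloc[1:].ffill()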
```

**In XML (for direct editing)**, merged cells look like:
```xml
<mergeCells count="2">
<mergeCell ref="B2:D2"/> <!-- B2 merged with C2 and D2 -->
</mergeCells>
```
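The merged ranges can be listed without opening the workbook in a spreadsheet application. This is a minimal sketch using only the standard library; `list_merged_ranges` is a hypothetical helper name, and it reports worksheet part names (such as `xl/worksheets/sheet1.xml`) rather than the sheet names shown in the UI.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_merged_ranges(xlsx_file):
    """Return {worksheet part: [merged range refs]} for sheets that have merges."""
    merged = {}
    with zipfile.ZipFile(xlsx_file) as z:
        for name in z.namelist():
            if name.startswith("xl/worksheets/sheet"):
                root = ET.fromstring(z.read(name))
                refs = [m.get("ref") for m in root.iter(f"{NS}mergeCell")]
                if refs:
                    merged[name] = refs
    return merged
```

The returned refs (for example `B2:D2`) map directly onto the "Unmerge N cell ranges" items in a cleanup plan.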

**Before editing**: Unmerge all cells first, then re-merge if needed after cleanup.

## Column Operations

### Add a Cleanup Status Column
```python
# Add a column tracking what was changed (useful for audit)
df['__cleanup_notes'] = ''
df.loc[df['Revenue'].apply(lambda x: isinstance(x, str)), '__cleanup_notes'] += 'text-to-number;'
df.loc[df['Name'].str.contains(r'^\s|\s$', regex=True, na=False), '__cleanup_notes'] += 'trim-whitespace;'
```

### Reorder Columns
```python
# Put key columns first, cleanup-notes last
cols = [c for c in df.columns if not str(c).startswith('__')]  # str() guards non-text column labels
cols += [c for c in df.columns if str(c).startswith('__')]
df = df[cols]
```

## Lookup Key Preparation

**Before using VLOOKUP/INDEX-MATCH, ensure keys are clean:**

```python
# VLOOKUP-ready key requirements:
# 1. No leading/trailing spaces
# 2. Consistent case (or UPPER() applied)
# 3. No hidden characters
# 4. No mixed types (all text or all number)

df['LookupKey'] = df['LookupKey'].astype(str).str.strip().str.upper()
# Or for numeric keys:
df['LookupKey'] = pd.to_numeric(df['LookupKey'], errors='coerce')
```
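Requirement 3 (no hidden characters) is not covered by `strip()` alone, because characters such as the non-breaking space (U+00A0) or zero-width space (U+200B) can sit inside or around a key. The following is a sketch of one way to handle them with the standard library; `normalize_key` is an illustrative name, and folding with NFKC is an assumption that also converts full-width letters and digits to their ASCII forms.

```python
import unicodedata


def normalize_key(value):
    """Return a cleaned text key: hidden characters removed, trimmed, upper-cased."""
    # Treat None and float NaN as an empty key
    if value is None or (isinstance(value, float) and value != value):
        return ""
    # NFKC folds compatibility characters, e.g. non-breaking space -> plain space
    text = unicodedata.normalize("NFKC", str(value))
    # Drop control (Cc) and format (Cf) characters such as zero-width space,
    # but keep ordinary whitespace like tabs so strip() can handle it
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) not in ("Cc", "Cf") or ch.isspace()
    )
    return text.strip().upper()


print(repr(normalize_key("\u00a0ab\u200bc ")))
```

Applied to a column, this could be used as `df['LookupKey'].map(normalize_key)`. Note that a float key such as `1001.0` still becomes `"1001.0"`, so numeric IDs may need converting to integers first.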

## Approval Workflow

After diagnosis, present cleanup plan with these categories:

```
## Proposed Cleanup Plan

### High Priority (fix before any analysis)
1. Remove 12 blank rows in Sheet1
2. Unmerge 3 cell ranges in Sheet2
3. Convert column C from text to number (47 cells)

### Medium Priority (fix for correct results)
4. Remove 5 duplicate rows in Sheet1
5. Trim whitespace in columns B, D, F (Sheet1)
6. Standardize date format in column G (Sheet1)

### Low Priority (nice to have)
7. Add data validation dropdowns to column A
8. Freeze top row (Sheet1)

### Estimated Impact
- Rows after cleanup: ~{n} (from ~{m})
- Formulas that may be affected: 3 (will verify after changes)
```

**User response to approval:**
- `APPROVE` → proceed with all items
- `APPROVE, skip 2 and 5` → selective approval
- `APPROVE 1-3 only` → partial approval
- Any other response → ask for clarification
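The response rules above can be sketched as a small parser that maps a reply to the set of approved item numbers. This is an illustrative sketch, not part of the skill: `parse_approval` and `_expand` are hypothetical names, matching is case-insensitive by assumption, and free-form replies (such as "SKIP cleanup, just fix formulas") fall through to the clarification path.

```python
import re


def _expand(spec):
    """Expand '2 and 5' or '1-3, 7' into a set of ints; None if unparseable."""
    items = set()
    for part in re.split(r"\s*(?:,|and)\s*", spec.strip()):
        match = re.fullmatch(r"(\d+)(?:\s*-\s*(\d+))?", part)
        if not match:
            return None
        start = int(match.group(1))
        end = int(match.group(2) or start)
        items.update(range(start, end + 1))
    return items


def parse_approval(reply, item_count):
    """Return the set of approved item numbers, or None if clarification is needed."""
    text = reply.strip()
    all_items = set(range(1, item_count + 1))
    # "APPROVE" alone approves everything
    if re.fullmatch(r"APPROVE", text, re.IGNORECASE):
        return all_items
    # "APPROVE, skip 2 and 5" approves everything except the listed items
    match = re.fullmatch(r"APPROVE,?\s+skip\s+(?:items?\s+)?(.+)", text, re.IGNORECASE)
    if match:
        skipped = _expand(match.group(1))
        return None if skipped is None else all_items - skipped
    # "APPROVE 1-3 only" approves just the listed items
    match = re.fullmatch(r"APPROVE\s+(.+?)\s+only", text, re.IGNORECASE)
    if match:
        chosen = _expand(match.group(1))
        return None if chosen is None else chosen & all_items
    return None


print(parse_approval("APPROVE, skip 2 and 5", 8))
print(parse_approval("APPROVE 1-3 only", 8))
print(parse_approval("looks fine I guess", 8))
```

Returning `None` rather than guessing keeps the workflow conservative: no write operation starts until the reply matches one of the documented forms.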