1 change: 1 addition & 0 deletions .env.test.example
@@ -0,0 +1 @@
FREDY_PROXY_URL=http://{login}:{password}@proxy_site.com:{port}
4 changes: 2 additions & 2 deletions .github/workflows/claude-code-review.yml
@@ -21,7 +21,7 @@ jobs:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: read
pull-requests: write
issues: read
id-token: write

@@ -35,7 +35,7 @@ jobs:
id: claude-review
uses: anthropics/claude-code-action@v1
with:
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
plugin_marketplaces: 'https://github.com/anthropics/claude-code.git'
plugins: 'code-review@claude-code-plugins'
prompt: '/code-review:code-review ${{ github.repository }}/pull/${{ github.event.pull_request.number }}'
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ npm-debug.log
.idea
.vscode
tools/release/config.json
.env.test
87 changes: 87 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,87 @@
# Fredy — Development Guide

## Spec-Driven Development with OpenSpec

Every PR in this repo MUST start with a planning phase using [OpenSpec](https://github.com/Fission-AI/OpenSpec). Write the spec before the code.

### Workflow

1. **Propose** — Create `openspec/changes/<change-name>/proposal.md` describing the problem, intent, approach, and scope.
2. **Spec** — Write delta specs in `openspec/changes/<change-name>/specs/<domain>/spec.md` using ADDED/MODIFIED/REMOVED sections with requirements (SHALL/MUST/SHOULD) and Given-When-Then scenarios.
3. **Design** — Create `openspec/changes/<change-name>/design.md` with architecture decisions, key trade-offs, and diagrams.
4. **Tasks** — Create `openspec/changes/<change-name>/tasks.md` with a checklist of implementation steps.
5. **Implement** — Write tests first (TDD), then code. Check off tasks as completed.
6. **Archive** — When done, move the change folder to `openspec/archive/<change-name>/` and merge delta specs into `openspec/specs/<domain>/spec.md`.

### Directory Structure

```
openspec/
├── specs/ # Authoritative specs (current system behavior)
│ └── <domain>/
│ └── spec.md
├── changes/ # Active work (proposals in progress)
│ └── <change-name>/
│ ├── proposal.md
│ ├── design.md
│ ├── tasks.md
│ └── specs/
│ └── <domain>/
│ └── spec.md # Delta spec (ADDED/MODIFIED/REMOVED)
└── archive/ # Completed changes
└── <change-name>/
├── proposal.md
├── design.md
├── tasks.md
└── specs/
```

### Spec Format

Requirements use RFC 2119 keywords:
```markdown
### REQ-DOMAIN-001: Short Name
The system SHALL do something specific.
```

Scenarios use Given-When-Then:
```markdown
#### Scenario: Happy path
- GIVEN some precondition
- WHEN an action occurs
- THEN the expected outcome happens
```

Delta specs label changes:
```markdown
## ADDED Requirements
### REQ-X-001: New behavior
...

## MODIFIED Requirements
### REQ-X-002: Changed behavior
(Previously: old behavior)

## REMOVED Requirements
### REQ-X-003: Deprecated behavior
```

## Code Conventions

- **ES Modules** — `"type": "module"`, all imports use ESM syntax
- **ESLint** must pass (`yarn lint`) — enforced by pre-commit hook
- **Prettier** formatting: single quotes, 120 char print width — auto-applied by pre-commit hook
- **`no-console` rule** — only `console.warn` and `console.error` allowed; use `logger` for other logging
- **`fetch` is a global** in ESLint config — no need to import for native fetch
- **Node.js** — `>=22.0.0`

## Testing

- **Framework**: Vitest — globals enabled (`describe`, `it`, `vi` available without import)
- **Import**: `import { expect } from 'vitest'` (expect is not a global)
- **Test location**: `test/` directory mirroring source structure
- **File pattern**: `*.test.js`
- **Timeout**: 60s per test (configured globally)
- **Run**: `yarn test` (or `npx vitest run`)
- **TDD**: Write tests before implementation for new modules
- **Mocking**: `vi.mock()` at top level, `vi.hoisted()` for mock variables used in factories, `vi.doMock()` + `vi.resetModules()` for per-test isolation
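
As an illustration of the mocking conventions above, a test file for a hypothetical `lib/services/example.js` module might look like the sketch below. The module path and `loadItems` function are invented for the example; only the mocked `scrapingFetch` path exists in this repo.

```javascript
// test/services/example.test.js — sketch of the mocking conventions above.
// `describe`, `it`, and `vi` are Vitest globals; `expect` must be imported.
import { expect } from 'vitest';

// vi.hoisted() creates the mock before the hoisted vi.mock() factory runs,
// so the factory can safely reference it.
const { fetchMock } = vi.hoisted(() => ({ fetchMock: vi.fn() }));

vi.mock('../../lib/services/http/httpClient.js', () => ({
  scrapingFetch: fetchMock,
}));

describe('example module (hypothetical)', () => {
  it('uses the mocked HTTP client', async () => {
    fetchMock.mockResolvedValue({ ok: true, json: async () => ({ items: [] }) });
    // Dynamic import so the mock is in place before the module loads.
    const { loadItems } = await import('../../lib/services/example.js');
    expect(await loadItems()).toEqual([]);
  });
});
```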
8 changes: 8 additions & 0 deletions README.md
@@ -163,6 +163,14 @@ For more information on how to set it up and use it, please refer to the [MCP Re

------------------------------------------------------------------------

## 🌐 Proxy Support

Fredy can route all outbound scraping traffic, both plain HTTP requests and Puppeteer browser sessions, through an HTTP proxy. Set `FREDY_PROXY_URL` as an environment variable or configure it in the Web UI settings.

See the [Proxy Documentation](doc/PROXY.md) for setup instructions, Docker examples, and how to run provider tests with a proxy.

------------------------------------------------------------------------

## Immoscout

Immoscout has implemented advanced bot detection. To work around this, we use a reverse-engineered version of their mobile API. See the [Immoscout Reverse Engineering Documentation](https://github.com/orangecoding/fredy/blob/master/reverse-engineered-immoscout.md)
125 changes: 125 additions & 0 deletions doc/PROXY.md
@@ -0,0 +1,125 @@
# Proxy Configuration

Fredy supports routing all scraping traffic (both HTTP requests and Puppeteer browser sessions) through an HTTP proxy. This is useful when providers block direct requests or when running Fredy behind a corporate firewall.

> **Note:** Only scraping requests are proxied. Notifications (Telegram, Slack, email, etc.), geocoding, and analytics are **not** affected.

---

## Configuring the Proxy

### Via Web UI

Open Fredy's Web UI and navigate to **General Settings → Network** tab. Enter the proxy URL in the **Proxy URL** field.

Supported formats:

```
http://proxy.example.com:8080
http://username:password@proxy.example.com:8080
https://proxy.example.com:8443
```

Examples using common proxy providers:

| Provider | Proxy URL |
|----------|-----------|
| iProyal | `http://customer-user:pass_country-de@geo.iproyal.com:12321` |
| Bright Data | `http://lum-customer-C1234-zone-zone1:pass@brd.superproxy.io:22225` |
| Self-hosted (no auth) | `http://192.168.1.50:3128` |

After saving, Fredy immediately uses the new proxy for all subsequent scraping runs — no restart required.

### Via Environment Variables

If no proxy URL is set in the Web UI, Fredy checks the following environment variables (in order):

1. `FREDY_PROXY_URL`
2. `HTTPS_PROXY`
3. `HTTP_PROXY`

#### Docker

```bash
docker run -d --name fredy \
-e FREDY_PROXY_URL=http://user:pass@proxy.example.com:8080 \
-v fredy_conf:/conf \
-v fredy_db:/db \
-p 9998:9998 \
ghcr.io/orangecoding/fredy:master
```

#### Node.js

```bash
FREDY_PROXY_URL=http://user:pass@proxy.example.com:8080 yarn run start:backend
```

### Priority

When multiple sources are configured, the proxy URL is resolved in this order:

1. **Web UI setting** (highest priority)
2. `FREDY_PROXY_URL` env var
3. `HTTPS_PROXY` env var
4. `HTTP_PROXY` env var (lowest priority)
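
The resolution order above can be sketched as a small helper. This is a minimal illustration of the documented precedence; `resolveProxyUrl` and its arguments are hypothetical names, not Fredy's actual API.

```javascript
// Sketch of the documented resolution order (hypothetical helper,
// not Fredy's actual implementation).
function resolveProxyUrl(webUiSetting, env = process.env) {
  return (
    webUiSetting ||        // 1. Web UI setting (highest priority)
    env.FREDY_PROXY_URL || // 2.
    env.HTTPS_PROXY ||     // 3.
    env.HTTP_PROXY ||      // 4. (lowest priority)
    null                   // no proxy configured
  );
}

// The Web UI setting wins even when environment variables are also set:
resolveProxyUrl('http://ui-proxy:8080', {
  FREDY_PROXY_URL: 'http://env-proxy:8080',
}); // → 'http://ui-proxy:8080'
```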

---

## How It Works

Fredy has two outbound scraping paths, and both respect the proxy configuration:

| Path | Used by | Mechanism |
|------|---------|-----------|
| **HTTP fetch** | immoscout (mobile API) | `undici.ProxyAgent` passed as `dispatcher` to `fetch()` |
| **Puppeteer** | All other providers | Chrome launched with `--proxy-server` flag; credentials passed via `page.authenticate()` |

When the proxy URL includes credentials (`username:password@`), authentication is handled automatically — via a `Basic` auth header for HTTP requests, and via Chrome's built-in proxy auth for Puppeteer.
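
For illustration, splitting the credentials out of a proxy URL with Node's built-in `URL` class might look like the sketch below. This shows the idea behind the split (Chrome's `--proxy-server` flag takes the URL without credentials, while the credentials feed `page.authenticate()` or a `Basic` header); the `splitProxyUrl` name and return shape are invented here, not Fredy's actual `parseProxyUrl`.

```javascript
// Split a proxy URL into the server address Chrome expects and the
// credentials used for authentication (illustrative sketch only).
function splitProxyUrl(proxyUrl) {
  const u = new URL(proxyUrl);
  const serverUrl = `${u.protocol}//${u.host}`; // credentials stripped
  const auth = u.username
    ? {
        username: decodeURIComponent(u.username),
        password: decodeURIComponent(u.password),
      }
    : null;
  // Basic auth header usable for plain HTTP requests:
  const header = auth
    ? 'Basic ' + Buffer.from(`${auth.username}:${auth.password}`).toString('base64')
    : null;
  return { serverUrl, auth, header };
}
```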

---

## Testing Providers with a Proxy

Provider tests are integration tests that hit real external websites. Some sites block direct requests, so running them through a proxy can improve reliability.

### Setup

1. Copy the example env file and add your proxy credentials:

```bash
cp .env.test.example .env.test
```

2. Edit `.env.test` and set `FREDY_PROXY_URL`:

```
FREDY_PROXY_URL=http://user:pass@proxy.example.com:8080
```

`.env.test` is gitignored and will not be committed.

### Running Tests

```bash
# Provider tests without proxy (direct connection)
yarn test:provider

# Provider tests with proxy (loads variables from .env.test)
yarn test:provider:proxy
```

Both commands use a dedicated vitest config (`vitest.provider.config.js`) that runs only the provider integration tests.

### What Gets Tested

| Script | HTTP providers (immoscout) | Puppeteer providers (11 others) |
|--------|---------------------------|--------------------------------|
| `yarn test:provider` | Direct fetch | Direct browser launch |
| `yarn test:provider:proxy` | Fetch via proxy | Browser with `--proxy-server` |

### Notes

- Provider tests are **excluded from CI** (`yarn testGH`) because they depend on external sites that may be unavailable or rate-limited.
- Some providers may still fail even with a proxy due to bot detection or changed page structure — this is expected for integration tests against live websites.
- The full unit test suite (`yarn test`) does not require a proxy and should always pass.
Binary file added doc/images/proxy-settings.png
7 changes: 4 additions & 3 deletions lib/provider/immoscout.js
@@ -47,11 +47,12 @@ import {
} from '../services/immoscout/immoscout-web-translator.js';
import logger from '../services/logger.js';
import { getUserSettings } from '../services/storage/settingsStorage.js';
import { scrapingFetch } from '../services/http/httpClient.js';
let appliedBlackList = [];
let currentUserId = null;

async function getListings(url) {
const response = await fetch(url, {
const response = await scrapingFetch(url, {
method: 'POST',
headers: {
'User-Agent': 'ImmoScout_27.12_26.2_._',
@@ -96,7 +97,7 @@ async function getListings(url) {
}

async function pushDetails(listing) {
const detailed = await fetch(`https://api.mobile.immobilienscout24.de/expose/${listing.id}`, {
const detailed = await scrapingFetch(`https://api.mobile.immobilienscout24.de/expose/${listing.id}`, {
headers: {
'User-Agent': 'ImmoScout_27.3_26.0_._',
'Content-Type': 'application/json',
@@ -151,7 +152,7 @@ function buildDescription(detailBody) {
}

async function isListingActive(link) {
const result = await fetch(convertImmoscoutListingToMobileListing(link), {
const result = await scrapingFetch(convertImmoscoutListingToMobileListing(link), {
headers: {
'User-Agent': 'ImmoScout_27.12_26.2_._',
},
25 changes: 23 additions & 2 deletions lib/services/extractor/puppeteerExtractor.js
@@ -12,6 +12,7 @@ import {
applyLanguagePersistence,
applyPostNavigationHumanSignals,
} from './botPrevention.js';
import { parseProxyUrl } from '../http/httpClient.js';
import logger from '../logger.js';
import fs from 'fs';
import os from 'os';
@@ -33,8 +34,16 @@ export async function launchBrowser(url, options) {
preCfg.windowSizeArg,
...preCfg.extraArgs,
];
let proxyAuth = null;
if (options?.proxyUrl) {
launchArgs.push(`--proxy-server=${options.proxyUrl}`);
const parsed = parseProxyUrl(options.proxyUrl);
launchArgs.push(`--proxy-server=${parsed.serverUrl}`);
if (parsed.username) {
proxyAuth = {
username: decodeURIComponent(parsed.username),
password: decodeURIComponent(parsed.password),
};
}
}

let userDataDir;
@@ -62,6 +71,7 @@

browser.__fredy_userDataDir = userDataDir;
browser.__fredy_removeUserDataDir = removeUserDataDir;
browser.__fredy_proxyAuth = proxyAuth;

return browser;
}
@@ -93,10 +103,21 @@ export default async function execute(url, waitForSelector, options) {
debug(`Sending request to ${url} using Puppeteer.`);

if (!isExternalBrowser) {
browser = await launchBrowser(url, options);
let launchOpts = options;
if (!launchOpts?.proxyUrl) {
const { getProxyConfig } = await import('../http/httpClient.js');
const proxyConfig = getProxyConfig();
if (proxyConfig) {
launchOpts = { ...launchOpts, proxyUrl: proxyConfig.rawUrl };
}
}
browser = await launchBrowser(url, launchOpts);
}

page = await browser.newPage();
if (browser.__fredy_proxyAuth) {
await page.authenticate(browser.__fredy_proxyAuth);
}
const preCfg = getPreLaunchConfig(url, options || {});
await applyBotPreventionToPage(page, preCfg);
// Provide languages value before navigation
11 changes: 1 addition & 10 deletions lib/services/geocoding/client/nominatimClient.js
@@ -5,18 +5,11 @@

import os from 'os';
import crypto from 'crypto';
import https from 'https';
import fetch from 'node-fetch';
import pThrottle from 'p-throttle';
import logger from '../../logger.js';

const API_URL = 'https://nominatim.openstreetmap.org/search';

const agent = new https.Agent({
keepAlive: true,
keepAliveMsecs: 1000,
});

const throttle = pThrottle({
limit: 1,
interval: 1000,
@@ -66,8 +59,7 @@

try {
const response = await fetch(url, {
agent,
timeout: 60000,
signal: AbortSignal.timeout(60000),
headers: {
'User-Agent': userAgent,
},
@@ -116,7 +108,6 @@ async function doAutocomplete(query) {

try {
const response = await fetch(url, {
agent,
headers: {
'User-Agent': userAgent,
},