
su7ox/SafeNav


SafeNav – Web & Link Safety Analysis Platform

Status: In Development Phase 1

React · Vite · JavaScript · Python · FastAPI · PostgreSQL · Redis · Celery · Docker


SafeNav is a web and link analysis platform designed to evaluate the safety of URLs, websites, and application links.
It performs multi-layer analysis using static inspection, reputation checks, and strict heuristic rule-based scoring to identify potentially malicious or unsafe links. The project is structured as a full-stack system with a React frontend and a Python-based backend, focusing on real-world web security use cases such as phishing detection, suspicious domain analysis, and unsafe content identification.


🚀 Key Features

URL Normalization & Parsing

Handles different types of links including shortened URLs, redirects, and malformed URLs. Applies RFC 3986-compliant sanitization, percent-decoding (including double-encoded attacks), Punycode (IDN) conversion, and scheme/host standardization before any analysis begins.

Static & Lexical Analysis

Detects suspicious patterns such as abnormal URL length, special characters, and domain structure anomalies. Includes typosquatting detection via Levenshtein/Jaro-Winkler distance, keyword analysis, and Shannon entropy scoring for DGA (Domain Generation Algorithm) detection.

Redirect Chain Tracing

Follows HTTP redirect chains (301, 302, 307) without executing JavaScript, detecting cross-domain hops, redirect loops, and excessive hop counts that indicate cloaking or obfuscation. Flags client-side redirects (meta-refresh, window.location) for deeper analysis.

SSL/TLS Certificate Inspection

Analyzes certificate type (DV/OV/EV), issuer, age (newly issued certs under 48 hours are high-risk), and cipher suite strength. Over 80% of phishing sites now use HTTPS — the padlock alone is not a safety signal.

Reputation & Domain Intelligence

Evaluates domain age via WHOIS/RDAP, suspicious TLD detection, and registrar reputation. Domains registered under one week are flagged as critical risk. Caches results via Redis to handle rate limits efficiently.

Weighted Risk Fusion Engine

Aggregates additive heuristic penalties into a single 0–100 Risk Score. Critical indicators (e.g., insecure login form, blacklist hit) immediately override to 100. Every verdict includes a human-readable reasoning list.

Modular Detection Pipeline

Designed with separable components for easy extension, testing, and experimentation. Each of the seven analysis modules operates independently and feeds into a central score aggregator.


🧱 Project Architecture

SafeNav is organized as a full-stack application:

  • frontend/ – React-based user interface
  • backend/ – Python backend responsible for API handling and the seven-module static analysis pipeline

⚙️ Backend – Phase 1: Static Analysis Engine

The backend implements a "Fail-Fast" architecture: all Phase 1 checks run within milliseconds to a few seconds using only the URL string, DNS records, SSL handshake, and HTTP response headers — no browser rendering, no JavaScript execution.

Why Fail-Fast? Approximately 90% of malicious links can be caught through surface-level inspection alone. By filtering these at Phase 1, expensive dynamic sandboxing (Phase 2) is reserved only for ambiguous or heavily obfuscated targets.

Module I – Link Intake, Sanitization & Normalization

Before any security check runs, the raw URL is cleaned and converted to a canonical form to defeat common obfuscation tricks.

| Step | What Happens | Why It Matters |
|---|---|---|
| Percent-decode (recursive) | Decodes %xx escapes repeatedly until the string stops changing | Defeats double-encoding attacks like %2520 |
| Control character stripping | Removes ASCII 0–31 and surrounding whitespace | Prevents parser-breaking invisible characters |
| Scheme & host lowercasing | HTTP:// → http://, domain to lowercase | Ensures consistent, case-insensitive matching |
| Punycode (IDN) conversion | Converts Unicode domains to xn--... ASCII | Exposes homograph attacks (Cyrillic 'а' vs Latin 'a') as visibly different xn-- labels |
| Length guard | Rejects inputs over 2048 characters | Prevents regex backtracking / DoS |

Plain English: Think of this step as spell-checking and standardizing the URL before the real analysis starts — the same way a browser normalizes what you type before making a request.
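The normalization steps in the table can be sketched with the Python standard library alone. This is a minimal illustration, not SafeNav's actual code: the function name `normalize_url` and the `max_rounds` cap on decoding passes are assumptions, and Python's built-in `idna` codec (IDNA 2003) stands in for whatever Punycode library the project uses.

```python
import re
from urllib.parse import unquote, urlsplit, urlunsplit

MAX_URL_LENGTH = 2048                        # length guard from the table
_CONTROL_CHARS = re.compile(r"[\x00-\x1f]")  # ASCII 0-31

def normalize_url(raw: str, max_rounds: int = 10) -> str:
    """Return a canonical form of `raw` (hypothetical helper); raises ValueError on rejected input."""
    if len(raw) > MAX_URL_LENGTH:
        raise ValueError("URL exceeds length guard")
    url = raw
    # Recursive percent-decoding: %2520 -> %20 -> ' ', stop once stable.
    for _ in range(max_rounds):
        decoded = unquote(url)
        if decoded == url:
            break
        url = decoded
    # Strip control characters (including decoded %00), then surrounding whitespace.
    url = _CONTROL_CHARS.sub("", url).strip()
    parts = urlsplit(url)          # urlsplit lowercases the scheme
    host = parts.hostname or ""    # .hostname is already lowercased
    if ":" in host:
        # IPv6 literal: urllib drops the brackets, so put them back.
        host = f"[{host}]"
    else:
        try:
            # IDN -> Punycode (xn--...) via Python's built-in IDNA 2003 codec.
            host = host.encode("idna").decode("ascii")
        except UnicodeError as exc:
            raise ValueError(f"invalid hostname: {host!r}") from exc
    # Keep any userinfo so "paypal.com@evil.com" tricks stay visible downstream.
    userinfo, at, _ = parts.netloc.rpartition("@")
    port = f":{parts.port}" if parts.port is not None else ""
    netloc = f"{userinfo}{at}{host}{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `normalize_url("HTTP://Example.COM/a%2520b")` unwraps the double encoding and yields `http://example.com/a b`. Decoding the whole string before splitting mirrors the table's order of operations; a stricter implementation might decode each URL component separately so that an encoded `%2F` or `%3F` cannot change the URL's structure.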


Module II – Link Type Identification & Taxonomy

Once normalized, the URL is fingerprinted to classify its intent. Categories are not mutually exclusive: a shortened link can also be a direct download, for example.

| Type | Detection Method | Risk Signal |
|---|---|---|
| Standard Website | http/https scheme, valid domain | Baseline |
| IP-Based Link | ipaddress library validates raw IPs in netloc | High (phishing kits avoid domain blocklists this way) |
| Shortened URL | Domain matched against shortener database (bit.ly, t.co, etc.) | Medium (destination is hidden) |
| Direct Download | Path extension checked against .exe .apk .zip .bat .ps1 blacklist | High (immediate malware risk) |
| App Deep Link | Non-http scheme detected (e.g., whatsapp://, tg://) | Medium (may trigger unauthorized app actions) |
| Android Intent | intent:// scheme parsed for package and target app | High (reveals exactly which app is targeted) |

Query parameter values are also scanned for suspicious extensions (e.g., ?file=malware.exe) to prevent false negatives from indirect download links.


Module III – Lightweight Redirect Tracing

Phishers use redirect chains to bounce through legitimate-looking domains before reaching the malicious page. This module traces the full path without executing any client-side code.

  • User-Agent masquerading – requests mimic a real browser to bypass basic bot detection
  • Stream mode – headers and redirects are followed without downloading the full response body
  • Chain analysis – each hop's domain is compared; cross-domain transitions (e.g., google.com → attacker.xyz) increase the risk score
  • Loop guard – redirect chains capped at 10 hops to prevent infinite loops
  • Client-side redirect detection – response body is scanned via regex for meta http-equiv="refresh" and window.location, flagged for Phase 2 deep scan

Plain English: Like following every "click here" button automatically and reporting each stop on the journey, without actually loading the pages in a browser.
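Once an HTTP client has recorded the hops (for example by disabling automatic redirects and reading each `Location` header, which is not shown here), the chain analysis and the client-side redirect scan are pure functions. This sketch assumes the hops arrive as a list of URLs in visit order; the regexes are heuristic and will miss obfuscated JavaScript. `analyze_chain` and `has_client_side_redirect` are hypothetical names.

```python
import re
from urllib.parse import urlsplit

MAX_HOPS = 10  # loop guard from the list above

# Heuristic patterns only; obfuscated JavaScript will slip past them.
META_REFRESH = re.compile(r"""<meta[^>]+http-equiv\s*=\s*["']?refresh""", re.IGNORECASE)
JS_REDIRECT = re.compile(r"window\.location|location\.(?:href\s*=|replace\s*\()", re.IGNORECASE)

def analyze_chain(hops: list[str]) -> dict:
    """Summarize a redirect chain given as the URLs visited, in order."""
    if not hops:
        raise ValueError("chain must contain at least the starting URL")
    hosts = [urlsplit(url).hostname or "" for url in hops]
    # Compares full hostnames; comparing registered domains (e.g. via
    # tldextract) would avoid counting www.example.com -> example.com.
    cross_domain = sum(1 for a, b in zip(hosts, hosts[1:]) if a != b)
    return {
        "hop_count": len(hops) - 1,
        "cross_domain_hops": cross_domain,
        "loop": len(set(hops)) < len(hops),
        "too_many_hops": len(hops) - 1 > MAX_HOPS,
    }

def has_client_side_redirect(html: str) -> bool:
    """True if the body contains a meta-refresh or window.location style redirect."""
    return bool(META_REFRESH.search(html) or JS_REDIRECT.search(html))
```

A chain like `google.com → attacker.xyz → attacker.xyz` reports two hops with one cross-domain transition; a positive `has_client_side_redirect` result is what would be handed to Phase 2.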


Module IV – SSL/TLS Certificate Inspection

HTTPS no longer implies safety. This module inspects the quality of the certificate, not just its presence.

| Check | Logic | Risk Implication |
|---|---|---|
| Certificate type | DV (domain-only) vs OV/EV (organization verified) | DV certs are free and automated, and standard for phishing |
| Issuer | Flags Let's Encrypt and cPanel issuers on login pages | High-risk combination |
| Certificate age | Current time − notBefore | Under 48 hours → critical "burn domain" signal |
| Protocol & cipher suite | Checks for deprecated protocols (SSLv3, TLS 1.0) and weak ciphers (RC4, NULL) | Indicates a neglected or compromised server |
| SNI compatibility | server_hostname passed during the TLS handshake | Required for multi-tenant hosts |
| Self-signed certs | Handshake errors caught and flagged as "Invalid/Untrusted" | Never crash; always report |

Module V – Domain Reputation & History

A domain's registration history is one of the strongest predictors of malicious intent.

| Signal | Threshold | Risk Level |
|---|---|---|
| Domain age | < 7 days since registration | Critical |
| Domain age | < 30 days since registration | High |
| Suspicious TLD | .xyz, .top, .tk, .gq, .zip | Medium (amplified by other signals) |
| WHOIS privacy/redaction | Creation date missing | Indeterminate (confidence-adjusted) |
| Rate limiting | Handled via Redis cache (24-hour TTL) + rotating proxies | Operational |

Data is fetched via WHOIS (port 43) or the modern RDAP JSON API. The tldextract library isolates the effective second-level domain before lookup.


Module VI – Advanced Lexical Analysis & Phishing Detection

This module analyzes the text of the URL for visual and semantic deception patterns.

Typosquatting Detection
Levenshtein distance is calculated between the analyzed domain and a reference list of high-value phishing targets (Google, PayPal, Amazon, Microsoft, etc.). A distance of 1–2 flags the domain as a potential typosquat (e.g., gooogle.com). Jaro-Winkler distance is also used to detect subdomain spoofing. For performance, checks are limited to the 50–100 most-phished brands and use the C-optimized python-Levenshtein library.

Keyword Analysis
The URL is scanned for trust-inducing keywords in subdomains and paths: login, secure, account, verify, update, support, billing. A URL like paypal-secure.com or apple.verify-id.com triggers a Suspicious Keyword flag.
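A minimal version of the keyword scan, assuming simple substring matching over the hostname and the lowercased path (the README does not say whether matching is by substring or by token). `suspicious_keywords` is a hypothetical name.

```python
from urllib.parse import urlsplit

KEYWORDS = ("login", "secure", "account", "verify", "update", "support", "billing")

def suspicious_keywords(url: str) -> list[str]:
    """Return the trust-inducing keywords found in the host or path."""
    parts = urlsplit(url)
    haystack = (parts.hostname or "") + parts.path.lower()
    return [kw for kw in KEYWORDS if kw in haystack]
```

Substring matching also catches run-together forms like `securelogin.com`, at the cost of false hits on words such as "accountant"; token-based matching makes the opposite trade-off.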

Shannon Entropy (DGA Detection)
$$H(X) = - \sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$

High entropy in a domain name indicates random character distribution — a hallmark of Domain Generation Algorithms used by malware C2 servers (e.g., xkzj194.com). Known CDN providers (AWS, Akamai) are whitelisted to prevent false positives.
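The entropy formula translates directly into code over character frequencies. The sketch assumes a hypothetical cut-off of 3.0 bits per character and an illustrative CDN allowlist; the README gives neither value. Note that short labels have low maximum entropy (log2 of their length), so any real threshold has to be tuned against label length.

```python
import math
from collections import Counter

CDN_ALLOWLIST = ("amazonaws.com", "akamaiedge.net", "cloudfront.net")  # illustrative

def shannon_entropy(s: str) -> float:
    """H(X) = -sum P(x_i) log2 P(x_i) over the characters of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def looks_generated(hostname: str, threshold: float = 3.0) -> bool:
    """Flag a DGA-style leftmost label unless the host belongs to an allowlisted CDN."""
    host = hostname.lower().rstrip(".")
    if any(host == d or host.endswith("." + d) for d in CDN_ALLOWLIST):
        return False
    return shannon_entropy(host.split(".")[0]) > threshold
```

For example, `shannon_entropy("abcd")` is exactly 2.0 (four equally likely symbols), whereas `"google"` scores about 1.92 because of its repeated letters.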


Module VII – Static Content Inspection

A lightweight HTML parser that looks for gross security violations in the page source without rendering it.

Insecure Login Form Detection
Using BeautifulSoup (bs4):

  1. Parse the HTML body
  2. Find <input type="password"> elements
  3. Check the parent <form action="..."> attribute
  4. Violation: If the page URL or form action uses http:// → immediately flagged as "Insecure Login Form" (credential theft risk)

If <script> tags are prevalent but no forms are found, the system notes "Dynamic Content Detected" and recommends Phase 2 analysis — covering React/Angular apps where forms are JS-generated.
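The four steps above can be sketched with the standard library's `html.parser.HTMLParser`, swapped in for BeautifulSoup so the example has no dependencies. `FormScanner` and `inspect_html` are hypothetical names, and "prevalent" scripts are simplified here to "at least one `<script>` tag".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class FormScanner(HTMLParser):
    """Records each <form>'s action, whether it holds a password input, and <script> count."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: list[dict] = []
        self.script_count = 0
        self._current: dict | None = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form":
            self._current = {"action": attr.get("action") or "", "has_password": False}
            self.forms.append(self._current)
        elif tag == "input" and (attr.get("type") or "").lower() == "password":
            if self._current is not None:
                self._current["has_password"] = True
        elif tag == "script":
            self.script_count += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None

def inspect_html(page_url: str, html: str) -> list[str]:
    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    findings: list[str] = []
    page_is_http = urlsplit(page_url).scheme == "http"
    for form in scanner.forms:
        if not form["has_password"]:
            continue
        # An empty or relative action resolves against the page URL.
        target = urljoin(page_url, form["action"])
        if page_is_http or urlsplit(target).scheme == "http":
            findings.append("Insecure Login Form")
            break
    if scanner.script_count and not scanner.forms:
        findings.append("Dynamic Content Detected")
    return findings
```

Resolving the action with `urljoin` matters: `<form action="/login">` on an `http://` page posts credentials in cleartext even though the action contains no scheme.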


🧮 Risk Scoring & Fusion

All seven module outputs are synthesized into a single Risk Score (0–100) using a Weighted Risk Fusion model:

$$\text{Risk Score} = \min\left(100,\; \sum_i P_i \times W_i\right)$$

where $P_i$ is the penalty for detected signal $i$ and $W_i$ is its weight.

If a Critical Indicator is detected (e.g., phishing blacklist hit, insecure login form), the score is immediately forced to 100.

| Detected Signal | Severity | Penalty |
|---|---|---|
| Typosquatting Match | High | +50 |
| Domain Age < 7 Days | High | +40 |
| Cross-Domain Redirect | Low | +15 |
| Suspicious Keyword | Medium | +20 |
| DV SSL Certificate | Low | +10 |
| Insecure Login Form (HTTP) | Critical | +100 (Override) |

If a check fails (e.g., WHOIS timeout), it is marked Indeterminate and the remaining weights are normalized — the score reflects only verified data without artificially deflating the result.


🚦 Result Tiers & Explainability

| Score | Verdict | UI |
|---|---|---|
| 0 – 30 | Safe | 🟢 Green Shield: "Safe to Visit" |
| 31 – 69 | Caution | 🟡 Warning: "Proceed with Caution" (Phase 2 suggested) |
| 70 – 100 | High Risk | 🔴 Red Alert: "Dangerous Link Detected" |

Every verdict includes a Reasoning section — a plain-English list of exactly why the score was assigned:

  • "Domain registered only 3 days ago."
  • "Contains login keywords but uses a low-trust DV certificate."
  • "Redirects through 3 different domains."

This transparency builds user trust and turns SafeNav into an educational tool, not just a black-box filter.
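Mapping a score to its tier and bundling it with the reasoning list is straightforward. In this sketch, `verdict` follows the tier boundaries in the table; the report layout and the `suggest_phase2` flag are assumptions about how the API response might look.

```python
def verdict(score: int) -> str:
    """Map a 0-100 risk score to the result tiers."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "Safe"
    if score <= 69:
        return "Caution"
    return "High Risk"

def build_report(score: int, reasons: list[str]) -> dict:
    """Bundle the score, verdict, and plain-English reasoning list (hypothetical shape)."""
    tier = verdict(score)
    return {
        "score": score,
        "verdict": tier,
        "reasoning": reasons,
        "suggest_phase2": tier == "Caution",
    }
```

For example, `build_report(90, ["Domain registered only 3 days ago."])` yields a `"High Risk"` verdict with its reasoning attached.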


🎨 Frontend

  • Built with React (Vite)
  • Responsible for user interaction and result visualization
  • Communicates with backend APIs to request URL analysis and display safety reports

🛠 Tech Stack

🎨 Frontend

| Technology | Role |
|---|---|
| React | UI framework; component-based result dashboard |
| Vite | Build tool; fast hot module replacement in development |
| JavaScript | Primary frontend language |
| HTML5, CSS3 | Markup & styling |

⚙️ Backend

| Technology | Role |
|---|---|
| Python | Core language: analysis pipeline, networking |
| FastAPI | ASGI web framework; async-first, handles concurrent module calls |
| Redis | WHOIS result caching (24h TTL) + Celery message broker |
| Celery | Parallel task dispatch; runs modules concurrently |
| BeautifulSoup (bs4) | Static HTML parsing; insecure form detection |
| tldextract | Accurate domain/subdomain/TLD isolation |
| dnspython · ssl · socket | DNS resolution, TLS handshake, certificate retrieval |
| python-Levenshtein | C-optimized edit distance; typosquatting detection |
| httpx / requests | HTTP client; redirect tracing, User-Agent masquerading |

🚀 Infrastructure & Tooling

| Technology | Role |
|---|---|
| Docker | Containerization; encapsulates runtime and system-level dependencies like OpenSSL |
| Docker Compose | Orchestrates frontend + backend + Redis as a unified stack |
| Git, GitHub | Version control & repository hosting |
| VS Code | Primary development environment |

📌 Use Cases

  • Phishing link detection
  • Unsafe website analysis
  • Educational research on web security and threat detection
  • Full-stack development practice with a security focus

📈 Project Status

Current Phase: Phase 1 – Static Analysis Engine (In Development)

| Phase | Description | Status |
|---|---|---|
| Phase 1 | Static Analysis Engine: 7-module URL inspection pipeline | 🔄 In Development |
| Phase 2 | Dynamic Analysis: full browser sandboxing & JS execution | 🔜 Planned |
| Phase 3 | MLOps Pipeline: automated model retraining on new threat data | 🔜 Planned |
| Phase 4 | Scale & Deploy: Kubernetes horizontal scaling, extended reporting | 🔜 Planned |

What's done in Phase 1:

  • ✅ Architecture fully designed and documented
  • ✅ All 7 analysis modules specified (normalization → heuristic rules → risk fusion)
  • ✅ Weighted Risk Scoring algorithm defined
  • ✅ Docker Compose full-stack setup
  • 🔄 Module implementation in progress

Coming next:

  • Phase 2 dynamic sandboxing (headless browser, JS execution, behavioral fingerprinting)
  • MLOps feedback loop for continuous model improvement
  • Kubernetes-based horizontal scaling for production workloads

▶️ How to Run SafeNav

SafeNav can be executed in two different modes depending on the use case:

  • Docker Mode – Recommended for demo, evaluation, and deployment
  • Development Mode – Recommended while coding and debugging

🐳 Running with Docker (Recommended)

This mode runs the frontend, backend, and database together using Docker Compose.

Prerequisites

  • Docker Desktop installed
  • Docker Compose enabled

Steps

  1. Clone the repository:
git clone https://github.com/su7ox/SafeNav.git
cd SafeNav
  2. Build and start all services:
docker-compose up -d
  3. Verify running containers:
docker ps

Access the Application

Stop the Application

docker-compose down

Apply Code Changes

Running containers do not pick up code changes automatically. Rebuild the images and restart the stack:

docker-compose build
docker-compose up -d

🧑‍💻 Running in Development Mode (Without Docker)

This mode supports hot reload and is recommended during development.


🔹 Backend (FastAPI)

Prerequisites

  • Python 3.10 or higher

Steps

  1. Navigate to backend directory:
cd backend
  2. Create virtual environment (one-time):
python -m venv venv
  3. Activate virtual environment (Windows):
venv\Scripts\activate
     On macOS/Linux, use instead:
source venv/bin/activate
  4. Install dependencies:
pip install -r requirements.txt
  5. Run backend server:
uvicorn app.main:app --reload

Backend will be available at:


🔹 Frontend (React + Vite)

Prerequisites

  • Node.js (LTS version recommended)

Steps

  1. Open a new terminal and navigate to frontend directory:
cd frontend
  2. Install dependencies:
npm install
  3. Start frontend development server:
npm run dev

Frontend will be available at:

👤 Author

su7ox

GitHub: @su7ox

About

A URL/links checker tool
