@zalko/linkedin-parser

A clean, lightweight TypeScript library for parsing LinkedIn PDF resumes and extracting structured profile data.

ℹ️ Note: This is a newly published package. Download statistics may take 24-48 hours to populate. Some badges show "package not found or too new" until npm statistics are updated.

Installation • CLI Usage • Quick Start • API Reference • Examples


✨ Features

🚀 Simple API
Single function to parse PDF files or text
📦 Lightweight
Only 1 dependency (pdf-parse)
🔧 TypeScript First
Full type definitions included
⚡ Fast
Optimized parsing algorithms
🧪 Well Tested
Comprehensive Jest test suite
📱 ESM Ready
Modern ES module support

📦 Installation

Library Usage

npm install @zalko/linkedin-parser

CLI Usage (Global)

# Install globally for command-line usage
npm install -g @zalko/linkedin-parser

# Or use with npx (no installation required)
npx @zalko/linkedin-parser path/to/resume.pdf

🖥️ CLI Usage

The package includes a command-line interface for easy PDF processing:

Basic Usage

# Parse a LinkedIn PDF and output JSON
linkedin-pdf-parser ./resume.pdf

# Save output to file
linkedin-pdf-parser ./resume.pdf > profile.json

# Compact output (no pretty formatting)
linkedin-pdf-parser ./resume.pdf --compact

# Include raw extracted text
linkedin-pdf-parser ./resume.pdf --raw-text

Real-world Examples

# Process multiple PDFs
for pdf in *.pdf; do
  linkedin-pdf-parser "$pdf" > "${pdf%.pdf}.json"
done

# Extract specific data with jq
linkedin-pdf-parser resume.pdf | jq '.profile.name'
linkedin-pdf-parser resume.pdf | jq '.profile.contact.email'
linkedin-pdf-parser resume.pdf | jq '.profile.experience[].company'

CLI Options

  • --compact - Compact JSON output (no formatting)
  • --raw-text - Include raw extracted text in output
  • --help, -h - Show help message

📖 See CLI_USAGE.md for complete CLI documentation

Note: Starting from v1.0.2, pdf-parse is a peer dependency to minimize bundle size.

🚀 Quick Start

import { parseLinkedInPDF } from '@zalko/linkedin-parser';
import fs from 'fs';

// Parse from PDF Buffer
const pdfBuffer = fs.readFileSync('resume.pdf');
const result = await parseLinkedInPDF(pdfBuffer);

console.log(result.profile.name);          // "John Silva"
console.log(result.profile.contact.email); // "john.silva@email.com"
console.log(result.profile.experience);    // [{ title: "...", company: "..." }]

📚 Examples

Basic Usage

import { parseLinkedInPDF } from '@zalko/linkedin-parser';
import fs from 'fs';

const pdfBuffer = fs.readFileSync('linkedin-resume.pdf');
const { profile } = await parseLinkedInPDF(pdfBuffer);

// Access parsed data
console.log(`Name: ${profile.name}`);
console.log(`Email: ${profile.contact.email}`);
console.log(`Skills: ${profile.top_skills.join(', ')}`);
console.log(`Experience: ${profile.experience.length} positions`);

With Options

// Include raw extracted text in result
const result = await parseLinkedInPDF(pdfBuffer, {
  includeRawText: true
});

console.log(`Raw text: ${result.rawText?.substring(0, 100)}...`);

Parse Text Directly

// If you already have extracted text from PDF
const extractedText = "John Silva\nSoftware Engineer...";
const result = await parseLinkedInPDF(extractedText);

Error Handling

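try {
  const result = await parseLinkedInPDF(pdfBuffer);
  console.log(result.profile);
} catch (error) {
  // Narrow the caught value first; under strict TypeScript it is typed as unknown
  const message = error instanceof Error ? error.message : String(error);
  if (message === 'PDF appears to be empty or unreadable') {
    console.error('Invalid PDF file');
  } else {
    console.error('Parsing failed:', message);
  }
}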
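
If callers prefer to branch on a returned value rather than wrap every call in try/catch, the pattern above can be folded into a small wrapper. This is only a sketch: `ParseOutcome`, `safeParse`, and `stubParser` are hypothetical names that are not part of the library, and the parser is passed in as an argument so the example runs without the package installed.

```typescript
// Hypothetical result type for this sketch; the library itself just throws.
type ParseOutcome<T> =
  | { ok: true; value: T }
  | { ok: false; reason: 'empty-pdf' | 'other'; message: string };

// Error message quoted from the error-handling example above.
const EMPTY_PDF_MESSAGE = 'PDF appears to be empty or unreadable';

// Wraps any async parser so callers branch on a value instead of using try/catch.
async function safeParse<I, T>(
  parse: (input: I) => Promise<T>,
  input: I
): Promise<ParseOutcome<T>> {
  try {
    return { ok: true, value: await parse(input) };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    const reason = message === EMPTY_PDF_MESSAGE ? 'empty-pdf' : 'other';
    return { ok: false, reason, message };
  }
}

// Stub standing in for parseLinkedInPDF so the sketch runs on its own.
async function stubParser(text: string): Promise<{ profile: { name: string } }> {
  if (text.trim() === '') throw new Error(EMPTY_PDF_MESSAGE);
  return { profile: { name: text.split('\n')[0] ?? '' } };
}

safeParse(stubParser, 'John Silva\nSoftware Engineer').then((outcome) => {
  console.log(outcome.ok ? outcome.value.profile.name : outcome.reason);
});
```

In real use the stub would presumably be replaced by the library function, for example `safeParse(parseLinkedInPDF, pdfBuffer)`.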

📖 API Reference

parseLinkedInPDF(input, options?)

Parses a LinkedIn PDF resume and extracts structured profile data.

Parameters

Parameter   Type              Description
input       Buffer | string   PDF Buffer or extracted text string
options?    ParseOptions      Optional parsing configuration

Returns

Promise<ParseResult> - Promise resolving to parsed profile data

Example

const result = await parseLinkedInPDF(pdfBuffer, { includeRawText: true });

🏗️ TypeScript Interfaces

LinkedInProfile

interface LinkedInProfile {
  name: string;
  headline: string;
  location: string;
  contact: Contact;
  top_skills: string[];
  languages: Language[];
  summary?: string;
  experience: Experience[];
  education: Education[];
}

Contact

interface Contact {
  email: string;
  phone?: string;
  linkedin_url?: string;
  location?: string;
}

Experience

interface Experience {
  title: string;
  company: string;
  duration: string;
  location?: string;
  description?: string;
}

Work Experience Structure:

  • Work Experience: A continuous period of employment at an organization, even if the person returns to the same company later after working elsewhere
  • Organization/Company: The employer entity (e.g., "TechCorp", "DataSystems Inc")
  • Position/Role: The job title/role within that work experience period (e.g., "Engineering Manager", "Senior Developer")

Examples:

Single organization, multiple positions:

TechCorp (1 work experience, 3 positions):
- Engineering Manager
- Senior Developer
- Software Developer

Same organization, separate work experiences:

DataSystems Inc (2 separate work experiences, 2 positions):
1st work experience: Lead Engineer (2018-2020)
2nd work experience: Technical Architect (2023-Present)
// Note: Person worked elsewhere between 2020-2023

Key principle: If someone returns to the same company after working elsewhere, it counts as a separate work experience. This reflects career progression and different employment periods.
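
The key principle can be sketched as a grouping pass over a flat list of positions. This assumes the parser returns one `Experience` entry per position in the order LinkedIn lists them, which the README does not state; `WorkExperience` and `groupWorkExperiences` are hypothetical names, and the TechCorp durations are made-up sample values.

```typescript
// Mirrors the Experience interface documented above.
interface Experience {
  title: string;
  company: string;
  duration: string;
  location?: string;
  description?: string;
}

// Hypothetical helper type, not part of the library.
interface WorkExperience {
  company: string;
  positions: Experience[];
}

// Groups consecutive entries that share a company into one work experience.
// A later return to the same company starts a new group.
function groupWorkExperiences(entries: Experience[]): WorkExperience[] {
  const groups: WorkExperience[] = [];
  for (const entry of entries) {
    const last = groups[groups.length - 1];
    if (last && last.company === entry.company) {
      last.positions.push(entry);
    } else {
      groups.push({ company: entry.company, positions: [entry] });
    }
  }
  return groups;
}

// Sample data combining the two examples above (most recent first).
const sampleEntries: Experience[] = [
  { title: 'Technical Architect', company: 'DataSystems Inc', duration: '2023-Present' },
  { title: 'Engineering Manager', company: 'TechCorp', duration: '2022-2023' },
  { title: 'Senior Developer', company: 'TechCorp', duration: '2021-2022' },
  { title: 'Software Developer', company: 'TechCorp', duration: '2020-2021' },
  { title: 'Lead Engineer', company: 'DataSystems Inc', duration: '2018-2020' },
];

const grouped = groupWorkExperiences(sampleEntries);
// Prints: [ 'DataSystems Inc: 1', 'TechCorp: 3', 'DataSystems Inc: 1' ]
console.log(grouped.map((g) => `${g.company}: ${g.positions.length}`));
```

Comparing only adjacent entries is what keeps the two DataSystems Inc periods apart; grouping by company name alone would merge them into one.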

Education

interface Education {
  degree: string;
  institution: string;
  year?: string;
  location?: string;
  description?: string;
}

Language

interface Language {
  language: string;
  proficiency: string;
}

ParseOptions

interface ParseOptions {
  includeRawText?: boolean;
}

ParseResult

interface ParseResult {
  profile: LinkedInProfile;
  rawText?: string;
}
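
Several fields in these interfaces are optional, so consuming code needs to guard them. The sketch below shows one way to do that. It redeclares the interfaces locally so it compiles without the package; `summarizeResult` is a hypothetical helper and the sample values are made up.

```typescript
// Local copies of the interfaces documented above, so the sketch compiles on its own.
interface Contact { email: string; phone?: string; linkedin_url?: string; location?: string }
interface Language { language: string; proficiency: string }
interface Experience { title: string; company: string; duration: string; location?: string; description?: string }
interface Education { degree: string; institution: string; year?: string; location?: string; description?: string }
interface LinkedInProfile {
  name: string;
  headline: string;
  location: string;
  contact: Contact;
  top_skills: string[];
  languages: Language[];
  summary?: string;
  experience: Experience[];
  education: Education[];
}
interface ParseResult { profile: LinkedInProfile; rawText?: string }

// Hypothetical helper: builds display lines, skipping optional fields that are absent.
function summarizeResult(result: ParseResult): string[] {
  const { profile } = result;
  const lines = [`${profile.name} - ${profile.headline}`, `Email: ${profile.contact.email}`];
  if (profile.contact.phone !== undefined) lines.push(`Phone: ${profile.contact.phone}`);
  if (profile.summary !== undefined) lines.push(`Summary: ${profile.summary}`);
  for (const lang of profile.languages) {
    lines.push(`Language: ${lang.language} (${lang.proficiency})`);
  }
  lines.push(`Raw text included: ${result.rawText !== undefined}`);
  return lines;
}

// Made-up sample values, shaped like a ParseResult without rawText.
const sampleResult: ParseResult = {
  profile: {
    name: 'John Silva',
    headline: 'Software Engineer',
    location: 'Lisbon, Portugal',
    contact: { email: 'john.silva@email.com' },
    top_skills: ['TypeScript'],
    languages: [{ language: 'English', proficiency: 'Native or Bilingual' }],
    experience: [],
    education: [],
  },
};

console.log(summarizeResult(sampleResult).join('\n'));
```

Because `rawText` is only present when `includeRawText` is set, the last line reports `false` for this sample.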

🛠️ Development

# Clone repository
git clone https://github.com/zalkowitsch/linkedin-parser.git
cd linkedin-parser

# Install dependencies
npm install

# Run tests
npm test

# Build library
npm run build

# Run tests with coverage
npm run test:coverage

# Clean build artifacts
npm run clean

📊 Performance

  • Processing time: ~70ms average for typical LinkedIn PDF
  • Memory usage: Minimal memory footprint (~8MB)
  • Bundle size: Ultra-lightweight at 3.0kB gzipped

🛡️ Quality & Trust

🧪 Test Coverage
95.6% code coverage with comprehensive test suite
🔒 Security
Zero known vulnerabilities, regularly audited
📈 CI/CD
Automated testing and deployment pipeline
🏷️ Semantic Versioning
Follows semver for predictable releases
📝 Documentation
Comprehensive docs with TypeScript support
🚀 Production Ready
Battle-tested in production environments

🌍 Compatibility

Supported Environments:

  • ✅ Node.js 18+ (ES2022 support)
  • ✅ TypeScript 5.0+
  • ✅ ESM (ES Modules)
  • ✅ CommonJS (via build)
  • ✅ Browsers (via bundlers)

Package Managers:

  • ✅ npm 8+
  • ✅ yarn 1.22+
  • ✅ pnpm 7+

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📄 License

MIT © Arkady Zalkowitsch


⭐ Star this project if you find it helpful!

Made with ❤️ by Arkady Zalkowitsch
