TalentSync AI System Architecture

Comprehensive Technical Documentation & System Design

System Overview

TalentSync AI is a multi-tenant, AI-powered talent matching platform that automates matching consultants with job opportunities. The system uses NLP and machine-learning techniques to analyze CVs and job descriptions and to generate personalized application materials.

Core Features

  • Automated job crawling from multiple providers
  • AI-powered CV analysis and matching
  • Multi-language support (Swedish & English)
  • Automated motivation letter generation
  • Real-time job bank with advanced filtering
  • Multi-tenant architecture with organization isolation
  • Comprehensive admin dashboard with analytics

High-Level Architecture

```mermaid
graph TB
    subgraph "Frontend Layer"
        UI[Web UI - React/Vanilla JS]
        Admin[Admin Dashboard]
        Mobile[Mobile Responsive]
    end
    subgraph "API Gateway"
        REST[REST API - Express.js]
        Auth[Authentication - JWT]
        Rate[Rate Limiting]
    end
    subgraph "Application Layer"
        Match[Matching Engine]
        Crawl[Job Crawlers]
        AI[AI Processor]
        PDF[PDF Generator]
        Email[Email Service]
    end
    subgraph "Data Layer"
        PG[(PostgreSQL)]
        Redis[(Redis Cache)]
        FS[File System]
    end
    subgraph "External Services"
        LLM[LLM Providers]
        SMTP[Email Provider]
        Job[Job Websites]
    end
    UI --> REST
    Admin --> REST
    Mobile --> REST
    REST --> Auth
    REST --> Rate
    Auth --> Match
    Auth --> Crawl
    Auth --> AI
    Auth --> PDF
    Auth --> Email
    Match --> PG
    Match --> Redis
    Match --> AI
    Crawl --> PG
    Crawl --> Job
    AI --> LLM
    AI --> PG
    PDF --> FS
    Email --> SMTP
```

Technology Stack

Frontend

  • HTML5, CSS3, JavaScript (ES6+)
  • Chart.js for analytics
  • Font Awesome icons
  • Responsive design

Backend

  • Node.js (v18+)
  • Express.js framework
  • JWT authentication
  • Bcrypt for passwords

Database

  • PostgreSQL 14+
  • Redis for caching
  • File system storage

AI/ML

  • OpenRouter API
  • Ollama (local LLM)
  • GPT-4 support
  • Custom prompting

Infrastructure

  • PM2 process manager
  • Nginx reverse proxy
  • Cloudflare tunnel
  • Docker ready

Monitoring

  • Custom analytics
  • Real-time dashboards
  • Error tracking
  • Performance metrics

API Documentation

Interactive API Documentation: Visit /api-docs for Swagger UI with live API testing capabilities.

Authentication APIs

| Endpoint | Method | Description | Auth Required |
|----------|--------|-------------|---------------|
| /api/auth/login | POST | User login with username/password | No |
| /api/auth/logout | POST | Logout and invalidate token | Yes |
| /api/auth/check | GET | Verify token validity | Yes |
| /api/auth/register | POST | Register new user (admin only) | Admin |
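
For illustration, a login call followed by an authenticated request, using Node 18's built-in fetch; the `{ token }` response shape is an assumption, not documented above:

```javascript
// Log in and keep the JWT for subsequent requests
const res = await fetch('http://localhost:3000/api/auth/login', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ username: 'demo', password: 'secret' })
});
const { token } = await res.json(); // response shape assumed for illustration

// Authenticated endpoints expect a Bearer header
await fetch('http://localhost:3000/api/auth/check', {
  headers: { Authorization: `Bearer ${token}` }
});
```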

Job Bank APIs

| Endpoint | Method | Description | Parameters |
|----------|--------|-------------|------------|
| /api/job-bank/jobs | GET | Get paginated job listings | page, limit, city, provider, search |
| /api/job-bank/stats | GET | Get job bank statistics | None |
| /api/job-bank/cities | GET | Get list of available cities | None |
| /api/job-bank/providers | GET | Get list of job providers | None |
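
A hedged example of querying the job bank with the parameters from the table (the provider value is a placeholder):

```javascript
// Fetch page 2 of Stockholm jobs, filtered by provider and free-text search
const params = new URLSearchParams({
  page: '2',
  limit: '20',
  city: 'Stockholm',
  provider: 'example-provider', // placeholder provider name
  search: 'java developer'
});
const jobs = await fetch(`http://localhost:3000/api/job-bank/jobs?${params}`)
  .then((r) => r.json());
```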

Crawler APIs

| Endpoint | Method | Description | Auth Required |
|----------|--------|-------------|---------------|
| /api/crawler/status | GET | Get crawler scheduler status | Admin |
| /api/crawler/run/:provider | POST | Run a specific crawler manually | Admin |
| /api/crawler/config/:provider | PUT | Update crawler configuration | Admin |
| /api/crawler/scheduler/start | POST | Start the crawler scheduler | Admin |
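
As a sketch, triggering one provider's crawler manually; the provider name is a placeholder and the token must belong to an admin user:

```javascript
// Kick off a single crawler run outside the schedule
const adminToken = process.env.ADMIN_TOKEN; // obtained via /api/auth/login

const res = await fetch('http://localhost:3000/api/crawler/run/example-provider', {
  method: 'POST',
  headers: { Authorization: `Bearer ${adminToken}` }
});
console.log(res.status); // 200 on success is an assumption
```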

Matching APIs

| Endpoint | Method | Description | Request Body |
|----------|--------|-------------|--------------|
| /api/guided/submit | POST | Submit CV for AI matching | cv_text, job details, language preferences |
| /api/cv-matching/analyze | POST | Analyze CV against a job | cv_content, job_description |
| /api/job/match | POST | Get job recommendations | cv_id, filters |
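
An illustrative call to /api/cv-matching/analyze using the body fields from the table; the response shape and auth requirement are assumptions:

```javascript
const analysis = await fetch('http://localhost:3000/api/cv-matching/analyze', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${token}` // token from /api/auth/login
  },
  body: JSON.stringify({
    cv_content: 'Senior Java developer with 8 years of experience ...',
    job_description: 'We are looking for a backend developer ...'
  })
}).then((r) => r.json());
```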

Data Flows

Job Crawling Flow

```mermaid
sequenceDiagram
    participant Scheduler
    participant Crawler
    participant Website
    participant Parser
    participant Database
    participant Cache
    Scheduler->>Crawler: Trigger crawl (cron/manual)
    Crawler->>Website: Fetch job listings
    Website-->>Crawler: HTML content
    Crawler->>Parser: Parse HTML
    Parser->>Parser: Extract job data
    Parser->>Database: Check duplicates
    Database-->>Parser: Existing jobs
    Parser->>Database: Insert new jobs
    Parser->>Cache: Update cache
    Crawler->>Scheduler: Report completion
```
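
The duplicate check and insert in this flow can be sketched against the jobs table from the Database Schema section; the helper is illustrative, not the project's actual crawler code:

```javascript
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from PG* env vars

// Insert parsed jobs, skipping listings already crawled.
// jobs.job_url is UNIQUE, so ON CONFLICT makes the crawl idempotent.
async function upsertJobs(jobs) {
  let inserted = 0;
  for (const job of jobs) {
    const result = await pool.query(
      `INSERT INTO jobs (job_url, title, description, location, provider, metadata, crawled_at)
       VALUES ($1, $2, $3, $4, $5, $6, NOW())
       ON CONFLICT (job_url) DO NOTHING`,
      [job.url, job.title, job.description, job.location, job.provider, job.metadata]
    );
    inserted += result.rowCount; // 0 when the job already existed
  }
  return inserted;
}
```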

CV Matching Flow

  1. CV Upload: the user uploads a CV file (PDF/TXT/DOCX)
  2. Text Extraction: text content is extracted with file parsers
  3. AI Analysis: the text is sent to the LLM for skill extraction and analysis
  4. Job Matching: extracted skills are compared against job requirements
  5. Score Calculation: a match percentage and ranking are computed (sketched in code after this list)
  6. Report Generation: a PDF report with recommendations is generated
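
A minimal sketch of steps 3 through 5, assuming a hypothetical `llm.extractSkills` helper that returns a flat skill array; the real engine's prompts and scoring are more involved:

```javascript
// Illustrative shape of the analysis/matching/scoring steps.
// `llm.extractSkills` is a hypothetical helper, not the project's API.
async function matchCvToJob(cvText, job, llm) {
  // 3. AI Analysis: ask the LLM for a structured skill list
  const cvSkills = await llm.extractSkills(cvText);           // e.g. ['java', 'sql']
  const jobSkills = await llm.extractSkills(job.description);

  // 4. Job Matching: intersect candidate skills with job requirements
  const matched = cvSkills.filter((s) => jobSkills.includes(s));

  // 5. Score Calculation: fraction of required skills covered
  const score = jobSkills.length ? matched.length / jobSkills.length : 0;
  return { jobId: job.job_id, score, matched };
}
```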

Database Schema

Core Tables

```mermaid
erDiagram
    USERS ||--o{ ACTIVITY_LOGS : generates
    USERS ||--o{ CV_UPLOADS : uploads
    USERS }o--|| ORGANIZATIONS : belongs_to
    ORGANIZATIONS ||--o{ USERS : has
    JOBS ||--o{ MATCHES : matched_to
    CV_UPLOADS ||--o{ MATCHES : matches
    USERS {
        int user_id PK
        string username UK
        string email
        string password_hash
        string role
        int organization_id FK
        timestamp created_at
    }
    ORGANIZATIONS {
        int organization_id PK
        string name
        string type
        json settings
        timestamp created_at
    }
    JOBS {
        int job_id PK
        string job_url UK
        string title
        text description
        string location
        string provider
        json metadata
        timestamp crawled_at
    }
    CV_UPLOADS {
        int cv_id PK
        int user_id FK
        string filename
        text content
        json extracted_data
        timestamp uploaded_at
    }
    MATCHES {
        int match_id PK
        int cv_id FK
        int job_id FK
        float match_score
        json analysis
        timestamp created_at
    }
```

Key Indexes

```sql
-- Performance indexes
CREATE INDEX idx_jobs_crawled_at ON jobs(crawled_at DESC);
CREATE INDEX idx_jobs_provider ON jobs(provider);
CREATE INDEX idx_jobs_location ON jobs(location);
CREATE INDEX idx_matches_score ON matches(match_score DESC);
CREATE INDEX idx_users_org ON users(organization_id);

-- Unique constraints
ALTER TABLE users ADD CONSTRAINT uk_username UNIQUE (username);
ALTER TABLE jobs ADD CONSTRAINT uk_job_url UNIQUE (job_url);
```

Security Architecture

Authentication & Authorization

  • JWT Tokens: Stateless authentication with RS256 signing
  • Role-Based Access: super_admin, admin, consultant_manager, consultant
  • Multi-tenant Isolation: Row-level security based on organization_id (see the middleware sketch after this list)
  • Password Security: Bcrypt with salt rounds = 10
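
A minimal sketch of the verification and scoping logic, assuming the jsonwebtoken package and an RS256 public key; the function names are illustrative, not the project's actual middleware:

```javascript
const jwt = require('jsonwebtoken');

// Verify the Bearer token and attach its claims to the request
function authenticate(publicKey) {
  return (req, res, next) => {
    const token = (req.headers.authorization || '').replace(/^Bearer /, '');
    try {
      req.user = jwt.verify(token, publicKey, { algorithms: ['RS256'] });
      next();
    } catch {
      res.status(401).json({ error: 'Invalid or expired token' });
    }
  };
}

// Allow only the listed roles through
function requireRole(...roles) {
  return (req, res, next) =>
    roles.includes(req.user.role)
      ? next()
      : res.status(403).json({ error: 'Insufficient role' });
}

// Tenant isolation: every scoped query filters on the caller's organization_id, e.g.
//   SELECT * FROM users WHERE organization_id = $1  -- [req.user.organization_id]
```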

API Security

  • Rate Limiting: Configurable per endpoint
  • CORS Policy: Whitelist allowed origins
  • Input Validation: Sanitize all user inputs
  • SQL Injection Prevention: Parameterized queries (see the sketch after this list)
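
A minimal sketch of two of these protections, assuming express-rate-limit and pg; the limits and route shape are illustrative, not the project's actual values:

```javascript
const express = require('express');
const rateLimit = require('express-rate-limit');
const { Pool } = require('pg');

const app = express();
const pool = new Pool();

// Throttle brute-force attempts against the login endpoint
app.use('/api/auth/login', rateLimit({
  windowMs: 15 * 60 * 1000, // 15-minute window
  max: 10                   // at most 10 attempts per IP per window
}));

// User input only ever appears as bound parameters ($1, $2),
// never interpolated into the SQL string
app.get('/api/job-bank/jobs', async (req, res) => {
  const limit = Math.min(Number(req.query.limit) || 20, 100);
  const { rows } = await pool.query(
    'SELECT * FROM jobs WHERE ($1::text IS NULL OR location = $1) ORDER BY crawled_at DESC LIMIT $2',
    [req.query.city || null, limit]
  );
  res.json(rows);
});
```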

Data Protection

  • Encryption at Rest: Disk-level encryption of the database volume (community PostgreSQL has no built-in TDE; pgcrypto can encrypt sensitive columns)
  • Encryption in Transit: HTTPS/TLS 1.3
  • PII Handling: Masked in logs (see the sketch after this list), encrypted in DB
  • GDPR Compliance: Data retention policies
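
As a sketch of the log-masking idea, a small helper that redacts common PII patterns before messages reach the logs; the regexes and placeholder labels are assumptions:

```javascript
// Redact email addresses and phone numbers from log messages
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE_RE = /\+?\d[\d\s-]{7,}\d/g;

function maskPII(message) {
  return String(message)
    .replace(EMAIL_RE, '[email]')
    .replace(PHONE_RE, '[phone]');
}

console.log(maskPII('Contact jane.doe@example.com or +46 70 123 45 67'));
// -> Contact [email] or [phone]
```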

Deployment Architecture

Production Environment

```mermaid
graph LR
    subgraph "Internet"
        Users[Users]
        CF[Cloudflare]
    end
    subgraph "Server"
        Nginx[Nginx]
        PM2[PM2 Cluster]
        Node1[Node Process 1]
        Node2[Node Process 2]
        Node3[Node Process 3]
        Node4[Node Process 4]
        Crawler[Crawler Scheduler]
    end
    subgraph "Data"
        PG[(PostgreSQL)]
        Redis[(Redis)]
        FS[File Storage]
    end
    Users --> CF
    CF --> Nginx
    Nginx --> PM2
    PM2 --> Node1
    PM2 --> Node2
    PM2 --> Node3
    PM2 --> Node4
    Node1 --> PG
    Node2 --> PG
    Node3 --> PG
    Node4 --> PG
    Node1 --> Redis
    Node2 --> Redis
    Crawler --> PG
    Node1 --> FS
```

PM2 Configuration

```javascript
module.exports = {
  apps: [
    {
      name: 'rfp-server',
      script: 'server.js',
      instances: 4,
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production',
        PORT: 3000
      }
    },
    {
      name: 'job-crawler-scheduler',
      script: 'job_crawler_scheduler.js',
      instances: 1,
      exec_mode: 'fork'
    }
  ]
}
```

Monitoring & Analytics

System Metrics

  • Application Metrics: Response times, error rates, throughput
  • Infrastructure Metrics: CPU, memory, disk usage
  • Business Metrics: Jobs crawled, matches created, user activity
  • LLM Metrics: Token usage, costs, response times

Monitoring Stack

Real-time Monitoring

  • Admin Dashboard
  • WebSocket updates
  • Live crawler logs

Analytics

  • Job statistics
  • Match analytics
  • User behavior

Alerting

  • Error thresholds
  • Performance alerts
  • Crawler failures

Development Workflow

Getting Started

```bash
# Clone repository
git clone https://github.com/company/talentsync.git

# Install dependencies
npm install

# Setup database
psql -U postgres -f database/schema.sql

# Configure environment
cp .env.example .env
# Edit .env with your settings

# Start development server
npm run dev

# Start with PM2
pm2 start ecosystem.config.js
```

Development Commands

```bash
# Start services
./start.sh                        # Start all services
pm2 start ecosystem.config.js     # Production mode

# Database
npm run db:migrate                # Run migrations
npm run db:seed                   # Seed test data

# Testing
npm test                          # Run tests
npm run test:e2e                  # E2E tests

# Crawlers
node crawl_all_providers.js       # Crawl all providers
node job_crawler_scheduler.js     # Start scheduler
```

Troubleshooting Guide

Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| Database connection | ECONNREFUSED errors | Check the PostgreSQL service; verify credentials in config.json |
| Crawler not running | No new jobs appearing | Check PM2 status; verify crawler_config.json; check logs |
| Authentication errors | 401/403 responses | Clear localStorage; check JWT expiry; verify user roles |
| LLM errors | AI analysis fails | Check API keys; verify provider status; check rate limits |

Debug Commands

```bash
# Check system status
pm2 status
pm2 logs

# Database queries
psql -U rfpuser -d rfp_ai_matcher
SELECT COUNT(*) FROM jobs WHERE DATE(crawled_at) = CURRENT_DATE;

# Check crawler logs
tail -f crawler_schedule.log

# Test API endpoints
curl http://localhost:3000/api/health
curl http://localhost:3000/api/job-bank/stats
```