TalentSync AI System Architecture
Comprehensive Technical Documentation & System Design
System Overview
TalentSync AI is a multi-tenant, AI-powered talent matching platform that automates the matching of consultants with job opportunities. The system uses advanced NLP and machine learning to analyze CVs and job descriptions and to generate personalized application materials.
Core Features
- Automated job crawling from multiple providers
- AI-powered CV analysis and matching
- Multi-language support (Swedish & English)
- Automated motivation letter generation
- Real-time job bank with advanced filtering
- Multi-tenant architecture with organization isolation
- Comprehensive admin dashboard with analytics
High-Level Architecture
```mermaid
graph TB
subgraph "Frontend Layer"
UI[Web UI - React/Vanilla JS]
Admin[Admin Dashboard]
Mobile[Mobile Responsive]
end
subgraph "API Gateway"
REST[REST API - Express.js]
Auth[Authentication - JWT]
Rate[Rate Limiting]
end
subgraph "Application Layer"
Match[Matching Engine]
Crawl[Job Crawlers]
AI[AI Processor]
PDF[PDF Generator]
Email[Email Service]
end
subgraph "Data Layer"
PG[(PostgreSQL)]
Redis[(Redis Cache)]
FS[File System]
end
subgraph "External Services"
LLM[LLM Providers]
SMTP[Email Provider]
Job[Job Websites]
end
UI --> REST
Admin --> REST
Mobile --> REST
REST --> Auth
REST --> Rate
Auth --> Match
Auth --> Crawl
Auth --> AI
Auth --> PDF
Auth --> Email
Match --> PG
Match --> Redis
Match --> AI
Crawl --> PG
Crawl --> Job
AI --> LLM
AI --> PG
PDF --> FS
Email --> SMTP
```
Technology Stack
Frontend
- HTML5, CSS3, JavaScript (ES6+)
- Chart.js for analytics
- Font Awesome icons
- Responsive design
Backend
- Node.js (v18+)
- Express.js framework
- JWT authentication
- Bcrypt for passwords
Database
- PostgreSQL 14+
- Redis for caching
- File system storage
AI/ML
- OpenRouter API
- Ollama (local LLM)
- GPT-4 support
- Custom prompting
Infrastructure
- PM2 process manager
- Nginx reverse proxy
- Cloudflare tunnel
- Docker ready
Monitoring
- Custom analytics
- Real-time dashboards
- Error tracking
- Performance metrics
API Documentation
Interactive API Documentation: Visit /api-docs for Swagger UI with live API testing capabilities.
Authentication APIs
Endpoint | Method | Description | Auth Required |
---|---|---|---|
/api/auth/login | POST | User login with username/password | No |
/api/auth/logout | POST | Logout and invalidate token | Yes |
/api/auth/check | GET | Verify token validity | Yes |
/api/auth/register | POST | Register new user (admin only) | Admin |
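The login flow above can be exercised with a few lines of client code. This is a minimal sketch assuming Node 18+ (global `fetch`), that the login response carries a `token` field, and that protected routes accept a standard `Authorization: Bearer` header; confirm the exact shapes in the Swagger UI at /api-docs.

```javascript
// Minimal sketch: log in, then verify the token against /api/auth/check.
// Assumes Node 18+ (global fetch) and a { token } login response.
const BASE_URL = 'http://localhost:3000';

async function login(username, password) {
  const res = await fetch(`${BASE_URL}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { token } = await res.json(); // assumed response field
  return token;
}

async function checkToken(token) {
  const res = await fetch(`${BASE_URL}/api/auth/check`, {
    headers: { Authorization: `Bearer ${token}` }
  });
  return res.ok;
}

login('demo_user', 'demo_password')
  .then(checkToken)
  .then((valid) => console.log('Token valid:', valid))
  .catch(console.error);
```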
Job Bank APIs
Endpoint | Method | Description | Parameters |
---|---|---|---|
/api/job-bank/jobs | GET | Get paginated job listings | page, limit, city, provider, search |
/api/job-bank/stats | GET | Get job bank statistics | None |
/api/job-bank/cities | GET | Get list of available cities | None |
/api/job-bank/providers | GET | Get list of job providers | None |
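The job bank endpoints take their filters as query-string parameters. A small sketch follows; the response shape (a `jobs` array plus pagination metadata) is an assumption to be checked against /api-docs.

```javascript
// Sketch: fetch one page of jobs filtered by city and free-text search.
// The response shape ({ jobs, total, page }) is an illustrative assumption.
async function fetchJobs({ page = 1, limit = 20, city, provider, search } = {}) {
  const params = new URLSearchParams({ page, limit });
  if (city) params.set('city', city);
  if (provider) params.set('provider', provider);
  if (search) params.set('search', search);

  const res = await fetch(`http://localhost:3000/api/job-bank/jobs?${params}`);
  if (!res.ok) throw new Error(`Job bank request failed: ${res.status}`);
  return res.json();
}

fetchJobs({ city: 'Stockholm', search: 'node.js' }).then(console.log).catch(console.error);
```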
Crawler APIs
Endpoint | Method | Description | Auth Required |
---|---|---|---|
/api/crawler/status | GET | Get crawler scheduler status | Admin |
/api/crawler/run/:provider | POST | Run specific crawler manually | Admin |
/api/crawler/config/:provider | PUT | Update crawler configuration | Admin |
/api/crawler/scheduler/start | POST | Start crawler scheduler | Admin |
Matching APIs
Endpoint | Method | Description | Request Body |
---|---|---|---|
/api/guided/submit | POST | Submit CV for AI matching | cv_text, job details, language preferences |
/api/cv-matching/analyze | POST | Analyze CV against job | cv_content, job_description |
/api/job/match | POST | Get job recommendations | cv_id, filters |
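A matching request is a regular authenticated POST. The sketch below targets /api/cv-matching/analyze; any response fields beyond the documented `cv_content` / `job_description` inputs (such as `match_score`) are assumptions, so verify them in /api-docs.

```javascript
// Sketch: analyze a CV against a job description over the REST API.
// Response field names (match_score, matched_skills) are assumptions.
async function analyzeCv(token, cvContent, jobDescription) {
  const res = await fetch('http://localhost:3000/api/cv-matching/analyze', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`
    },
    body: JSON.stringify({ cv_content: cvContent, job_description: jobDescription })
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json(); // e.g. { match_score, matched_skills, ... }
}
```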
Data Flows
Job Crawling Flow
```mermaid
sequenceDiagram
participant Scheduler
participant Crawler
participant Website
participant Parser
participant Database
participant Cache
Scheduler->>Crawler: Trigger crawl (cron/manual)
Crawler->>Website: Fetch job listings
Website-->>Crawler: HTML content
Crawler->>Parser: Parse HTML
Parser->>Parser: Extract job data
Parser->>Database: Check duplicates
Database-->>Parser: Existing jobs
Parser->>Database: Insert new jobs
Parser->>Cache: Update cache
Crawler->>Scheduler: Report completion
```
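A condensed sketch of one crawl cycle is shown below. It mirrors the sequence above, except that the duplicate check is folded into an `ON CONFLICT` clause on the unique `job_url` constraint; the `parseJobListings` helper is hypothetical and stands in for the provider-specific parser.

```javascript
// Sketch of a single crawl cycle: fetch listings, parse, insert new rows.
// parseJobListings() is a hypothetical provider-specific parser; duplicates
// are skipped via ON CONFLICT on the unique job_url constraint.
const { Pool } = require('pg');
const pool = new Pool(); // connection settings from PG* environment variables

async function crawlProvider(provider, listingUrl, parseJobListings) {
  const html = await (await fetch(listingUrl)).text();
  const jobs = parseJobListings(html); // [{ job_url, title, description, location }]

  let inserted = 0;
  for (const job of jobs) {
    const result = await pool.query(
      `INSERT INTO jobs (job_url, title, description, location, provider, crawled_at)
       VALUES ($1, $2, $3, $4, $5, NOW())
       ON CONFLICT (job_url) DO NOTHING`,
      [job.job_url, job.title, job.description, job.location, provider]
    );
    inserted += result.rowCount;
  }
  return { fetched: jobs.length, inserted };
}
```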
CV Matching Flow
1. CV Upload: User uploads a CV file (PDF/TXT/DOCX)
2. Text Extraction: Extract text content using file parsers
3. AI Analysis: Send the text to an LLM for skill extraction and analysis
4. Job Matching: Compare extracted skills with job requirements
5. Score Calculation: Calculate match percentage and ranking (see the sketch below)
6. Report Generation: Generate a PDF report with recommendations
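The scoring step can be pictured as a weighted overlap between skills extracted from the CV and skills required by the job. The function below is a simplified illustration of that idea, not the production scoring logic, which relies on LLM-driven analysis.

```javascript
// Simplified illustration of a match score: the share of weighted job
// requirements covered by the CV's extracted skills, as a percentage.
function matchScore(cvSkills, jobRequirements) {
  const cv = new Set(cvSkills.map((s) => s.toLowerCase()));
  let covered = 0;
  let total = 0;
  for (const { skill, weight } of jobRequirements) {
    total += weight;
    if (cv.has(skill.toLowerCase())) covered += weight;
  }
  return total === 0 ? 0 : Math.round((covered / total) * 100);
}

console.log(matchScore(
  ['Node.js', 'PostgreSQL'],
  [
    { skill: 'node.js', weight: 3 },
    { skill: 'postgresql', weight: 2 },
    { skill: 'redis', weight: 1 }
  ]
)); // -> 83
```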
Database Schema
Core Tables
```mermaid
erDiagram
USERS ||--o{ ACTIVITY_LOGS : generates
USERS ||--o{ CV_UPLOADS : uploads
USERS }o--|| ORGANIZATIONS : belongs_to
ORGANIZATIONS ||--o{ USERS : has
JOBS ||--o{ MATCHES : matched_to
CV_UPLOADS ||--o{ MATCHES : matches
USERS {
int user_id PK
string username UK
string email
string password_hash
string role
int organization_id FK
timestamp created_at
}
ORGANIZATIONS {
int organization_id PK
string name
string type
json settings
timestamp created_at
}
JOBS {
int job_id PK
string job_url UK
string title
text description
string location
string provider
json metadata
timestamp crawled_at
}
CV_UPLOADS {
int cv_id PK
int user_id FK
string filename
text content
json extracted_data
timestamp uploaded_at
}
MATCHES {
int match_id PK
int cv_id FK
int job_id FK
float match_score
json analysis
timestamp created_at
}
```
Key Indexes
```sql
-- Performance indexes
CREATE INDEX idx_jobs_crawled_at ON jobs(crawled_at DESC);
CREATE INDEX idx_jobs_provider ON jobs(provider);
CREATE INDEX idx_jobs_location ON jobs(location);
CREATE INDEX idx_matches_score ON matches(match_score DESC);
CREATE INDEX idx_users_org ON users(organization_id);
-- Unique constraints
ALTER TABLE users ADD CONSTRAINT uk_username UNIQUE (username);
ALTER TABLE jobs ADD CONSTRAINT uk_job_url UNIQUE (job_url);
```
Security Architecture
Authentication & Authorization
- JWT Tokens: Stateless authentication with RS256 signing
- Role-Based Access: super_admin, admin, consultant_manager, consultant
- Multi-tenant Isolation: Row-level security based on organization_id
- Password Security: Bcrypt with salt rounds = 10
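A minimal sketch of the verification and role check described above, assuming the `jsonwebtoken` package, an RS256 public key on disk, and JWT claims named `role` and `organization_id` (the claim names and key path are assumptions).

```javascript
// Sketch: JWT verification (RS256) plus role-based access for Express.
// Key path and claim names are illustrative assumptions.
const fs = require('fs');
const jwt = require('jsonwebtoken');

const publicKey = fs.readFileSync('keys/jwt_public.pem'); // assumed location

function authenticate(req, res, next) {
  const header = req.headers.authorization || '';
  const token = header.startsWith('Bearer ') ? header.slice(7) : null;
  if (!token) return res.status(401).json({ error: 'Missing token' });
  try {
    // Verifies signature and expiry; claims become available downstream.
    req.user = jwt.verify(token, publicKey, { algorithms: ['RS256'] });
    next();
  } catch (err) {
    res.status(401).json({ error: 'Invalid or expired token' });
  }
}

function requireRole(...roles) {
  return (req, res, next) =>
    roles.includes(req.user.role) ? next() : res.status(403).json({ error: 'Forbidden' });
}

// Usage: only admins may trigger a crawl
// app.post('/api/crawler/run/:provider', authenticate, requireRole('admin', 'super_admin'), handler);
```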
API Security
- Rate Limiting: Configurable per endpoint
- CORS Policy: Whitelist allowed origins
- Input Validation: Sanitize all user inputs
- SQL Injection Prevention: Parameterized queries
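Two of these protections are mechanical enough to sketch: per-endpoint rate limiting (here via the `express-rate-limit` package, one option among several) and parameterized queries through `pg`. The window size and request limit are placeholder values, not the production configuration.

```javascript
// Illustrative sketch: a stricter rate limit on login and a parameterized
// job-bank query. Limits and window sizes are placeholders.
const express = require('express');
const rateLimit = require('express-rate-limit');
const { Pool } = require('pg');

const app = express();
const pool = new Pool();

// Tighter limit on login to slow down credential stuffing.
app.use('/api/auth/login', rateLimit({ windowMs: 15 * 60 * 1000, max: 20 }));

app.get('/api/job-bank/jobs', async (req, res) => {
  const { city, limit = 20 } = req.query;
  // Values are bound as parameters ($1, $2), never interpolated into the SQL string.
  const { rows } = await pool.query(
    'SELECT job_id, title, location, provider FROM jobs WHERE location = $1 ORDER BY crawled_at DESC LIMIT $2',
    [city, limit]
  );
  res.json(rows);
});
```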
Data Protection
- Encryption at Rest: PostgreSQL TDE
- Encryption in Transit: HTTPS/TLS 1.3
- PII Handling: Masked in logs, encrypted in DB
- GDPR Compliance: Data retention policies
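For the "masked in logs" rule, a small helper along these lines can be applied to any record before it reaches the log stream; which fields count as PII here (email, CV content) is an assumption for illustration.

```javascript
// Sketch: mask PII before logging. The chosen fields are illustrative.
function maskEmail(email) {
  const [local, domain] = email.split('@');
  if (!domain) return '***';
  return `${local.slice(0, 2)}***@${domain}`;
}

function maskForLogs(record) {
  const out = { ...record };
  if (out.email) out.email = maskEmail(out.email);
  if (out.cv_content) out.cv_content = '[redacted]';
  return out;
}

console.log(maskForLogs({ user_id: 7, email: 'anna.svensson@example.com', cv_content: '...' }));
// -> { user_id: 7, email: 'an***@example.com', cv_content: '[redacted]' }
```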
Deployment Architecture
Production Environment
```mermaid
graph LR
subgraph "Internet"
Users[Users]
CF[Cloudflare]
end
subgraph "Server"
Nginx[Nginx]
PM2[PM2 Cluster]
Node1[Node Process 1]
Node2[Node Process 2]
Node3[Node Process 3]
Node4[Node Process 4]
Crawler[Crawler Scheduler]
end
subgraph "Data"
PG[(PostgreSQL)]
Redis[(Redis)]
FS[File Storage]
end
Users --> CF
CF --> Nginx
Nginx --> PM2
PM2 --> Node1
PM2 --> Node2
PM2 --> Node3
PM2 --> Node4
Node1 --> PG
Node2 --> PG
Node3 --> PG
Node4 --> PG
Node1 --> Redis
Node2 --> Redis
Crawler --> PG
Node1 --> FS
```
PM2 Configuration
```javascript
module.exports = {
  apps: [
    {
      name: 'rfp-server',
      script: 'server.js',
      instances: 4,
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production',
        PORT: 3000
      }
    },
    {
      name: 'job-crawler-scheduler',
      script: 'job_crawler_scheduler.js',
      instances: 1,
      exec_mode: 'fork'
    }
  ]
}
```
Monitoring & Analytics
System Metrics
- Application Metrics: Response times, error rates, throughput
- Infrastructure Metrics: CPU, memory, disk usage
- Business Metrics: Jobs crawled, matches created, user activity
- LLM Metrics: Token usage, costs, response times
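A minimal sketch of how the application-level metrics (response times, error rates per route) could be collected as Express middleware; the in-memory counters stand in for whatever store the custom analytics actually use.

```javascript
// Sketch: record per-route request counts, error counts and total latency.
// The in-memory `metrics` object is a stand-in for the real analytics store.
const metrics = {}; // e.g. { 'GET /api/job-bank/jobs': { count, errors, totalMs } }

function trackMetrics(req, res, next) {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    const key = `${req.method} ${req.route ? req.route.path : req.path}`;
    const m = (metrics[key] ||= { count: 0, errors: 0, totalMs: 0 });
    m.count += 1;
    m.totalMs += ms;
    if (res.statusCode >= 500) m.errors += 1;
  });
  next();
}

// Usage: app.use(trackMetrics); expose `metrics` through an admin-only endpoint.
```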
Monitoring Stack
Real-time Monitoring
- Admin Dashboard
- WebSocket updates
- Live crawler logs
Analytics
- Job statistics
- Match analytics
- User behavior
Alerting
- Error thresholds
- Performance alerts
- Crawler failures
Development Workflow
Getting Started
```bash
# Clone repository
git clone https://github.com/company/talentsync.git
# Install dependencies
npm install
# Setup database
psql -U postgres -f database/schema.sql
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Start development server
npm run dev
# Start with PM2
pm2 start ecosystem.config.js
```
Development Commands
```bash
# Start services
./start.sh # Start all services
pm2 start ecosystem.config.js # Production mode
# Database
npm run db:migrate # Run migrations
npm run db:seed # Seed test data
# Testing
npm test # Run tests
npm run test:e2e # E2E tests
# Crawlers
node crawl_all_providers.js # Crawl all providers
node job_crawler_scheduler.js # Start scheduler
```
Troubleshooting Guide
Common Issues
Issue | Symptoms | Solution |
---|---|---|
Database Connection | ECONNREFUSED errors | Check PostgreSQL service, verify credentials in config.json |
Crawler Not Running | No new jobs appearing | Check PM2 status, verify crawler_config.json, check logs |
Authentication Errors | 401/403 responses | Clear localStorage, check JWT expiry, verify user roles |
LLM Errors | AI analysis fails | Check API keys, verify provider status, check rate limits |
Debug Commands
```bash
# Check system status
pm2 status
pm2 logs
# Database queries
psql -U rfpuser -d rfp_ai_matcher -c "SELECT COUNT(*) FROM jobs WHERE DATE(crawled_at) = CURRENT_DATE;"
# Check crawler logs
tail -f crawler_schedule.log
# Test API endpoints
curl http://localhost:3000/api/health
curl http://localhost:3000/api/job-bank/stats
```