TalentSync AI System Architecture
Comprehensive Technical Documentation & System Design
System Overview
TalentSync AI is a multi-tenant, AI-powered talent matching platform that automates the matching of consultants with job opportunities. The system uses advanced NLP and machine learning to analyze CVs and job descriptions and to generate personalized application materials.
Core Features
- Automated job crawling from multiple providers
- AI-powered CV analysis and matching
- Multi-language support (Swedish & English)
- Automated motivation letter generation
- Real-time job bank with advanced filtering
- Multi-tenant architecture with organization isolation
- Comprehensive admin dashboard with analytics
High-Level Architecture
```mermaid
graph TB
subgraph "Frontend Layer"
UI[Web UI - React/Vanilla JS]
Admin[Admin Dashboard]
Mobile[Mobile Responsive]
end
subgraph "API Gateway"
REST[REST API - Express.js]
Auth[Authentication - JWT]
Rate[Rate Limiting]
end
subgraph "Application Layer"
Match[Matching Engine]
Crawl[Job Crawlers]
AI[AI Processor]
PDF[PDF Generator]
Email[Email Service]
end
subgraph "Data Layer"
PG[(PostgreSQL)]
Redis[(Redis Cache)]
FS[File System]
end
subgraph "External Services"
LLM[LLM Providers]
SMTP[Email Provider]
Job[Job Websites]
end
UI --> REST
Admin --> REST
Mobile --> REST
REST --> Auth
REST --> Rate
Auth --> Match
Auth --> Crawl
Auth --> AI
Auth --> PDF
Auth --> Email
Match --> PG
Match --> Redis
Match --> AI
Crawl --> PG
Crawl --> Job
AI --> LLM
AI --> PG
PDF --> FS
Email --> SMTP
```
Technology Stack
Frontend
- HTML5, CSS3, JavaScript (ES6+)
- Chart.js for analytics
- Font Awesome icons
- Responsive design
Backend
- Node.js (v18+)
- Express.js framework
- JWT authentication
- Bcrypt for passwords
Database
- PostgreSQL 14+
- Redis for caching
- File system storage
AI/ML
- OpenRouter API
- Ollama (local LLM)
- GPT-4 support
- Custom prompting
Infrastructure
- PM2 process manager
- Nginx reverse proxy
- Cloudflare tunnel
- Docker ready
Monitoring
- Custom analytics
- Real-time dashboards
- Error tracking
- Performance metrics
API Documentation
Interactive API Documentation: Visit /api-docs for Swagger UI with live API testing capabilities.
Authentication APIs
Endpoint | Method | Description | Auth Required |
---|---|---|---|
/api/auth/login | POST | User login with username/password | No |
/api/auth/logout | POST | Logout and invalidate token | Yes |
/api/auth/check | GET | Verify token validity | Yes |
/api/auth/register | POST | Register new user (admin only) | Admin |
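The login flow above can be exercised with a few lines of client code. This is a minimal sketch assuming Node 18+ (global `fetch`), that the login response carries a `token` field, and that protected routes accept a standard `Authorization: Bearer` header; confirm the exact shapes in the Swagger UI at /api-docs.

```javascript
// Minimal sketch: log in, then verify the token against /api/auth/check.
// Assumes Node 18+ (global fetch) and a { token } login response.
const BASE_URL = 'http://localhost:3000';

async function login(username, password) {
  const res = await fetch(`${BASE_URL}/api/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { token } = await res.json(); // assumed response field
  return token;
}

async function checkToken(token) {
  const res = await fetch(`${BASE_URL}/api/auth/check`, {
    headers: { Authorization: `Bearer ${token}` }
  });
  return res.ok;
}

login('demo_user', 'demo_password')
  .then(checkToken)
  .then((valid) => console.log('Token valid:', valid))
  .catch(console.error);
```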
Job Bank APIs
Endpoint | Method | Description | Parameters |
---|---|---|---|
/api/job-bank/jobs | GET | Get paginated job listings | page, limit, city, provider, search |
/api/job-bank/stats | GET | Get job bank statistics | None |
/api/job-bank/cities | GET | Get list of available cities | None |
/api/job-bank/providers | GET | Get list of job providers | None |
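The job bank endpoints take their filters as query-string parameters. A small sketch follows; the response shape (a `jobs` array plus pagination metadata) is an assumption to be checked against /api-docs.

```javascript
// Sketch: fetch one page of jobs filtered by city and free-text search.
// The response shape ({ jobs, total, page }) is an illustrative assumption.
async function fetchJobs({ page = 1, limit = 20, city, provider, search } = {}) {
  const params = new URLSearchParams({ page, limit });
  if (city) params.set('city', city);
  if (provider) params.set('provider', provider);
  if (search) params.set('search', search);

  const res = await fetch(`http://localhost:3000/api/job-bank/jobs?${params}`);
  if (!res.ok) throw new Error(`Job bank request failed: ${res.status}`);
  return res.json();
}

fetchJobs({ city: 'Stockholm', search: 'node.js' }).then(console.log).catch(console.error);
```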
Crawler APIs
Endpoint | Method | Description | Auth Required |
---|---|---|---|
/api/crawler/status | GET | Get crawler scheduler status | Admin |
/api/crawler/run/:provider | POST | Run specific crawler manually | Admin |
/api/crawler/config/:provider | PUT | Update crawler configuration | Admin |
/api/crawler/scheduler/start | POST | Start crawler scheduler | Admin |
Matching APIs
Endpoint | Method | Description | Request Body |
---|---|---|---|
/api/guided/submit | POST | Submit CV for AI matching | cv_text, job details, language preferences |
/api/cv-matching/analyze | POST | Analyze CV against job | cv_content, job_description |
/api/job/match | POST | Get job recommendations | cv_id, filters |
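A matching request is a regular authenticated POST. The sketch below targets /api/cv-matching/analyze; any response fields beyond the documented `cv_content` / `job_description` inputs (such as `match_score`) are assumptions, so verify them in /api-docs.

```javascript
// Sketch: analyze a CV against a job description over the REST API.
// Response field names (match_score, matched_skills) are assumptions.
async function analyzeCv(token, cvContent, jobDescription) {
  const res = await fetch('http://localhost:3000/api/cv-matching/analyze', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`
    },
    body: JSON.stringify({ cv_content: cvContent, job_description: jobDescription })
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json(); // e.g. { match_score, matched_skills, ... }
}
```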
Data Flows
Job Crawling Flow
```mermaid
sequenceDiagram
participant Scheduler
participant Crawler
participant Website
participant Parser
participant Database
participant Cache
Scheduler->>Crawler: Trigger crawl (cron/manual)
Crawler->>Website: Fetch job listings
Website-->>Crawler: HTML content
Crawler->>Parser: Parse HTML
Parser->>Parser: Extract job data
Parser->>Database: Check duplicates
Database-->>Parser: Existing jobs
Parser->>Database: Insert new jobs
Parser->>Cache: Update cache
Crawler->>Scheduler: Report completion
```
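A condensed sketch of one crawl cycle is shown below. It mirrors the sequence above, except that the duplicate check is folded into an `ON CONFLICT` clause on the unique `job_url` constraint; the `parseJobListings` helper is hypothetical and stands in for the provider-specific parser.

```javascript
// Sketch of a single crawl cycle: fetch listings, parse, insert new rows.
// parseJobListings() is a hypothetical provider-specific parser; duplicates
// are skipped via ON CONFLICT on the unique job_url constraint.
const { Pool } = require('pg');
const pool = new Pool(); // connection settings from PG* environment variables

async function crawlProvider(provider, listingUrl, parseJobListings) {
  const html = await (await fetch(listingUrl)).text();
  const jobs = parseJobListings(html); // [{ job_url, title, description, location }]

  let inserted = 0;
  for (const job of jobs) {
    const result = await pool.query(
      `INSERT INTO jobs (job_url, title, description, location, provider, crawled_at)
       VALUES ($1, $2, $3, $4, $5, NOW())
       ON CONFLICT (job_url) DO NOTHING`,
      [job.job_url, job.title, job.description, job.location, provider]
    );
    inserted += result.rowCount;
  }
  return { fetched: jobs.length, inserted };
}
```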
CV Matching Flow
1. CV Upload: User uploads a CV file (PDF/TXT/DOCX)
2. Text Extraction: Extract text content using file parsers
3. AI Analysis: Send the text to an LLM for skill extraction and analysis
4. Job Matching: Compare extracted skills with job requirements
5. Score Calculation: Calculate match percentage and ranking (see the sketch below)
6. Report Generation: Generate a PDF report with recommendations
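The scoring step can be pictured as a weighted overlap between skills extracted from the CV and skills required by the job. The function below is a simplified illustration of that idea, not the production scoring logic, which relies on LLM-driven analysis.

```javascript
// Simplified illustration of a match score: the share of weighted job
// requirements covered by the CV's extracted skills, as a percentage.
function matchScore(cvSkills, jobRequirements) {
  const cv = new Set(cvSkills.map((s) => s.toLowerCase()));
  let covered = 0;
  let total = 0;
  for (const { skill, weight } of jobRequirements) {
    total += weight;
    if (cv.has(skill.toLowerCase())) covered += weight;
  }
  return total === 0 ? 0 : Math.round((covered / total) * 100);
}

console.log(matchScore(
  ['Node.js', 'PostgreSQL'],
  [
    { skill: 'node.js', weight: 3 },
    { skill: 'postgresql', weight: 2 },
    { skill: 'redis', weight: 1 }
  ]
)); // -> 83
```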
Database Schema
Core Tables
```mermaid
erDiagram
USERS ||--o{ ACTIVITY_LOGS : generates
USERS ||--o{ CV_UPLOADS : uploads
USERS }o--|| ORGANIZATIONS : belongs_to
ORGANIZATIONS ||--o{ USERS : has
JOBS ||--o{ MATCHES : matched_to
CV_UPLOADS ||--o{ MATCHES : matches
USERS {
int user_id PK
string username UK
string email
string password_hash
string role
int organization_id FK
timestamp created_at
}
ORGANIZATIONS {
int organization_id PK
string name
string type
json settings
timestamp created_at
}
JOBS {
int job_id PK
string job_url UK
string title
text description
string location
string provider
json metadata
timestamp crawled_at
}
CV_UPLOADS {
int cv_id PK
int user_id FK
string filename
text content
json extracted_data
timestamp uploaded_at
}
MATCHES {
int match_id PK
int cv_id FK
int job_id FK
float match_score
json analysis
timestamp created_at
}
```
Key Indexes
```sql
-- Performance indexes
CREATE INDEX idx_jobs_crawled_at ON jobs(crawled_at DESC);
CREATE INDEX idx_jobs_provider ON jobs(provider);
CREATE INDEX idx_jobs_location ON jobs(location);
CREATE INDEX idx_matches_score ON matches(match_score DESC);
CREATE INDEX idx_users_org ON users(organization_id);
-- Unique constraints
ALTER TABLE users ADD CONSTRAINT uk_username UNIQUE (username);
ALTER TABLE jobs ADD CONSTRAINT uk_job_url UNIQUE (job_url);
```
Security Architecture
Authentication & Authorization
- JWT Tokens: Stateless authentication with RS256 signing
- Role-Based Access: super_admin, admin, consultant_manager, consultant
- Multi-tenant Isolation: Row-level security based on organization_id
- Password Security: Bcrypt with salt rounds = 10
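A minimal sketch of the verification and role check described above, assuming the `jsonwebtoken` package, an RS256 public key on disk, and JWT claims named `role` and `organization_id` (the claim names and key path are assumptions).

```javascript
// Sketch: JWT verification (RS256) plus role-based access for Express.
// Key path and claim names are illustrative assumptions.
const fs = require('fs');
const jwt = require('jsonwebtoken');

const publicKey = fs.readFileSync('keys/jwt_public.pem'); // assumed location

function authenticate(req, res, next) {
  const header = req.headers.authorization || '';
  const token = header.startsWith('Bearer ') ? header.slice(7) : null;
  if (!token) return res.status(401).json({ error: 'Missing token' });
  try {
    // Verifies signature and expiry; claims become available downstream.
    req.user = jwt.verify(token, publicKey, { algorithms: ['RS256'] });
    next();
  } catch (err) {
    res.status(401).json({ error: 'Invalid or expired token' });
  }
}

function requireRole(...roles) {
  return (req, res, next) =>
    roles.includes(req.user.role) ? next() : res.status(403).json({ error: 'Forbidden' });
}

// Usage: only admins may trigger a crawl
// app.post('/api/crawler/run/:provider', authenticate, requireRole('admin', 'super_admin'), handler);
```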
API Security
- Rate Limiting: Configurable per endpoint
- CORS Policy: Whitelist allowed origins
- Input Validation: Sanitize all user inputs
- SQL Injection Prevention: Parameterized queries
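Two of these protections are mechanical enough to sketch: per-endpoint rate limiting (here via the `express-rate-limit` package, one option among several) and parameterized queries through `pg`. The window size and request limit are placeholder values, not the production configuration.

```javascript
// Illustrative sketch: a stricter rate limit on login and a parameterized
// job-bank query. Limits and window sizes are placeholders.
const express = require('express');
const rateLimit = require('express-rate-limit');
const { Pool } = require('pg');

const app = express();
const pool = new Pool();

// Tighter limit on login to slow down credential stuffing.
app.use('/api/auth/login', rateLimit({ windowMs: 15 * 60 * 1000, max: 20 }));

app.get('/api/job-bank/jobs', async (req, res) => {
  const { city, limit = 20 } = req.query;
  // Values are bound as parameters ($1, $2), never interpolated into the SQL string.
  const { rows } = await pool.query(
    'SELECT job_id, title, location, provider FROM jobs WHERE location = $1 ORDER BY crawled_at DESC LIMIT $2',
    [city, limit]
  );
  res.json(rows);
});
```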
Data Protection
- Encryption at Rest: PostgreSQL TDE
- Encryption in Transit: HTTPS/TLS 1.3
- PII Handling: Masked in logs, encrypted in DB
- GDPR Compliance: Data retention policies
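For the "masked in logs" rule, a small helper along these lines can be applied to any record before it reaches the log stream; which fields count as PII here (email, CV content) is an assumption for illustration.

```javascript
// Sketch: mask PII before logging. The chosen fields are illustrative.
function maskEmail(email) {
  const [local, domain] = email.split('@');
  if (!domain) return '***';
  return `${local.slice(0, 2)}***@${domain}`;
}

function maskForLogs(record) {
  const out = { ...record };
  if (out.email) out.email = maskEmail(out.email);
  if (out.cv_content) out.cv_content = '[redacted]';
  return out;
}

console.log(maskForLogs({ user_id: 7, email: 'anna.svensson@example.com', cv_content: '...' }));
// -> { user_id: 7, email: 'an***@example.com', cv_content: '[redacted]' }
```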
Deployment Architecture
Production Environment
```mermaid
graph LR
subgraph "Internet"
Users[Users]
CF[Cloudflare]
end
subgraph "Server"
Nginx[Nginx]
PM2[PM2 Cluster]
Node1[Node Process 1]
Node2[Node Process 2]
Node3[Node Process 3]
Node4[Node Process 4]
Crawler[Crawler Scheduler]
end
subgraph "Data"
PG[(PostgreSQL)]
Redis[(Redis)]
FS[File Storage]
end
Users --> CF
CF --> Nginx
Nginx --> PM2
PM2 --> Node1
PM2 --> Node2
PM2 --> Node3
PM2 --> Node4
Node1 --> PG
Node2 --> PG
Node3 --> PG
Node4 --> PG
Node1 --> Redis
Node2 --> Redis
Crawler --> PG
Node1 --> FS
```
PM2 Configuration
```javascript
module.exports = {
  apps: [
    {
      name: 'rfp-server',
      script: 'server.js',
      instances: 4,
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production',
        PORT: 3000
      }
    },
    {
      name: 'job-crawler-scheduler',
      script: 'job_crawler_scheduler.js',
      instances: 1,
      exec_mode: 'fork'
    }
  ]
}
```
Monitoring & Analytics
System Metrics
- Application Metrics: Response times, error rates, throughput
- Infrastructure Metrics: CPU, memory, disk usage
- Business Metrics: Jobs crawled, matches created, user activity
- LLM Metrics: Token usage, costs, response times
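A minimal sketch of how the application-level metrics (response times, error rates per route) could be collected as Express middleware; the in-memory counters stand in for whatever store the custom analytics actually use.

```javascript
// Sketch: record per-route request counts, error counts and total latency.
// The in-memory `metrics` object is a stand-in for the real analytics store.
const metrics = {}; // e.g. { 'GET /api/job-bank/jobs': { count, errors, totalMs } }

function trackMetrics(req, res, next) {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    const key = `${req.method} ${req.route ? req.route.path : req.path}`;
    const m = (metrics[key] ||= { count: 0, errors: 0, totalMs: 0 });
    m.count += 1;
    m.totalMs += ms;
    if (res.statusCode >= 500) m.errors += 1;
  });
  next();
}

// Usage: app.use(trackMetrics); expose `metrics` through an admin-only endpoint.
```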
Monitoring Stack
Real-time Monitoring
- Admin Dashboard
- WebSocket updates
- Live crawler logs
Analytics
- Job statistics
- Match analytics
- User behavior
Alerting
- Error thresholds
- Performance alerts
- Crawler failures
Development Workflow
Getting Started
```bash
# Clone repository
git clone https://github.com/company/talentsync.git
# Install dependencies
npm install
# Setup database
psql -U postgres -f database/schema.sql
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Start development server
npm run dev
# Start with PM2
pm2 start ecosystem.config.js
```
Development Commands
```bash
# Start services
./start.sh # Start all services
pm2 start ecosystem.config.js # Production mode
# Database
npm run db:migrate # Run migrations
npm run db:seed # Seed test data
# Testing
npm test # Run tests
npm run test:e2e # E2E tests
# Crawlers
node crawl_all_providers.js # Crawl all providers
node job_crawler_scheduler.js # Start scheduler
```
Troubleshooting Guide
Common Issues
Issue | Symptoms | Solution |
---|---|---|
Database Connection | ECONNREFUSED errors | Check PostgreSQL service, verify credentials in config.json |
Crawler Not Running | No new jobs appearing | Check PM2 status, verify crawler_config.json, check logs |
Authentication Errors | 401/403 responses | Clear localStorage, check JWT expiry, verify user roles |
LLM Errors | AI analysis fails | Check API keys, verify provider status, check rate limits |
Debug Commands
```bash
# Check system status
pm2 status
pm2 logs
# Database queries
psql -U rfpuser -d rfp_ai_matcher -c "SELECT COUNT(*) FROM jobs WHERE DATE(crawled_at) = CURRENT_DATE;"
# Check crawler logs
tail -f crawler_schedule.log
# Test API endpoints
curl http://localhost:3000/api/health
curl http://localhost:3000/api/job-bank/stats
```