
Job Scanner: LLM-Powered Job Screening

An automated pipeline that uses GPT-4 for role evaluation with deterministic gates for dealbreaker detection.

Why This Exists

Job searching at scale is a screening problem. Reviewing 50+ job descriptions daily, evaluating fit against my profile, and generating personalized application materials is tedious and error-prone. I built this pipeline to automate the 80% of decisions that follow clear patterns.

The core insight: LLMs excel at nuanced role fit scoring, but they hallucinate on binary constraints. "Is this role remote?" should never be a probabilistic answer. The solution is a hybrid architecture — LLMs for judgment calls, deterministic logic for dealbreakers.

Design Goals

Control.

I can override any decision. The system advises; I apply. There is no auto-apply. The pipeline generates materials; I submit them.

Auditability.

Every decision has a traceable path — prompt version, profile version, input hash, decision reasons. Weeks later, I can reconstruct exactly why a role was skipped.

Reliability.

Deterministic gates catch what LLMs miss. Contract terms and location requirements are pattern-matched, not inferred. Binary constraints get binary logic.

High-Level Architecture

The pipeline runs in two stages with clear separation of concerns:

STAGE 1: SCRAPE & EVALUATE
┌─────────────┐     ┌─────────────┐     ┌─────────────────────────┐
│  Scraper    │────▶│  Evaluator  │────▶│   Post-Evaluation       │
│ (Playwright)│     │  (GPT-4o)   │     │   Gates (Regex)         │
│             │     │             │     │                         │
│ • LinkedIn  │     │ • Score 1-10│     │ • Contract terms        │
│ • Dedupe    │     │ • Classify  │     │ • Onsite requirements   │
│ • Paginate  │     │ • Risk      │     │ • Role mismatch         │
└─────────────┘     └─────────────┘     │ • Staffing firm detect  │
                                        │ • APPLY cap (20%)       │
                                        └─────────────────────────┘

STAGE 2: GENERATE WRITING
┌─────────────────────┐     ┌─────────────────────────────────────┐
│   Stage 2 Writer    │────▶│   Per-Role Storage                  │
│     (GPT-4o)        │     │   output/roles/{role_id}/           │
│                     │     │                                     │
│ • Cover letter      │     │ • job_posting.json                  │
│   (score >= 8)      │     │ • evaluation.json                   │
│ • Recruiter msg     │     │ • application_plan.json             │
│   (score >= 9 +HIGH)│     │ • pipeline_state.json               │
└─────────────────────┘     └─────────────────────────────────────┘

Stage 1: Scrape & Evaluate

Playwright navigates LinkedIn searches, extracts job descriptions, and deduplicates by canonical ID (the linkedin:job:id string, whose SHA-256 hash becomes the role_id). Each job passes through GPT-4o with Structured Outputs for guaranteed schema compliance, then through deterministic post-evaluation gates. Output: APPLY, CONSIDER, or SKIP.
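
A minimal sketch of how that dedup key can be derived (the helper names here are illustrative, not the project's actual code):

import hashlib

def canonical_id(source: str, job_id: str) -> str:
    # e.g. "linkedin:job:<numeric id>": one stable key per posting
    return f"{source}:job:{job_id}"

def role_id(canonical: str) -> str:
    # First 12 hex characters of the SHA-256 digest, matching the storage layout below
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def is_duplicate(canonical: str, seen: set[str]) -> bool:
    # A re-scraped posting hashes to the same role_id and is skipped
    rid = role_id(canonical)
    if rid in seen:
        return True
    seen.add(rid)
    return False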

Stage 2: Generate Writing

Only APPLY roles (score ≥ 9) trigger Stage 2. Within it, cover letters require score ≥ 8 and recruiter messages require score ≥ 9 with HIGH confidence. Stage 2 uses a separate writing profile optimized for narrative rather than screening criteria, and its output is submission-ready.
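
Those thresholds reduce to a few lines of straight-line logic. A sketch, with hypothetical function and field names rather than the real pipeline's API:

def stage2_artifacts(decision: str, score: int, confidence: str) -> list[str]:
    # Only APPLY roles reach Stage 2 at all
    if decision != "APPLY":
        return []
    artifacts = []
    # Cover letters at score >= 8; recruiter messages only at score >= 9 with HIGH confidence
    if score >= 8:
        artifacts.append("cover_letter")
    if score >= 9 and confidence == "HIGH":
        artifacts.append("recruiter_message")
    return artifacts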

Where LLMs Apply (and Where They Don't)

The architecture splits decisions by reliability requirements:

Decision Type                 Handler   Why
Role fit score (1-10)         LLM       Requires reading comprehension, context about my background
Role classification           LLM       Titles are misleading; "Software Engineer" can be pure platform work
Risk assessment               LLM       Startup funding signals, team maturity — requires inference
Contract terms (1099, C2C)    Regex     Binary constraint. Pattern: contract-to-hire|1099|corp-to-corp
Location requirements         Regex     Binary constraint. Pattern: relocation required|onsite required
Staffing firm detection       Regex     Company name matching against 15+ known staffing indicators
Cover letter generation       LLM       Requires narrative construction, role-specific positioning

Post-Evaluation Gates

After GPT-4o returns its evaluation, five deterministic gates can override the decision:

# Gate 1: Contract terms force SKIP even from APPLY
CONTRACT_SKIP_PATTERNS = [
    r"\bcontract-to-hire\b", r"\b1099\b", r"\bcorp-to-corp\b",
    r"\bw2 contract\b", r"\bstaff augmentation\b"
]

# Gate 2: Onsite requirements force SKIP
ONSITE_SKIP_PATTERNS = [
    r"\brelocation required\b", r"\bonsite required\b"
]

# Gate 3: Staffing firm detection → downgrades APPLY to CONSIDER
STAFFING_INDICATORS = ["staffing", "recruiting", "tek systems", ...]

# Gate 4: Role mismatch → SKIP if negative patterns without positives
ROLE_NEGATIVE_PATTERNS = [r"\bfrontend\b", r"\bfullstack\b", ...]

# Gate 5: APPLY cap → max 20% of roles per scan can be APPLY
# Sorted by confidence (HIGH first), then score (descending)

If GPT-4o says APPLY but the job description contains "contract-to-hire", the gate overrides to SKIP. No exceptions. This catches the cases where the LLM missed or misinterpreted terms.
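A condensed sketch of how those overrides and the cap could be wired together. The function names and the reason-string format are assumptions modeled on the metadata shown below, not the actual implementation, and gate 4 (role mismatch) is omitted for brevity:

import re

def apply_gates(decision: str, description: str, company: str, reasons: list[str]) -> str:
    text = description.lower()

    # Gates 1-2: contract terms and onsite requirements force SKIP,
    # even when the LLM said APPLY
    for label, patterns in (("contract", CONTRACT_SKIP_PATTERNS),
                            ("onsite", ONSITE_SKIP_PATTERNS)):
        for pattern in patterns:
            if re.search(pattern, text):
                reasons.append(f"{label}:{pattern}")
                return "SKIP"

    # Gate 3: staffing firms downgrade APPLY to CONSIDER
    if decision == "APPLY":
        for indicator in STAFFING_INDICATORS:
            if indicator in company.lower():
                reasons.append(f"staffing:company:{indicator}")
                return "CONSIDER"

    return decision

def enforce_apply_cap(evaluations: list[dict], cap_ratio: float = 0.20) -> None:
    # Gate 5: at most 20% of a scan stays APPLY, ranked HIGH confidence first,
    # then score descending; everything past the cap is downgraded to CONSIDER
    rank = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
    applies = [e for e in evaluations if e["decision"] == "APPLY"]
    applies.sort(key=lambda e: (rank[e["confidence"]], -e["score"]))
    cap = int(len(evaluations) * cap_ratio)
    for extra in applies[cap:]:
        extra["decision"] = "CONSIDER"
        extra["reasons"].append("apply_cap")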

Auditability

Every evaluation is traceable. The metadata captures what was sent, what was returned, and how the decision was made:

{
  "model": "gpt-4o",
  "temperature": 0.3,
  "prompt_version": "2.1",
  "prompt_hash": "a1b2c3d4e5f6",
  "profile_version": "2024-01-29",
  "profile_hash": "f6e5d4c3b2a1",
  "job_description_hash": "1a2b3c4d5e6f",
  "latency_ms": 1847,
  "token_usage": {
    "prompt_tokens": 2156,
    "completion_tokens": 412,
    "total_tokens": 2568
  },
  "decision_path": [
    "api_success", "json_parsed", "schema_valid",
    "score_9", "decision_apply", "gate_downgrade"
  ],
  "pre_gates_final_decision": "APPLY",
  "post_gates_final_decision": "CONSIDER",
  "post_gates_reasons": ["staffing:company:tek systems"]
}

This enables:

  • Prompt regression testing: Run test cases against new prompt versions and compare decisions (sketched after this list)
  • Decision forensics: Understand why a role was skipped months later
  • Drift detection: If APPLY rates spike or crash, trace to prompt or profile changes
  • Cost analysis: Track token usage per evaluation for budget planning
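
A minimal sketch of that regression check, assuming evaluation.json and job_posting.json carry the fields shown above; the description field and the evaluate_fn wrapper are placeholders for whatever actually calls GPT-4o:

import json
from pathlib import Path

def regression_check(roles_dir: str, evaluate_fn, new_prompt_version: str) -> list[tuple]:
    # Re-run stored job descriptions under a new prompt version and diff the decisions
    changed = []
    for eval_path in Path(roles_dir).glob("*/evaluation.json"):
        old = json.loads(eval_path.read_text())
        posting = json.loads((eval_path.parent / "job_posting.json").read_text())
        new = evaluate_fn(posting["description"], prompt_version=new_prompt_version)
        if new["final_decision"] != old["post_gates_final_decision"]:
            changed.append((eval_path.parent.name,
                            old["post_gates_final_decision"],
                            new["final_decision"]))
    return changed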

Schema Enforcement

The pipeline uses OpenAI's Structured Outputs with strict JSON schema enforcement:

EVALUATION_SCHEMA = {
  "type": "json_schema",
  "json_schema": {
    "name": "job_evaluation",
    "strict": True,
    "schema": {
      "type": "object",
      "properties": {
        "role_fit_score": { "type": "integer" },
        "final_decision": { "enum": ["APPLY", "CONSIDER", "SKIP"] },
        "seniority_level": { "enum": ["Junior", "Mid", "Senior", ...] },
        "confidence_signal": { "enum": ["HIGH", "MEDIUM", "LOW"] },
        "key_requirements": { "type": "array" },
        "concerns": { "type": "array" },
        "summary": { "type": "string" }
      },
      "required": [
        "role_fit_score", "final_decision", "seniority_level",
        "confidence_signal", "key_requirements", "concerns", "summary"
      ],
      "additionalProperties": False
    }
  }
}
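
Roughly how that schema gets attached to the call, sketched against the standard OpenAI Python client; the prompts here are stand-ins for the real screening prompt, and EVALUATION_SCHEMA is assumed to be the (fully specified) schema above:

import json
from openai import OpenAI

client = OpenAI()

def evaluate_job(job_description: str, profile_prompt: str) -> dict:
    # response_format constrains the reply to EVALUATION_SCHEMA
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        messages=[
            {"role": "system", "content": profile_prompt},
            {"role": "user", "content": job_description},
        ],
        response_format=EVALUATION_SCHEMA,
    )
    # With Structured Outputs the message content is a JSON string matching the schema
    return json.loads(response.choices[0].message.content)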

Defense in depth: even with structured outputs, the code validates schema as a fallback. Invalid responses fail closed to SKIP:

def _create_invalid_evaluation(self, error: str) -> JobEvaluation:
    return JobEvaluation(
        role_fit_score=1,
        final_decision="SKIP",
        confidence_signal="LOW",
        risk_level="high",
        is_valid=False,
        error=error,
    )
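
The matching happy-path parser might look something like this; a sketch that assumes the JobEvaluation fields shown above, with the real validation presumably more thorough:

import json

def _parse_evaluation(self, raw: str) -> JobEvaluation:
    # Validate even though Structured Outputs should already guarantee the shape;
    # anything unexpected fails closed through _create_invalid_evaluation
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return self._create_invalid_evaluation(f"json_parse_error: {exc}")

    required = ("role_fit_score", "final_decision", "confidence_signal")
    missing = [key for key in required if key not in data]
    if missing:
        return self._create_invalid_evaluation(f"missing_fields: {missing}")
    if data["final_decision"] not in {"APPLY", "CONSIDER", "SKIP"}:
        return self._create_invalid_evaluation("invalid_decision")

    return JobEvaluation(
        role_fit_score=data["role_fit_score"],
        final_decision=data["final_decision"],
        confidence_signal=data["confidence_signal"],
        risk_level=data.get("risk_level", "unknown"),
        is_valid=True,
        error=None,
    )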

Storage Model

Each role gets its own directory with typed JSON files. The canonical ID is derived from the source and job ID, then hashed to create the role_id:

output/
├── roles/                           # Primary storage
│   └── a1b2c3d4e5f6/               # role_id = SHA256(canonical_id)[:12]
│       ├── job_posting.json        # Scraped data + diagnostics
│       ├── evaluation.json         # Stage 1 result + metadata
│       ├── application_plan.json   # Stage 2 outputs (if run)
│       └── pipeline_state.json     # Workflow state
├── apply/                          # Stage 2 outputs by timestamp
├── scan-results.json               # Derived view (regenerated from roles)
├── quarantine/                     # Invalid data for debugging
└── needs_attention/                # Recoverable issues

Key insight: scan-results.json is derived, not primary. The per-role directories are the source of truth. This enables idempotent scraping — re-running Stage 1 on the same jobs is a no-op.
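
A sketch of the idempotent write path and the derived view, assuming file contents match the layout above; the paths and function names are illustrative:

import json
from pathlib import Path

def store_role(output_dir: str, role_id: str, posting: dict) -> Path:
    # One directory per role_id; writing the same posting twice is a no-op
    role_dir = Path(output_dir) / "roles" / role_id
    role_dir.mkdir(parents=True, exist_ok=True)
    posting_path = role_dir / "job_posting.json"
    if not posting_path.exists():
        posting_path.write_text(json.dumps(posting, indent=2))
    return role_dir

def rebuild_scan_results(output_dir: str) -> list[dict]:
    # scan-results.json is derived: regenerate it from the per-role directories
    results = [json.loads(path.read_text())
               for path in Path(output_dir).glob("roles/*/evaluation.json")]
    (Path(output_dir) / "scan-results.json").write_text(json.dumps(results, indent=2))
    return results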

Outcomes

  • 80% of screening automated: Daily pipeline processes 50+ roles with one command
  • Zero false positives on dealbreakers: Deterministic gates catch contract and location terms the LLM missed
  • Submission-ready cover letters: High-match roles get tailored letters without manual drafting
  • Full audit trail: Every decision traceable to prompt version, profile version, and input hash
  • APPLY rate control: 20% cap prevents over-confidence from flooding the queue

For reference, the full decision matrix:

Decision    Criteria                                      Action
APPLY       Score ≥ 9, no dealbreakers, within 20% cap    Generate cover letter + recruiter message
CONSIDER    Score 6-8, or downgraded by gates             Manual review queue
SKIP        Score < 6, or dealbreaker detected            No action

What I Learned

LLMs are great at judgment, terrible at constraints.

Use them for scoring and classification. Use regex for binary decisions. The hybrid approach catches what each misses alone.

Structured outputs reduce but don't eliminate schema errors.

Defense in depth matters. Validate even when the API guarantees structure. Fail closed to SKIP, not APPLY.

Prompt versioning is essential for iteration.

Without version tracking, you can't run regression tests. Without regression tests, prompt changes are blind experiments.

Canonical IDs enable idempotent operations.

Hashing the job source and ID to create role_id means re-scraping the same job is a no-op. Deduplication is automatic.

Never auto-apply.

The system advises. I decide. This isn't automation for automation's sake — it's leverage for better decisions.

Job Scanner is a pipeline I built to automate job screening with control, auditability, and reliability as core constraints. LLMs handle the judgment calls; deterministic gates enforce the dealbreakers. Every decision is traceable. The output is actionable.

This is how I approach LLM-powered automation: trust where appropriate, verify everywhere.
