34 commits
9592886
Add lychee configuration for external link checking
vaind Dec 30, 2025
f1b1df3
Add GitHub workflow for external link checking
vaind Dec 30, 2025
082da39
Document external link checking in lint-404s README
vaind Dec 30, 2025
bb72320
Add pre-commit hook for external link checking
vaind Dec 30, 2025
12c24da
Document local usage and pre-commit hook in README
vaind Dec 30, 2025
061980b
Simplify pre-commit hook by inlining command
vaind Dec 30, 2025
9a514bd
Add Lychee cache to .gitignore
vaind Dec 30, 2025
197ca55
Use TypeScript for pre-commit hook (cross-platform)
vaind Dec 30, 2025
d8f00ec
Only check changed files in PR workflow
vaind Dec 30, 2025
0b94753
[getsentry/action-github-commit] Auto commit
getsantry[bot] Dec 30, 2025
e71be85
Fix lychee config to reduce false positives
vaind Dec 30, 2025
0cb8b70
Use base_url to resolve root-relative links
vaind Dec 30, 2025
6b13baf
Add ignore patterns for TLS-incompatible and internal sites
vaind Dec 30, 2025
86594d3
Use optional credentials pattern for private IPs
vaind Dec 30, 2025
4a963ad
Refactor workflow: separate PR and full-scan jobs
vaind Dec 30, 2025
7419a69
Refine external link checker workflow and config
vaind Dec 30, 2025
43ebc4d
[getsentry/action-github-commit] Auto commit
getsantry[bot] Dec 30, 2025
7534a9e
Improve lychee config: add exclude_all_private and remove directory f…
vaind Dec 30, 2025
76cf9a1
Refactor PR link check workflow: remove comment step and unnecessary …
vaind Dec 30, 2025
f0ea8a6
Update README.md: clarify external link checking behavior in PRs and …
vaind Dec 30, 2025
de36c8e
Use cross-platform lychee detection in pre-commit hook
vaind Dec 30, 2025
2c7df02
[getsentry/action-github-commit] Auto commit
getsantry[bot] Dec 30, 2025
eb10555
cleanup
vaind Dec 31, 2025
b88debc
Add GitHub Actions cache for lychee link checking
vaind Dec 31, 2025
06ff59d
Exclude transient errors from lychee cache
vaind Dec 31, 2025
2135419
Refactor external link checker workflow to enforce failure on broken …
vaind Jan 2, 2026
3fb2cab
Enable failure on broken links in external link checker
vaind Jan 2, 2026
1879d32
tmp
vaind Jan 2, 2026
e245022
save cache even on failure
vaind Jan 2, 2026
876f2bf
config tuning
vaind Jan 2, 2026
0df8be4
cleanup
vaind Jan 2, 2026
5e23c12
disable temp full run
vaind Jan 2, 2026
726582e
Include .mdx files in pre-commit link check
vaind Jan 2, 2026
4e14ee7
fix: update cache_exclude_status format in lychee.toml
vaind Jan 2, 2026
97 changes: 97 additions & 0 deletions .github/workflows/lint-external-links.yml
@@ -0,0 +1,97 @@
name: Check External Links

on:
  # Run weekly on Sundays at 2 AM UTC
  schedule:
    - cron: '0 2 * * 0'

  # Allow manual triggering
  workflow_dispatch:

  # Run on PRs targeting master (only changed .md/.mdx files are checked)
  pull_request:
    branches: [master]

jobs:
  # Job for PRs: check only changed files
  check-pr:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changed
        run: |
          FILES=$(git diff --name-only --diff-filter=AM origin/${{ github.base_ref }}...HEAD -- '*.md' '*.mdx' || true)
          if [ -z "$FILES" ]; then
            echo "files=" >> $GITHUB_OUTPUT
            echo "No markdown files changed"
          else
            echo "files<<EOF" >> $GITHUB_OUTPUT
            echo "$FILES" >> $GITHUB_OUTPUT
            echo "EOF" >> $GITHUB_OUTPUT
            echo "Changed files:"
            echo "$FILES"
          fi

      - name: Restore lychee cache
        if: steps.changed.outputs.files != ''
        uses: actions/cache/restore@v4
        with:
          path: .lycheecache
          key: lychee-cache-
          restore-keys: lychee-cache-

      - name: Check external links
        if: steps.changed.outputs.files != ''
        uses: lycheeverse/lychee-action@v2
        with:
          args: --verbose --no-progress ${{ steps.changed.outputs.files }}
          fail: true
          jobSummary: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  # Job for scheduled/manual runs: check all files and write a Markdown report
  check-full:
    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    permissions:
      issues: write

    steps:
      - uses: actions/checkout@v4

      # Cache strategy: see lychee.toml for details
      # - Restore previous cache so successful checks are skipped
      # - Transient errors (429, 5xx) are excluded from cache and retried
      # - Save updated cache for next run
      - name: Restore lychee cache
        uses: actions/cache/restore@v4
        with:
          path: .lycheecache
          key: lychee-cache-
          restore-keys: lychee-cache-

      - name: Check external links
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          args: --verbose .
          output: ./lychee-report.md
          format: markdown
          fail: true
          jobSummary: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Save lychee cache
        uses: actions/cache/save@v4
        if: always()
        with:
          path: .lycheecache
          key: lychee-cache-${{ github.run_id }}
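
The restore steps pass the bare prefix `lychee-cache-` as both `key` and `restore-keys`, while the save step writes `lychee-cache-<run_id>`. No saved key ever equals the bare prefix, so restore should always fall through to the prefix match and pick the most recently created cache. The sketch below is a simplified model of that lookup, not the `actions/cache` implementation; `SavedCache`, `pickCache`, and the run ids are hypothetical names and values used only for illustration.

```typescript
// Simplified model of how `key` and `restore-keys` are resolved.
// SavedCache and pickCache are hypothetical; they are not part of actions/cache.
interface SavedCache {
  key: string;
  createdAt: number; // epoch milliseconds
}

function pickCache(
  saved: SavedCache[],
  key: string,
  restoreKeys: string[]
): SavedCache | undefined {
  // 1. An exact key match wins.
  const exact = saved.find(c => c.key === key);
  if (exact) {
    return exact;
  }
  // 2. Otherwise treat each restore key as a prefix and take the newest match.
  for (const prefix of restoreKeys) {
    const matches = saved
      .filter(c => c.key.startsWith(prefix))
      .sort((a, b) => b.createdAt - a.createdAt);
    if (matches.length > 0) {
      return matches[0];
    }
  }
  return undefined;
}

// Two caches saved by earlier runs (made-up run ids).
const savedCaches: SavedCache[] = [
  {key: 'lychee-cache-1001', createdAt: 1000},
  {key: 'lychee-cache-1002', createdAt: 2000},
];

const restored = pickCache(savedCaches, 'lychee-cache-', ['lychee-cache-']);
console.log(restored?.key);
```

Under this model the PR job, which only restores, reuses whatever the latest scheduled or manual run saved.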
3 changes: 3 additions & 0 deletions .gitignore
@@ -105,3 +105,6 @@ public/og-images/*
yalc.lock
/public/doctree.json
/public/doctree-dev.json

# Lychee cache
.lycheecache
58 changes: 58 additions & 0 deletions .lycheeignore
@@ -0,0 +1,58 @@
# URLs to ignore during external link checking
# Supports regex patterns - lines starting with # are comments
# Note: Private IPs (localhost, 10.x, 172.16-31.x, 192.168.x) are handled by exclude_all_private in lychee.toml

# Example/placeholder URLs
https?://example\.com.*
https?://your-.*
https?://.*\.example\..*
https?://___.*___.*

# Internal Sentry development URLs
https?://.*\.getsentry\.net.*
https?://sentry-content-dashboard\.sentry\.dev.*

# Sites known to block automated checkers
https?://twitter\.com.*
https?://x\.com.*
https?://linkedin\.com.*
https?://www\.linkedin\.com.*
https?://www\.npmjs\.com.*
https?://search\.maven\.org.*
https?://medium\.com.*
https?://.*\.medium\.com.*
https?://gitlab\.com/oauth/.*
https?://docs\.gitlab\.com.*
https?://dev\.epicgames\.com.*
https?://docs\.unrealengine\.com.*
https?://cursor\.com.*
https?://dash\.cloudflare\.com.*
https?://www\.freedesktop\.org.*

# TLS compatibility issues (sites work in browser but fail in lychee due to native-tls)
# bottlepy.org only supports TLS 1.3, incompatible with lychee's TLS backend
https?://bottlepy\.org.*

# Cloudflare ECH (Encrypted Client Hello) required - fails even with curl/openssl
https?://help\.revise\.dev.*
https?://.*\.intercomhelpcenter\.com.*

# Rate-limited sites (may fail intermittently with 429)
https?://godoc\.org.*
https?://pkg\.go\.dev.*

# Interactive demos that may not respond to HEAD requests
https?://demo\.arcade\.software.*

# Private/internal resources
https?://.*\.notion\.so.*
https?://www\.notion\.so.*
https?://github\.com/getsentry/getsentry.*
https?://github\.com/getsentry/sentry-options-automator.*
https?://github\.com/getsentry/etl.*
https?://sentry\.zendesk\.com.*

# Placeholder domains commonly used in docs
https?://api\.example\.com.*
https?://your-api-host.*
https?://empowerplant\.io.*
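
To see what the ignore entries cover, the sketch below tests a few URLs against four patterns copied from the file. It uses JavaScript's `RegExp`, which behaves the same as lychee's regex engine for patterns this simple, and it assumes patterns are matched unanchored. `isIgnored` and the sample URLs are hypothetical.

```typescript
// Four patterns copied verbatim from .lycheeignore.
const ignoreSources: string[] = [
  String.raw`https?://example\.com.*`,
  String.raw`https?://.*\.example\..*`,
  String.raw`https?://your-.*`,
  String.raw`https?://github\.com/getsentry/etl.*`,
];
const ignorePatterns: RegExp[] = ignoreSources.map(src => new RegExp(src));

// Hypothetical helper: true if any ignore pattern matches the URL.
function isIgnored(url: string): boolean {
  return ignorePatterns.some(pattern => pattern.test(url));
}

const samples: string[] = [
  'https://example.com/docs', // placeholder domain
  'https://api.example.com/v1', // covered by the `.*\.example\..*` entry
  'https://your-org.sentry.io/', // covered by the `your-.*` entry
  'https://github.com/getsentry/etl', // private repository
  'https://docs.github.com/en', // not ignored, will be checked
];

for (const url of samples) {
  console.log(`${isIgnored(url) ? 'ignored' : 'checked'}  ${url}`);
}
```

Because each pattern ends in `.*`, it also covers every path under the matched host or repository prefix.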
8 changes: 8 additions & 0 deletions .pre-commit-config.yaml
@@ -34,3 +34,11 @@ repos:
    rev: v1.39.0
    hooks:
      - id: typos
  - repo: local
    hooks:
      - id: lychee
        name: Check external links (warn only)
        entry: bun scripts/lint-external-links.ts
        language: system
        files: \.(md|mdx)$
        verbose: true
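
The `files` entry is a regular expression that pre-commit matches against each staged path, and only matching paths are passed to the hook script. The sketch below applies the same pattern with JavaScript's `RegExp`, which is equivalent for a pattern this simple. `selectHookFiles` and the sample paths are hypothetical.

```typescript
// The same pattern as `files: \.(md|mdx)$` in the hook definition.
const filesPattern = /\.(md|mdx)$/;

// Hypothetical helper: keep only the staged paths the hook would receive.
function selectHookFiles(stagedPaths: string[]): string[] {
  return stagedPaths.filter(path => filesPattern.test(path));
}

const staged: string[] = [
  'docs/platforms/javascript/index.mdx',
  'scripts/lint-404s/README.md',
  'scripts/lint-external-links.ts',
  'notes.md.bak',
];

console.log(selectHookFiles(staged));
```

The `$` anchor is what keeps a path such as `notes.md.bak` out of the selection.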
58 changes: 58 additions & 0 deletions lychee.toml
@@ -0,0 +1,58 @@
# Lychee configuration for external link checking
# Documentation: https://github.com/lycheeverse/lychee

# Base URL to resolve root-relative links
base_url = "https://docs.sentry.io"

# Only check HTTP and HTTPS links
scheme = ["https", "http"]

# Exclude all private IP addresses automatically (localhost, 10.x, 172.16-31.x, 192.168.x, etc.)
exclude_all_private = true

# Exclude internal links (already handled by lint-404s script)
exclude = ['^https://docs\.sentry\.io']

# Maximum number of concurrent requests
max_concurrency = 32

# Maximum number of retries per request
max_retries = 2

# Request timeout in seconds
timeout = 30

# Retry wait time in seconds
retry_wait_time = 2

# User agent (some sites block default user agents)
user_agent = "Mozilla/5.0 (compatible; Sentry-Docs-Link-Checker; +https://github.com/getsentry/sentry-docs)"

# Accept common status codes that indicate the link works
# Include 403 (possibly bot blocking) and 418 (freedesktop teapot) to reduce noise
accept = [200, 201, 202, 203, 204, 206, 301, 302, 308, 403, 418]

# Don't validate URL fragments/anchors (e.g., #section-name)
# Fragment checking is unreliable: JS-rendered anchors appear broken, and many sites don't validate them
include_fragments = false

# Skip mailto: addresses and links inside verbatim/code blocks
include_mail = false
include_verbatim = false

# Follow redirects
max_redirects = 10

# Cache settings
#
# Strategy: Weekly scheduled runs populate the cache, PR checks consume it.
# - Definitive responses (e.g. 200, 301, 403, 404) are cached and skipped on subsequent runs
# - Transient errors (429 rate limits, 5xx server errors) are NOT cached, so they get retried
# - Cache lifetime is just under 2 weeks so it survives between weekly runs
#
# This means each weekly run only re-checks:
# 1. Links that failed with transient errors last time
# 2. New links not yet in cache
cache = true
max_cache_age = "335h"
cache_exclude_status = "429, 500.."
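
The `accept`, `cache_exclude_status`, and `max_cache_age` settings answer three separate questions: does a status count as a working link, may the result be cached, and is a cached result still fresh. The sketch below is a simplified model of those rules, not lychee's implementation. It assumes `"429, 500.."` means status 429 plus every status from 500 upward, and all function names are hypothetical.

```typescript
// Values copied from lychee.toml.
const ACCEPTED = new Set<number>([200, 201, 202, 203, 204, 206, 301, 302, 308, 403, 418]);
const MAX_CACHE_AGE_HOURS = 335;

// Does this status count as a working link?
function isAccepted(status: number): boolean {
  return ACCEPTED.has(status);
}

// Assumed reading of cache_exclude_status = "429, 500..":
// 429 and anything from 500 upward is never written to the cache.
function isCacheable(status: number): boolean {
  return status !== 429 && status < 500;
}

// Is a cache entry of the given age still within max_cache_age?
function isFresh(ageHours: number): boolean {
  return ageHours < MAX_CACHE_AGE_HOURS;
}

for (const status of [200, 403, 404, 429, 503]) {
  console.log(status, {accepted: isAccepted(status), cacheable: isCacheable(status)});
}

const WEEK_HOURS = 7 * 24; // 168
console.log('one week old:', isFresh(WEEK_HOURS)); // 168 < 335
console.log('two weeks old:', isFresh(2 * WEEK_HOURS)); // 336 >= 335
```

A 404 is cacheable but not accepted, so a broken link stays reported from cache until the entry ages out, while a 429 or 5xx is retried on the next run.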
40 changes: 40 additions & 0 deletions scripts/lint-404s/README.md
@@ -63,3 +63,43 @@ The `ignore-list.txt` file contains paths that should be skipped during checking

- `0` - No 404s found
- `1` - 404s were detected

## External Link Checking

This script only checks **internal links**. External links (to third-party sites) are validated separately using [lychee](https://github.com/lycheeverse/lychee).

### Running Locally

```bash
# Install lychee
brew install lychee

# Check all markdown files in the repo
lychee .

# Check a specific file
lychee docs/platforms/javascript/index.mdx
```

### Pre-commit Hook

A pre-commit hook checks external links in changed files (warn-only, won't block commits). Requires lychee to be installed locally.

### CI Workflow

The GitHub workflow (`.github/workflows/lint-external-links.yml`) runs:

- Weekly on a schedule (checks all files and writes a report of broken links to the job summary)
- On PRs (checks changed files only)
- Manually via workflow dispatch

### Configuration Files

- `lychee.toml` - Lychee configuration
- `.lycheeignore` - URLs to ignore during checking

### Why Separate from Internal Link Checking?

1. **Performance**: External link checking is slower and shouldn't block PRs
2. **False positives**: Many external sites block automated checkers
3. **Different scope**: External checks only run on changed files in PRs; internal checks validate all pages
40 changes: 40 additions & 0 deletions scripts/lint-external-links.ts
@@ -0,0 +1,40 @@
/**
 * Pre-commit hook wrapper for lychee external link checker.
 * Runs lychee on provided files and warns on broken links without blocking commits.
 *
 * Usage: bun scripts/lint-external-links.ts [files...]
 */

import {spawnSync} from 'child_process';

// Check if lychee is installed
const versionCheck = spawnSync('lychee', ['--version'], {
  encoding: 'utf-8',
  stdio: 'pipe',
});
if (versionCheck.error || versionCheck.status !== 0) {
  console.log('Warning: lychee not installed. Skipping external link check.');
  console.log(
    'Install with: brew install lychee (macOS) or cargo install lychee (cross-platform)'
  );
  process.exit(0);
}

const files = process.argv.slice(2);
if (files.length === 0) {
  process.exit(0);
}

// Run lychee on the provided files
const result = spawnSync('lychee', ['--no-progress', ...files], {
  stdio: 'inherit',
  encoding: 'utf-8',
});

if (result.status !== 0) {
  console.log('');
  console.log('⚠️ External link issues found (commit not blocked)');
}

// Always exit 0 so commit proceeds
process.exit(0);
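
The script has three possible outcomes, and all of them exit with code 0. The sketch below restates that decision logic as a pure function so the warn-only behavior can be checked without spawning lychee. It is an illustrative restatement, not part of the PR; `hookOutcome` and `HookOutcome` are hypothetical names.

```typescript
// Hypothetical restatement of the hook's decision logic as a pure function.
interface HookOutcome {
  message: string | null; // what the hook would print, if anything
  exitCode: 0; // the hook never blocks a commit
}

function hookOutcome(
  lycheeInstalled: boolean,
  fileCount: number,
  lycheeStatus: number | null // exit status of lychee, null if it did not exit normally
): HookOutcome {
  if (!lycheeInstalled) {
    return {
      message: 'Warning: lychee not installed. Skipping external link check.',
      exitCode: 0,
    };
  }
  if (fileCount === 0) {
    return {message: null, exitCode: 0};
  }
  if (lycheeStatus !== 0) {
    return {
      message: 'External link issues found (commit not blocked)',
      exitCode: 0,
    };
  }
  return {message: null, exitCode: 0};
}

console.log(hookOutcome(false, 3, null)); // lychee missing: warn and skip
console.log(hookOutcome(true, 0, null)); // nothing to check
console.log(hookOutcome(true, 2, 2)); // lychee reported a failure: warn only
console.log(hookOutcome(true, 2, 0)); // all links fine
```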