Add external link checking with lychee #15893
Open: vaind wants to merge 34 commits into `master` from `add-external-link-checker` (+304 −0)
Commits (34):

- `9592886` Add lychee configuration for external link checking (vaind)
- `f1b1df3` Add GitHub workflow for external link checking (vaind)
- `082da39` Document external link checking in lint-404s README (vaind)
- `bb72320` Add pre-commit hook for external link checking (vaind)
- `12c24da` Document local usage and pre-commit hook in README (vaind)
- `061980b` Simplify pre-commit hook by inlining command (vaind)
- `9a514bd` Add Lychee cache to .gitignore (vaind)
- `197ca55` Use TypeScript for pre-commit hook (cross-platform) (vaind)
- `d8f00ec` Only check changed files in PR workflow (vaind)
- `0b94753` [getsentry/action-github-commit] Auto commit (getsantry[bot])
- `e71be85` Fix lychee config to reduce false positives (vaind)
- `0cb8b70` Use base_url to resolve root-relative links (vaind)
- `6b13baf` Add ignore patterns for TLS-incompatible and internal sites (vaind)
- `86594d3` Use optional credentials pattern for private IPs (vaind)
- `4a963ad` Refactor workflow: separate PR and full-scan jobs (vaind)
- `7419a69` Refine external link checker workflow and config (vaind)
- `43ebc4d` [getsentry/action-github-commit] Auto commit (getsantry[bot])
- `7534a9e` Improve lychee config: add exclude_all_private and remove directory f… (vaind)
- `76cf9a1` Refactor PR link check workflow: remove comment step and unnecessary … (vaind)
- `f0ea8a6` Update README.md: clarify external link checking behavior in PRs and … (vaind)
- `de36c8e` Use cross-platform lychee detection in pre-commit hook (vaind)
- `2c7df02` [getsentry/action-github-commit] Auto commit (getsantry[bot])
- `eb10555` cleanup (vaind)
- `b88debc` Add GitHub Actions cache for lychee link checking (vaind)
- `06ff59d` Exclude transient errors from lychee cache (vaind)
- `2135419` Refactor external link checker workflow to enforce failure on broken … (vaind)
- `3fb2cab` Enable failure on broken links in external link checker (vaind)
- `1879d32` tmp (vaind)
- `e245022` save cache even on failure (vaind)
- `876f2bf` config tuning (vaind)
- `0df8be4` cleanup (vaind)
- `5e23c12` disable temp full run (vaind)
- `726582e` Include .mdx files in pre-commit link check (vaind)
- `4e14ee7` fix: update cache_exclude_status format in lychee.toml (vaind)
New file (`@@ -0,0 +1,97 @@`), GitHub Actions workflow:

```yaml
name: Check External Links

on:
  # Run weekly on Sundays at 2 AM UTC
  schedule:
    - cron: '0 2 * * 0'

  # Allow manual triggering
  workflow_dispatch:

  # Run on PRs that modify docs (non-blocking)
  pull_request:
    branches: [master]

jobs:
  # Job for PRs: check only changed files
  check-pr:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changed
        run: |
          FILES=$(git diff --name-only --diff-filter=AM origin/${{ github.base_ref }}...HEAD -- '*.md' '*.mdx' || true)
          if [ -z "$FILES" ]; then
            echo "files=" >> $GITHUB_OUTPUT
            echo "No markdown files changed"
          else
            echo "files<<EOF" >> $GITHUB_OUTPUT
            echo "$FILES" >> $GITHUB_OUTPUT
            echo "EOF" >> $GITHUB_OUTPUT
            echo "Changed files:"
            echo "$FILES"
          fi

      - name: Restore lychee cache
        if: steps.changed.outputs.files != ''
        uses: actions/cache/restore@v4
        with:
          path: .lycheecache
          key: lychee-cache-
          restore-keys: lychee-cache-

      - name: Check external links
        if: steps.changed.outputs.files != ''
        uses: lycheeverse/lychee-action@v2
        with:
          args: --verbose --no-progress ${{ steps.changed.outputs.files }}
          fail: true
          jobSummary: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  # Job for scheduled/manual runs: check all files, create issue
  check-full:
    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    permissions:
      issues: write

    steps:
      - uses: actions/checkout@v4

      # Cache strategy: see lychee.toml for details
      # - Restore previous cache so successful checks are skipped
      # - Transient errors (429, 5xx) are excluded from cache and retried
      # - Save updated cache for next run
      - name: Restore lychee cache
        uses: actions/cache/restore@v4
        with:
          path: .lycheecache
          key: lychee-cache-
          restore-keys: lychee-cache-

      - name: Check external links
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          args: --verbose .
          output: ./lychee-report.md
          format: markdown
          fail: true
          jobSummary: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Save lychee cache
        uses: actions/cache/save@v4
        if: always()
        with:
          path: .lycheecache
          key: lychee-cache-${{ github.run_id }}
```
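Both jobs restore with `key: lychee-cache-` plus `restore-keys: lychee-cache-`, while the full scan saves under `lychee-cache-${{ github.run_id }}`. Because an Actions cache entry cannot be overwritten once saved, a unique key per run lets each scan publish a fresh cache; on restore, no saved entry is named exactly `lychee-cache-`, so the lookup falls back to the prefix and takes the most recently created match. The TypeScript below is a simplified model of that lookup, not the `actions/cache` implementation: the `CacheEntry` type and `resolveCache` function are hypothetical, and branch scoping of caches is ignored.

```typescript
// Simplified model of GitHub Actions cache lookup (hypothetical helper,
// not the real actions/cache code). Branch scoping is ignored.
interface CacheEntry {
  key: string;
  createdAt: number; // e.g. epoch milliseconds
}

function resolveCache(
  entries: CacheEntry[],
  key: string,
  restoreKeys: string[]
): CacheEntry | undefined {
  // 1. An exact match on `key` wins.
  const exact = entries.find(e => e.key === key);
  if (exact) return exact;

  // 2. Otherwise try each restore key in order as a prefix,
  //    taking the most recently created entry that matches.
  for (const prefix of restoreKeys) {
    const matches = entries
      .filter(e => e.key.startsWith(prefix))
      .sort((a, b) => b.createdAt - a.createdAt);
    if (matches.length > 0) return matches[0];
  }
  return undefined;
}

// Two earlier full scans saved caches under lychee-cache-<run_id>.
const saved: CacheEntry[] = [
  {key: 'lychee-cache-1001', createdAt: 1},
  {key: 'lychee-cache-1002', createdAt: 2},
  {key: 'node-modules-abc', createdAt: 3},
];

const hit = resolveCache(saved, 'lychee-cache-', ['lychee-cache-']);
console.log(hit?.key); // lychee-cache-1002
```

Under this model, the PR job always starts from whatever the latest full scan saved, and since `check-pr` has no save step, PR runs never replace that cache.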
`.gitignore`:

```diff
@@ -105,3 +105,6 @@ public/og-images/*
 yalc.lock
 /public/doctree.json
 /public/doctree-dev.json
+
+# Lychee cache
+.lycheecache
```
New file (`@@ -0,0 +1,58 @@`), URL ignore list:

```text
# URLs to ignore during external link checking
# Supports regex patterns - lines starting with # are comments
# Note: Private IPs (localhost, 10.x, 172.16-31.x, 192.168.x) are handled by exclude_all_private in lychee.toml

# Example/placeholder URLs
https?://example\.com.*
https?://your-.*
https?://.*\.example\..*
https?://___.*___.*

# Internal Sentry development URLs
https?://.*\.getsentry\.net.*
https?://sentry-content-dashboard\.sentry\.dev.*

# Sites known to block automated checkers
https?://twitter\.com.*
https?://x\.com.*
https?://linkedin\.com.*
https?://www\.linkedin\.com.*
https?://www\.npmjs\.com.*
https?://search\.maven\.org.*
https?://medium\.com.*
https?://.*\.medium\.com.*
https?://gitlab\.com/oauth/.*
https?://docs\.gitlab\.com.*
https?://dev\.epicgames\.com.*
https?://docs\.unrealengine\.com.*
https?://cursor\.com.*
https?://dash\.cloudflare\.com.*
https?://www\.freedesktop\.org.*

# TLS compatibility issues (sites work in browser but fail in lychee due to native-tls)
# bottlepy.org only supports TLS 1.3, incompatible with lychee's TLS backend
https?://bottlepy\.org.*

# Cloudflare ECH (Encrypted Client Hello) required - fails even with curl/openssl
https?://help\.revise\.dev.*
https?://.*\.intercomhelpcenter\.com.*

# Rate-limited sites (may fail intermittently with 429)
https?://godoc\.org.*
https?://pkg\.go\.dev.*

# Interactive demos that may not respond to HEAD requests
https?://demo\.arcade\.software.*

# Private/internal resources
https?://.*\.notion\.so.*
https?://www\.notion\.so.*
https?://github\.com/getsentry/getsentry.*
https?://github\.com/getsentry/sentry-options-automator.*
https?://github\.com/getsentry/etl.*
https?://sentry\.zendesk\.com.*

# Placeholder domains commonly used in docs
https?://api\.example\.com.*
https?://your-api-host.*
https?://empowerplant\.io.*
```
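Each non-comment line of the ignore list is a regular expression tested against every extracted URL. The TypeScript below is a rough model of that filtering, not lychee's code: `parseIgnoreList` and `isIgnored` are hypothetical helpers, and the assumption that a pattern only has to match somewhere in the URL (an unanchored search) is ours rather than something this PR states.

```typescript
// Hypothetical model of a regex ignore list.
// Assumption: patterns are matched unanchored, anywhere in the URL.
function parseIgnoreList(text: string): RegExp[] {
  return text
    .split('\n')
    .map(line => line.trim())
    // Skip blank lines and `#` comments.
    .filter(line => line !== '' && !line.startsWith('#'))
    .map(line => new RegExp(line));
}

function isIgnored(url: string, patterns: RegExp[]): boolean {
  // RegExp.test searches the whole string unless the pattern uses ^ or $.
  return patterns.some(re => re.test(url));
}

// A few entries copied from the list above (backslashes doubled for the template literal).
const ignoreList = `
# Example/placeholder URLs
https?://example\\.com.*
https?://.*\\.example\\..*

# Rate-limited sites
https?://pkg\\.go\\.dev.*
`;

const patterns = parseIgnoreList(ignoreList);
console.log(isIgnored('https://api.example.com/v1', patterns)); // true
console.log(isIgnored('https://pkg.go.dev/net/http', patterns)); // true
console.log(isIgnored('https://github.com/getsentry/sentry', patterns)); // false
```

Under that assumption, `https?://example\.com.*` also matches a URL that merely embeds `https://example.com`, for instance in a redirect query parameter; anchoring with `^`, as the `exclude` entry in `lychee.toml` does for `docs.sentry.io`, avoids that.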
New file `lychee.toml` (`@@ -0,0 +1,58 @@`):

```toml
# Lychee configuration for external link checking
# Documentation: https://github.com/lycheeverse/lychee

# Base URL to resolve root-relative links
base_url = "https://docs.sentry.io"

# Only check HTTP and HTTPS links
scheme = ["https", "http"]

# Exclude all private IP addresses automatically (localhost, 10.x, 172.16-31.x, 192.168.x, etc.)
exclude_all_private = true

# Exclude internal links (already handled by lint-404s script)
exclude = ['^https://docs\.sentry\.io']

# Maximum number of concurrent requests
max_concurrency = 32

# Maximum number of retries per request
max_retries = 2

# Request timeout in seconds
timeout = 30

# Retry wait time in seconds
retry_wait_time = 2

# User agent (some sites block default user agents)
user_agent = "Mozilla/5.0 (compatible; Sentry-Docs-Link-Checker; +https://github.com/getsentry/sentry-docs)"

# Accept common status codes that indicate the link works
# Include 403 (possibly bot blocking) and 418 (freedesktop teapot) to reduce noise
accept = [200, 201, 202, 203, 204, 206, 301, 302, 308, 403, 418]

# Don't validate URL fragments/anchors (e.g., #section-name)
# Fragment checking is unreliable: JS-rendered anchors appear broken, and many sites don't validate them
include_fragments = false

# Only check external links (our internal check handles internal ones)
include_mail = false
include_verbatim = false

# Follow redirects
max_redirects = 10

# Cache settings
#
# Strategy: Weekly scheduled runs populate the cache, PR checks consume it.
# - Successful responses (200, 301, 403, 404) are cached and skipped on subsequent runs
# - Transient errors (429 rate limits, 5xx server errors) are NOT cached, so they get retried
# - Cache lifetime is just under 2 weeks so it survives between weekly runs
#
# This means each weekly run only re-checks:
# 1. Links that failed with transient errors last time
# 2. New links not yet in cache
cache = true
max_cache_age = "335h"
cache_exclude_status = "429, 500.."
```
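The cache comments describe two rules: a response whose status falls in `cache_exclude_status` is never cached, and a cached entry older than `max_cache_age` is checked again. Below is a small TypeScript model of both rules, assuming `"429, 500.."` means status 429 plus every status from 500 upward and that an entry counts as fresh while its age is strictly below the limit. `parseStatusSelector`, `isCacheable`, and `isFresh` are hypothetical names; lychee's own parser may accept more syntax than this sketch handles.

```typescript
// Hypothetical parser for a status selector such as "429, 500..".
// Handles single codes ("429") and open-ended ranges ("500..") only.
type StatusRule = {min: number; max: number};

function parseStatusSelector(selector: string): StatusRule[] {
  return selector
    .split(',')
    .map(part => part.trim())
    .filter(part => part !== '')
    .map(part => {
      if (part.endsWith('..')) {
        // "500.." -> 500 and everything above it
        return {min: Number(part.slice(0, -2)), max: Infinity};
      }
      const code = Number(part);
      return {min: code, max: code};
    });
}

function isCacheable(status: number, excluded: StatusRule[]): boolean {
  // A response is cached unless its status hits an excluded rule.
  return !excluded.some(r => status >= r.min && status <= r.max);
}

// Mirrors max_cache_age = "335h" (assumed strict comparison).
const MAX_CACHE_AGE_HOURS = 335;

function isFresh(ageHours: number): boolean {
  return ageHours < MAX_CACHE_AGE_HOURS;
}

const excluded = parseStatusSelector('429, 500..');
const statuses = [200, 301, 403, 404, 429, 503];
console.log(statuses.map(s => isCacheable(s, excluded)).join(', '));
// true, true, true, true, false, false

// One weekly interval (168 h) is inside the window; two (336 h) is not.
console.log(isFresh(168), isFresh(336)); // true false
```

Under this model, an entry written by one weekly run is still reused by the next run 168 hours later but has expired by the run after that, so each cached link is re-verified at least every other week. Note that 404 stays cacheable, which matches the comment listing it among cached responses.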
New file `scripts/lint-external-links.ts` (`@@ -0,0 +1,40 @@`):

```typescript
/**
 * Pre-commit hook wrapper for lychee external link checker.
 * Runs lychee on provided files and warns on broken links without blocking commits.
 *
 * Usage: bun scripts/lint-external-links.ts [files...]
 */

import {spawnSync} from 'child_process';

// Check if lychee is installed
const versionCheck = spawnSync('lychee', ['--version'], {
  encoding: 'utf-8',
  stdio: 'pipe',
});
if (versionCheck.error || versionCheck.status !== 0) {
  console.log('Warning: lychee not installed. Skipping external link check.');
  console.log(
    'Install with: brew install lychee (macOS) or cargo install lychee (cross-platform)'
  );
  process.exit(0);
}

const files = process.argv.slice(2);
if (files.length === 0) {
  process.exit(0);
}

// Run lychee on the provided files
const result = spawnSync('lychee', ['--no-progress', ...files], {
  stdio: 'inherit',
  encoding: 'utf-8',
});

if (result.status !== 0) {
  console.log('');
  console.log('⚠️ External link issues found (commit not blocked)');
}

// Always exit 0 so commit proceeds
process.exit(0);
```
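The hook above mixes process spawning with its policy: skip when lychee is missing, skip when no files are passed, warn on failures, and never block the commit. One way to unit-test that policy is to inject the command runner instead of calling `spawnSync` directly. The sketch below is a hypothetical refactor, not part of this PR; `Runner`, `HookOutcome`, and `decideHookOutcome` are illustrative names, and `Runner` only imitates the `status`/`error` fields that `spawnSync` returns.

```typescript
// Hypothetical refactor of the hook's decision logic so it can be tested
// without spawning processes. `Runner` stands in for a spawnSync-style call.
type Runner = (args: string[]) => {status: number | null; error?: Error};

interface HookOutcome {
  exitCode: 0; // the hook never blocks a commit
  messages: string[];
}

function decideHookOutcome(files: string[], run: Runner): HookOutcome {
  const messages: string[] = [];

  // Mirror the `lychee --version` availability check.
  const version = run(['--version']);
  if (version.error || version.status !== 0) {
    messages.push('Warning: lychee not installed. Skipping external link check.');
    return {exitCode: 0, messages};
  }

  // No files passed: nothing to check.
  if (files.length === 0) {
    return {exitCode: 0, messages};
  }

  // Any non-zero lychee status only produces a warning.
  const result = run(['--no-progress', ...files]);
  if (result.status !== 0) {
    messages.push('⚠️ External link issues found (commit not blocked)');
  }
  return {exitCode: 0, messages};
}

// Fake runner: lychee is "installed" but reports a broken link.
const fakeRunner: Runner = args =>
  args[0] === '--version' ? {status: 0} : {status: 2};

console.log(decideHookOutcome(['docs/index.mdx'], fakeRunner).messages);
```

Keeping the real script as a thin wrapper that passes `spawnSync` results into such a function would let the "always exit 0" guarantee be checked without lychee installed.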