Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
package org.togetherjava.tjbot.features.utils;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

import com.linkedin.urls.Url;
import com.linkedin.urls.detection.UrlDetector;
import com.linkedin.urls.detection.UrlDetectorOptions;
Expand Down Expand Up @@ -57,6 +63,54 @@ public static List<String> extractLinks(String content, Set<LinkFilter> filter)
public static boolean containsLink(String content) {
return !(new UrlDetector(content, UrlDetectorOptions.BRACKET_MATCH).detect().isEmpty());
}
public static CompletableFuture<Boolean> isLinkBroken(String url) {
HttpClient client = HttpClient.newHttpClient();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define the HttpClient as a static reference in this class please?

i.e.

private static final HttpClient CLIENT = HttpClient.newHttpClient();

public static CompletableFuture<Boolean> isLinkBroken(String url) {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .method("HEAD", HttpRequest.BodyPublishers.noBody())
            .build();

    return CLIENT.sendAsync(request, HttpResponse.BodyHandlers.discarding())
            .thenApply(response -> response.statusCode() >= 400)
            .exceptionally(ignored -> true);
}

Creating a new HttpClient per link is unnecessary and inefficient. HttpClient is thread-safe and designed to be reused across concurrent requests.


HttpRequest request = HttpRequest.newBuilder(URI.create(url))
.method("HEAD", HttpRequest.BodyPublishers.noBody())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While, it's a good idea to use HEAD requests, a lot of HTTP servers do not implement this properly, you might commonly get 405, 404 etc., and CDNs like Cloudflare actually block it as suspicious. It's best to use a hybrid approach here, try HEAD first and if that fails try GET before entirely failing.

.build();

return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
.thenApply(response -> response.statusCode() >= 400)
.exceptionally(ignored -> true);
}
public static CompletableFuture<String> replaceDeadLinks(
String text,
String replacement
) {
Set<LinkFilter> filters = Set.of(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to always re-create this Set as it never changes, we can make this static-final.

LinkFilter.SUPPRESSED,
LinkFilter.NON_HTTP_SCHEME
);

List<String> links = extractLinks(text, filters);

if (links.isEmpty()) {
return CompletableFuture.completedFuture(text);
}

StringBuilder result = new StringBuilder(text);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is not thread-safe. StringBuilder is mutated from multiple async callbacks, causing data races and invalid index calculations. Async HTTP completion order is nondeterministic, so index-based replacements are unsafe.

The correct approach is to separate concurrent link checks from sequential string mutation: perform all HTTP checks first, then apply replacements in a single thread once all futures complete.


List<CompletableFuture<Void>> checks = links.stream()
.map(link ->
isLinkBroken(link).thenAccept(isDead -> {
if (isDead) {
int index = result.indexOf(link);
if (index != -1) {
result.replace(
index,
index + link.length(),
replacement
);
}
}
})
)
.toList();

return CompletableFuture.allOf(checks.toArray(new CompletableFuture[0]))
.thenApply(v -> result.toString());
}

private static Optional<String> toLink(Url url, Set<LinkFilter> filter) {
String raw = url.getOriginalUrl();
Expand All @@ -76,7 +130,6 @@ private static Optional<String> toLink(Url url, Set<LinkFilter> filter) {
// Remove trailing punctuation
link = link.substring(0, link.length() - 1);
}

return Optional.of(link);
}

Expand Down
Loading