Skip to content

Conversation

@yadij
Copy link
Contributor

@yadij yadij commented Oct 29, 2025

No description provided.

@yadij
Copy link
Contributor Author

yadij commented Oct 29, 2025

This is the followup refactoring step for PR #2141 .
Should also fix bug 5523 properly: URI components parsed separately then re-formed with appropriate encoding for path and query separately to make the absolute-path component with un-encoded '?' separator.

@rousskov rousskov self-requested a review October 30, 2025 04:33
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a partial review. There are more problems to fix here, but enumerating all of them is too early for this premature PR as this code is likely to change a lot, including in prerequisite PRs.

return false;

if (*src != '\r' && *src != '\n' && *src != '\0')
throw TextException("invalid URL", Here());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A parsing method that reports success/failure via its boolean return value should not throw on invalid input. The two result-reporting designs are mutually exclusive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

um, I don't think that is correct use of the term "mutually exclusive".

This method is in transition towards the state where parsing errors are thrown, and the boolean false means needs-more-data. The bits of logic where you do not see that existing are yet to be refactored away.

}
*dst = '\0';
chopped = StripAnyWsp(urlpath);
rfc1738_unescape(urlpath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not add serious features to this bad (on several levels) Uri::parse() code until we refactor it to address its known quality problems. Two recent additions resulted in two serious bugs, and some of those recent changes still need to be polished to reach acceptable quality levels. We should not pile up more serious changes on top of this terrible foundation.

Copy link
Contributor Author

@yadij yadij Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seems to have missed the fact that this PR is part of such a refactor.

Specifically that prior commit ff60b10 left the '?' query delimiter in the path() output (where it got wrongly encoded). After this PR we can review all the previously affected callers knowing that path and query are actually separate or re-combined properly depending on which method is used to access them. I expect to find that they are all correct after this change.

++i; // keep track of bytes 'consumed' from src
/* Then everything from '#' (exclusive) until '\r\n' or '\0' - that's fragment */
for (; i < l && *src != '\r' && *src != '\n' && *src != '\0'; ++i, ++src) {
; // fragment component is not expected in network protocols. discard for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If official parse() code does not discard fragments, then this PR cannot use a "for now" excuse to start discarding them; most likely, this PR should not discard fragments at all, but we can discuss better excuses if really needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Official code calls it part of "path" component. Which is patently false.

This code splits the components correctly per RFC3986. Keeping the components which are used by ftp, and http(s) URLs.

If you insist on retaining fragment support, I can easily add it back in. Though it would be pointless to store, since it was only actually used by gopher: URLs.

@yadij yadij requested a review from rousskov November 5, 2025 14:56
@yadij yadij added the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Nov 5, 2025
Comment on lines +838 to +843
if (!path().isEmpty())
absolutePath_ = Encode(path(), PathChars());
if (!query().isEmpty()) {
absolutePath_.append("?");
absolutePath_.append(Encode(query(), QueryChars()));
}
Copy link
Contributor Author

@yadij yadij Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This TODO implementation is the fix for bug 5523 titular issue for PR #2289. The rest of this PR is logic supporting this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants