Separate path and query in URI API #2290

yadij · 2025-10-29T23:50:11Z

No description provided.

yadij · 2025-10-29T23:53:32Z

This is the follow-up refactoring step for PR #2141.
It should also fix bug 5523 properly: URI components are parsed separately and then re-formed, with path and query each encoded appropriately, to make the absolute-path component with an un-encoded '?' separator.

rousskov

This is a partial review. There are more problems to fix here, but enumerating all of them is too early for this premature PR as this code is likely to change a lot, including in prerequisite PRs.

rousskov · 2025-10-30T13:24:56Z

src/anyp/Uri.cc

+                return false;
+
+            if (*src != '\r' && *src != '\n' && *src != '\0')
+                throw TextException("invalid URL", Here());


A parsing method that reports success/failure via its boolean return value should not throw on invalid input. The two result-reporting designs are mutually exclusive.

yadij · 2025-10-31

Um, I don't think that is a correct use of the term "mutually exclusive".

This method is in transition towards a state where parsing errors are thrown and a boolean false result means needs-more-data. The bits of logic where you do not yet see that are still to be refactored away.
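
For readers following the thread, the target contract described in this reply (throw on invalid input, return false only for needs-more-data) might look roughly like the sketch below. It is a standalone illustration, not Squid code: `parseTargetLine` and its checks are hypothetical, and `std::invalid_argument` stands in for `TextException`.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical parser illustrating the contract under discussion:
//   returns true  -> a complete target was parsed into `target`
//   returns false -> input is incomplete so far; caller should read more and retry
//   throws        -> input is invalid and more data cannot fix it
bool parseTargetLine(std::string_view buf, std::string &target)
{
    const auto eol = buf.find('\n');
    auto line = buf.substr(0, eol); // the whole buffer when no EOL has arrived yet

    // Malformed bytes are an error no matter how much more data arrives.
    for (const char c : line) {
        if (c == ' ' || c == '\t')
            throw std::invalid_argument("invalid URL: whitespace in target");
    }

    if (eol == std::string_view::npos)
        return false; // not an error: needs more data

    if (!line.empty() && line.back() == '\r')
        line.remove_suffix(1); // tolerate CRLF line endings

    if (line.empty())
        throw std::invalid_argument("invalid URL: empty target");

    target = std::string(line);
    return true;
}
```

With this shape, a caller can treat `false` purely as a buffering signal and leave error handling to a single `catch` site.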

rousskov · 2025-10-30T13:35:13Z

src/anyp/Uri.cc

+                }
+                *dst = '\0';
+                chopped = StripAnyWsp(urlpath);
+                rfc1738_unescape(urlpath);


We should not add serious features to this bad (on several levels) Uri::parse() code until we refactor it to address its known quality problems. Two recent additions resulted in two serious bugs, and some of those recent changes still need to be polished to reach acceptable quality levels. We should not pile up more serious changes on top of this terrible foundation.

yadij · 2025-10-31

You seem to have missed the fact that this PR is part of such a refactor.

Specifically, prior commit ff60b10 left the '?' query delimiter in the path() output (where it got wrongly encoded). After this PR we can review all the previously affected callers knowing that path and query are actually separate, or re-combined properly, depending on which method is used to access them. I expect to find that they are all correct after this change.

rousskov · 2025-10-30T13:39:33Z

src/anyp/Uri.cc

+                ++i; // keep track of bytes 'consumed' from src
+                /* Then everything from '#' (exclusive) until '\r\n' or '\0' - that's fragment */
+                for (; i < l && *src != '\r' && *src != '\n' && *src != '\0'; ++i, ++src) {
+                    ; // fragment component is not expected in network protocols. discard for now.


If official parse() code does not discard fragments, then this PR cannot use a "for now" excuse to start discarding them; most likely, this PR should not discard fragments at all, but we can discuss better excuses if really needed.

yadij · 2025-10-31

Official code treats it as part of the "path" component, which is patently false.

This code splits the components correctly per RFC 3986, keeping the components which are used by ftp and http(s) URLs.

If you insist on retaining fragment support, I can easily add it back in, though it would be pointless to store, since it was only actually used by gopher: URLs.
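
As a point of reference for the component split being argued over, here is a minimal sketch of separating path, query, and fragment in the order RFC 3986 implies: the fragment starts at the first '#', and the query starts at the first '?' before it. `TargetParts` and `splitTarget` are hypothetical names for illustration and are not part of the PR. The sketch keeps the fragment so that a caller can decide whether to store or discard it.

```cpp
#include <string>
#include <string_view>

// Hypothetical holder for the three trailing URI components.
struct TargetParts {
    std::string path;
    std::string query;
    std::string fragment;
    bool hasQuery = false;    // distinguishes "no query" from an empty query ("/a?")
    bool hasFragment = false; // distinguishes "no fragment" from an empty fragment ("/a#")
};

TargetParts splitTarget(std::string_view s)
{
    TargetParts parts;

    // The fragment is everything after the first '#'; a '?' inside it is fragment data.
    if (const auto hash = s.find('#'); hash != std::string_view::npos) {
        parts.hasFragment = true;
        parts.fragment = std::string(s.substr(hash + 1));
        s = s.substr(0, hash);
    }

    // The query is everything after the first '?' in what remains; later '?' are query data.
    if (const auto mark = s.find('?'); mark != std::string_view::npos) {
        parts.hasQuery = true;
        parts.query = std::string(s.substr(mark + 1));
        s = s.substr(0, mark);
    }

    parts.path = std::string(s);
    return parts;
}
```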

yadij · 2025-12-27T04:54:57Z

src/anyp/Uri.cc

+        if (!path().isEmpty())
+            absolutePath_ = Encode(path(), PathChars());
+        if (!query().isEmpty()) {
+            absolutePath_.append("?");
+            absolutePath_.append(Encode(query(), QueryChars()));
+        }


FYI: This TODO implementation fixes the bug 5523 issue named in the title of PR #2289. The rest of this PR is logic supporting this change.
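
To illustrate why the hunk above encodes the two components separately, here is a standalone sketch of the same idea. The names `encode`, `kPathExtra`, `kQueryExtra`, and `absolutePath` are hypothetical, and the allowed-character sets are an approximation of the RFC 3986 `pchar` and `query` rules, not Squid's actual `PathChars()` and `QueryChars()`. Inputs are assumed to be already unescaped.

```cpp
#include <string>
#include <string_view>

// Illustrative allowed sets (beyond ASCII letters and digits); assumptions, not Squid's.
constexpr std::string_view kPathExtra = "-._~!$&'()*+,;=:@/";
constexpr std::string_view kQueryExtra = "-._~!$&'()*+,;=:@/?";

// Percent-encode every byte that is neither an ASCII alphanumeric nor in extraAllowed.
std::string encode(std::string_view in, std::string_view extraAllowed)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (const char ch : in) {
        const auto c = static_cast<unsigned char>(ch);
        const bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        if (alnum || extraAllowed.find(ch) != std::string_view::npos) {
            out += ch;
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Encode each component with its own rules, then join with a raw '?' separator.
std::string absolutePath(std::string_view path, std::string_view query)
{
    std::string result;
    if (!path.empty())
        result = encode(path, kPathExtra);
    if (!query.empty()) {
        result += '?';
        result += encode(query, kQueryExtra);
    }
    return result;
}
```

In this sketch, `encode("/p?q=1", kPathExtra)` yields `/p%3Fq=1`, which mirrors the wrongly encoded delimiter described earlier in the thread, while `absolutePath("/p", "q=1")` yields `/p?q=1`.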

Separate path and query in URI API

e7494c6

yadij added 2 commits October 30, 2025 13:14

Polish documentation

57e7682

Ensure path and query are unescaped by parse

7006221

rousskov self-requested a review October 30, 2025 04:33

rousskov requested changes Oct 30, 2025

yadij requested a review from rousskov November 5, 2025 14:56

yadij added the S-waiting-for-reviewer label Nov 5, 2025

yadij mentioned this pull request Dec 27, 2025

Bug 5523: Do not percent-encode URI suffixes #2289

Open

yadij commented Dec 27, 2025
