From 5b34b2c812fabf702da62b419881abe25e624f93 Mon Sep 17 00:00:00 2001
From: masklinn
Date: Mon, 9 Jun 2025 16:41:16 +0200
Subject: [PATCH 1/2] Update wording of resolvers guide

Given ua-parser/uap-rust#29 and ua-parser/uap-rust#31, the wording of
the comparison needs to be updated to account for:

- The `regex` memory use being much improved.
- The `regex` runtime on devices being slightly improved, with the
  Python interface to `re2` not supporting custom atom lengths.

Closes #264
---
 doc/guides.rst | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/doc/guides.rst b/doc/guides.rst
index 39b43e4..c16e601 100644
--- a/doc/guides.rst
+++ b/doc/guides.rst
@@ -153,7 +153,7 @@ Builtin Resolvers
    * - ``regex``
      - great
      - good
-     - bad
+     - fine
      - great
    * - ``re2``
      - good
@@ -182,12 +182,11 @@ it:
   interpreters and platforms supported by pyo3 (currently: cpython,
   pypy, and graalpy, on linux, macos and linux, intel and arm). It is
   also built as a cpython abi3 wheel and should thus suffer from no
-  compatibility issues with new release.
+  compatibility issues with new releases of cpython at least.
 - Built entirely out of safe rust code, its safety risks are entirely
   in ``regex`` and ``pyo3``.
-- Its biggest drawback is that it is a lot more memory intensive than
-  the other resolvers, because ``regex`` tends to trade memory for
-  speed (~155MB high water mark on a real-world dataset).
+- Uses somewhat more memory than the other resolvers (~85MB high water
+  mark on a real-world dataset).
 
 If available, it is the default resolver, without a cache.
 
@@ -198,7 +197,7 @@ The ``re2`` resolver is built atop the widely used
 `google-re2 `_ via its built-in Python
 bindings. It:
 
-- Is extremely fast, though around 80% slower than ``regex`` on
+- Is quite fast, though only about half the speed of ``regex`` on
   real-world data.
 - Is only compatible with CPython, and uses pure API wheels, so needs
   a different release for each cpython version, for each OS, for each
@@ -210,6 +209,9 @@ It:
 
 If available, it is the second-preferred resolver, without a cache.
 
+At the end of the day, it is really only useful if the codebase
+already uses ``re2``.
+
 ``basic``
 ---------
 

From eece873cebd72ff30c744be71e64058ebecba8f7 Mon Sep 17 00:00:00 2001
From: masklinn
Date: Mon, 9 Jun 2025 17:30:53 +0200
Subject: [PATCH 2/2] Remove unnecessary mentions of re2

Don't remove the feature, don't remove the resolver, and keep the
resolver itself documented, but significantly de-emphasize `re2` by
removing it from the README and from examples: users should not be
encouraged to use it when they could use `regex`.
---
 README.rst     | 10 +++-------
 doc/guides.rst | 14 +++++++-------
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/README.rst b/README.rst
index 091fdda..17b405c 100644
--- a/README.rst
+++ b/README.rst
@@ -31,13 +31,9 @@ ua-parser supports CPython 3.9 and newer, recent pypy (supporting
 
 .. note::
 
-   The ``[regex]`` feature is *strongly* recommended:
-
-   - ``[re2]`` is slightly slower and only works with cpython, though
-     it is still a great option then (and is more memory-efficient).
-   - Pure python (no feature) is *significantly* slower, especially on
-     non-cpython runtimes, but it is the most memory efficient even
-     with caches.
+   The ``[regex]`` feature is *strongly* recommended, the Pure python
+   (no feature) is *significantly* slower, especially on non-cpython
+   runtimes, though it is the most memory efficient.
 
    See `builtin resolvers`_ for more explanation of the tradeoffs
    between the different options.
diff --git a/doc/guides.rst b/doc/guides.rst
index c16e601..9ea323e 100644
--- a/doc/guides.rst
+++ b/doc/guides.rst
@@ -93,10 +93,10 @@ composing :class:`~ua_parser.Resolver` objects.
 The most basic such customisation is simply configuring caching away
 from the default setup.
 
-As an example, in the default configuration if |re2|_ is available the
-RE2-based resolver is not cached, a user might consider the memory
-investment worth it and want to reconfigure the stack for a cached
-base.
+As an example, in the default configuration if |regex|_ is available
+the regex-based resolver is not cached, a user might consider the
+memory investment worth it and want to reconfigure the stack for a
+cached base.
 
 The process is uncomplicated as the APIs are designed to compose
 together.
@@ -105,8 +105,8 @@ The first step is to instantiate a base resolver, instantiated with
 the relevant :class:`Matchers` data::
 
     import ua_parser.loaders
-    import ua_parser.re2
-    base = ua_parser.re2.Resolver(
+    import ua_parser.regex
+    base = ua_parser.regex.Resolver(
         ua_parser.loaders.load_lazy_builtins())
 
 The next step is to instantiate the cache [#cache]_ suitably
@@ -295,7 +295,7 @@ could then use something like::
 
     Parser(FallbackResolver([
         foo_resolver,
-        re2.Resolver(load_lazy_builtins()),
+        regex.Resolver(load_lazy_builtins()),
     ]))
 
 to prioritise cheap resolving of our application while still resolving