Skip to content

Conversation

@mhabedan
Copy link
Contributor

Hi @GraemeWatt,

Is there a specific reason why Client.find() returns a string if the ids keyword is given? I see why that is helpful when calling find from the command line but if using the python interface, it seems unnecessary. In particular, if the list entries are first joined into a string by ' '.join(find_results) only to be broken up into a list again in user's scripts. (I have one where exactly this is the case.)

Insofar, if there are no principal objections, this PR resolves the issue by adding a format argument to Client.find(). To leave the default behaviour of find() unchanged, the default format remains str. It can also be set to list, set or tuple now, though, if desired.

Matching tests have been added, the readme updated and the version bumped.

@GraemeWatt GraemeWatt requested a review from Copilot October 21, 2025 12:20
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a format parameter to the Client.find() method, allowing users to specify the return type (str, list, set, or tuple) when retrieving IDs. Previously, IDs were always returned as a space-separated string, requiring users to manually split them into lists. The default behavior remains unchanged (returning a string) for backward compatibility.

Key Changes:

  • Added format parameter to find() method with support for str, list, set, and tuple return types
  • Updated _build_urls() to accept strings in addition to lists/tuples for id_list
  • Bumped version from 0.3.0 to 0.3.1

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
hepdata_cli/api.py Added format parameter to find() method with logic to return results in different formats; updated _build_urls() to handle string input; fixed typo in assertion message
tests/test_search.py Updated test parameters and assertions to validate the new format parameter functionality
tests/test_download.py Added tests for different input formats (str, list, set, tuple) and new integration test combining find() with download()
README.md Updated documentation to explain the new format parameter and showed example usage with format=list
hepdata_cli/version.py Bumped version to 0.3.1

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

return find_results
else:
return ' '.join(find_results)
if format==str:
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing spaces around the equality operator. Should be if format == str: to follow PEP 8 style guidelines.

Suggested change
if format==str:
if format == str:

Copilot uses AI. Check for mistakes.
elif format in (set, tuple):
return format(find_results)
else:
raise TypeError(f"Cannot return results in specfied format: {format}. Allowed formats are: {str}, {list}, {set}, {tuple}.")
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'specfied' to 'specified'.

Suggested change
raise TypeError(f"Cannot return results in specfied format: {format}. Allowed formats are: {str}, {list}, {set}, {tuple}.")
raise TypeError(f"Cannot return results in specified format: {format}. Allowed formats are: {str}, {list}, {set}, {tuple}.")

Copilot uses AI. Check for mistakes.
Copy link
Member

@GraemeWatt GraemeWatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I agree that it is useful to add the format option without changing the default behaviour, which comes from the arxiv-cli package that originally inspired hepdata-cli. However, the arxiv-cli package has not been maintained for 7 years and is now difficult to install. In a recent commit 3e8a9df I replaced its use in Example 6 with arxiv.py, but then I had to insert an additional line id_list = id_list.split() for it to work. The new format option makes it simpler. I'd be happy to merge after you address my two comments (and the two comments from Copilot).

elif format in (set, tuple):
return format(find_results)
else:
raise TypeError(f"Cannot return results in specfied format: {format}. Allowed formats are: {str}, {list}, {set}, {tuple}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not covered by the tests, resulting in a -0.4% decrease in test coverage for the PR. Please add a test that raises the TypeError exception.

from click.testing import CliRunner

from hepdata_cli.api import Client
from hepdata_cli.api import Client, MAX_MATCHES, MATCHES_PER_PAGE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why import MAX_MATCHES and MATCHES_PER_PAGE here? They're not used in this file.

@GraemeWatt
Copy link
Member

@codecov-ai-reviewer review

@codecov-ai

This comment has been minimized.

Comment on lines +88 to +89
else:
raise TypeError(f"Cannot return results in specfied format: {format}. Allowed formats are: {str}, {list}, {set}, {tuple}.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in error message: 'specfied' should be 'specified'. This is a user-facing error message that will appear unprofessional with the typo.

Did we get this right? 👍 / 👎 to inform future reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants