-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48167: [Python][C++][Compute] Add python bindings for scatter, inverse_permutation #48267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-48167: [Python][C++][Compute] Add python bindings for scatter, inverse_permutation #48267
Conversation
|
|
|
The background of Does it make sense to add Python wrappers/bindings for Currently both |
|
|
1 similar comment
|
|
|
Hi @tadeja , thanks for working on this. Appreciate it. I'm glad to see we are adding python bindings for
I think it does make sense to have them.
I'm not sure what you mean. The underlying C++ APIs will accept those options. So I would expect no problem for their python bindings. Are you saying that they are not working? |
a3b6fb1 to
47e3e8c
Compare
|
Thank you for the input, @zanmato1984, that helped.
It looks like Python bindings got fixed by adding option class names to FunctionDoc in vector_swizzle.cc. Now If See example below: import pyarrow.compute as pc
pc.InversePermutationOptions()
>>> InversePermutationOptions(max_index=-1, output_type=<NULLPTR>)import pyarrow.compute as pc
pc.InversePermutationOptions().serialize()
>>> Traceback (most recent call last):
File "<python-input-3>", line 1, in <module>
pc.InversePermutationOptions().serialize()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "pyarrow/_compute.pyx", line 623, in pyarrow._compute.FunctionOptions.serialize
shared_ptr[CBuffer] c_buf = GetResultValue(res)
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
return check_status(status)
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
raise convert_status(status)
pyarrow.lib.ArrowInvalid: Could not serialize field output_type of options type InversePermutationOptions: shared_ptr<DataType> is nullptrcc @pitrou |
|
Thanks @tadeja for the further explanation! I now see the problem. Let me check the code and get back to you soon. |
Nice catch! I forgot to declare the option type in function doc when I was adding these functions, and didn't realize it affects python binding until you fixed it. Really neat finding, thank you!
I want to suggest the following changes to allow arrow/cpp/src/arrow/compute/function_internal.h Lines 347 to 353 in e90bacd
static inline Result<std::shared_ptr<Scalar>> GenericToScalar(
const std::shared_ptr<DataType>& value) {
if (!value) {
- return Status::Invalid("shared_ptr<DataType> is nullptr");
+ return std::make_shared<NullScalar>();
}
return MakeNullScalar(value);
}and arrow/cpp/src/arrow/compute/function_internal.h Lines 448 to 452 in e90bacd
template <typename T>
static inline enable_if_same_result<T, std::shared_ptr<DataType>> GenericFromScalar(
const std::shared_ptr<Scalar>& value) {
+ if (value->type->id() == Type::NA) {
+ return std::shared_ptr<NullType>();
+ }
return value->type;
}But I'll need to consult @pitrou if this is appropriate. |
|
@zanmato1984 I've included your suggested changes - 208fd24 |
|
Hi @pitrou , do you think the changes to data type pointer serde make sense? Thanks. |
That will have annoying side-effects, for example an explicit We should try to find another way of serializing a null pointer as a scalar, though I'm not sure how. |
|
How about this instead: diff --git a/cpp/src/arrow/compute/function_internal.h b/cpp/src/arrow/compute/function_internal.h
index 9d8928466b..96bdbacf04 100644
--- a/cpp/src/arrow/compute/function_internal.h
+++ b/cpp/src/arrow/compute/function_internal.h
@@ -347,7 +347,8 @@ static inline Result<std::shared_ptr<Scalar>> GenericToScalar(
static inline Result<std::shared_ptr<Scalar>> GenericToScalar(
const std::shared_ptr<DataType>& value) {
if (!value) {
- return Status::Invalid("shared_ptr<DataType> is nullptr");
+ // An omitted DataType is serialized as boolean true
+ return MakeScalar(boolean(), true);
}
return MakeNullScalar(value);
}
@@ -448,6 +449,10 @@ static inline enable_if_same_result<T, SortKey> GenericFromScalar(
template <typename T>
static inline enable_if_same_result<T, std::shared_ptr<DataType>> GenericFromScalar(
const std::shared_ptr<Scalar>& value) {
+ if (value->type->id() == Type::BOOL && value->is_valid) {
+ // Boolean true represents an omitted DataType (nullptr)
+ return nullptr;
+ }
return value->type;
}
Also, should add a test in |
Some alternative approaches in mind, both changing the C++ code:
@pitrou what do you think? |
Feels a bit hacky imho. |
Definitely (though less fragile than the solution in the current PR :-)). |
I would be ok with either, but that's an API breakage in both cases. I think @bkietz contributed the original serialization code, perhaps we would like to chime in? |
|
See my attempt here as per one of the mentioned alternative approaches:
Please check, @zanmato1984, @pitrou, what do you say? |
I'm OK with this approach. Since @bkietz is not joining us, shall we move on with this approach, @pitrou ? |
zanmato1984
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
I've approved this PR. Let's wait for some while to hear from other reviewers. Then I'll merge it. Thanks a lot @tadeja for working on this! |
|
@github-actions crossbow submit -g cpp -g python |
|
Revision: 45ba764 Submitted crossbow builds: ursacomputing/crossbow @ actions-fa19036110 |
|
I've made a trivial commit to merely adjust the code order. Now CI is good. The failures are unrelated. I'm merging now. |
Rationale for this change
To close or discuss #48167
inverse_permutationandscatterfunctions got implemented via #44393, PR #44394.What changes are included in this PR?
Python tests for
scatter,inverse_permutationkernels and bindings forInversePermutationOptionsandScatterOptions.Are these changes tested?
Yes, tests added in test_compute.py.
Are there any user-facing changes?
Bindings for
InversePermutationOptionsandScatterOptionsare added.This PR includes breaking changes to public APIs.
Options
InversePermutationOptionschanged from accepting parameterstd::shared_ptr<DataType> output_type = NULLPTRto
std::optional<std::shared_ptr<DataType>> output_type = std::nullopt