From e62d31cfe7a1821f6ea9450ede0c5f47f25f5a60 Mon Sep 17 00:00:00 2001
From: parthchonkar
Date: Tue, 31 Dec 2024 01:39:00 -0600
Subject: [PATCH 1/4] [Docs] Update docs to point to separate java codebase

---
 README.md | 2 +-
 .../guide/step_by_step/arrow_codebase.rst | 13 ++++++++-----
 docs/source/developers/java/building.rst | 2 +-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index f49ec4b8d98..c557716a4a8 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,7 @@ Major components of the project include:
  - [Gandiva](https://github.com/apache/arrow/tree/main/cpp/src/gandiva):
    an [LLVM](https://llvm.org)-based Arrow expression compiler, part of the C++ codebase
  - [Go libraries](https://github.com/apache/arrow-go)
- - [Java libraries](https://github.com/apache/arrow/tree/main/java)
+ - [Java libraries](https://github.com/apache/arrow-java)
  - [JavaScript libraries](https://github.com/apache/arrow/tree/main/js)
  - [Python libraries](https://github.com/apache/arrow/tree/main/python)
  - [R libraries](https://github.com/apache/arrow/tree/main/r)
diff --git a/docs/source/developers/guide/step_by_step/arrow_codebase.rst b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
index c4ea61d89ff..8f84f2b381b 100644
--- a/docs/source/developers/guide/step_by_step/arrow_codebase.rst
+++ b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
@@ -32,15 +32,18 @@ Working on the Arrow codebase 🧐
 Finding your way around Arrow
 =============================
 
-The Apache Arrow repository includes implementations for
+The `Apache Arrow repository `_ includes implementations for
 most of the libraries for which Arrow is available.
 
 Languages like GLib (``c_glib/``), C++ (``cpp/``), C# (``csharp/``),
-Go (``go/``), Java (``java/``), JavaScript (``js/``), MATLAB
-(``matlab/``), Python (``python/``), R (``r/``) and Ruby (``ruby/``)
-have their own subdirectories in the main folder as written here.
+JavaScript (``js/``), MATLAB (``matlab/``), Python (``python/``), R (``r/``)
+and Ruby (``ruby/``) have their own subdirectories in the main folder as written here.
 
-Rust has its own repository available `here `_.
+The following language implementations have their own repositories:
+
+- `Rust `_
+- `Go `_
+- `Java `_
 
 In the **language-specific subdirectories** you can find the code
 connected to that language. For example:
diff --git a/docs/source/developers/java/building.rst b/docs/source/developers/java/building.rst
index 372d44045f0..15420b78ff0 100644
--- a/docs/source/developers/java/building.rst
+++ b/docs/source/developers/java/building.rst
@@ -46,7 +46,7 @@ repository:
 
 .. code-block::
 
-    $ git clone https://github.com/apache/arrow.git
+    $ git clone https://github.com/apache/arrow-java.git
     $ cd arrow
     $ git submodule update --init --recursive
 

From 25520bd99ef7c0f22c19e9ec72670395e7a7b2a8 Mon Sep 17 00:00:00 2001
From: Parth Chonkar
Date: Tue, 31 Dec 2024 11:35:37 -0600
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: Sutou Kouhei
---
 docs/source/developers/guide/step_by_step/arrow_codebase.rst | 4 ++--
 docs/source/developers/java/building.rst | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/developers/guide/step_by_step/arrow_codebase.rst b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
index 8f84f2b381b..ddd4266dbd3 100644
--- a/docs/source/developers/guide/step_by_step/arrow_codebase.rst
+++ b/docs/source/developers/guide/step_by_step/arrow_codebase.rst
@@ -32,8 +32,8 @@ Working on the Arrow codebase 🧐
 Finding your way around Arrow
 =============================
 
-The `Apache Arrow repository `_ includes implementations for
-most of the libraries for which Arrow is available.
+The `Apache Arrow repository `_ includes
+implementations for most of the libraries for which Arrow is available.
 
 Languages like GLib (``c_glib/``), C++ (``cpp/``), C# (``csharp/``),
 JavaScript (``js/``), MATLAB (``matlab/``), Python (``python/``), R (``r/``)
diff --git a/docs/source/developers/java/building.rst b/docs/source/developers/java/building.rst
index 15420b78ff0..fb3d9a62ed1 100644
--- a/docs/source/developers/java/building.rst
+++ b/docs/source/developers/java/building.rst
@@ -47,7 +47,7 @@ repository:
 .. code-block::
 
     $ git clone https://github.com/apache/arrow-java.git
-    $ cd arrow
+    $ cd arrow-java
     $ git submodule update --init --recursive
 
 These are the options available to compile Arrow Java modules with:

From 7a258bdf84418db9fabe4e778d900c59bbaf50c4 Mon Sep 17 00:00:00 2001
From: parthchonkar
Date: Wed, 1 Jan 2025 18:30:27 -0600
Subject: [PATCH 3/4] Adress review comments + tweak build instructions

---
 docs/source/developers/java/building.rst | 61 +++++++++++++-----------
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/docs/source/developers/java/building.rst b/docs/source/developers/java/building.rst
index fb3d9a62ed1..3b3f3524d3a 100644
--- a/docs/source/developers/java/building.rst
+++ b/docs/source/developers/java/building.rst
@@ -66,7 +66,7 @@ Maven
 
 .. code-block::
 
-    $ cd arrow/java
+    $ cd arrow-java
     $ export JAVA_HOME=
     $ java --version
     $ mvn clean install
@@ -76,7 +76,7 @@ Docker compose
 
 .. code-block::
 
-    $ cd arrow/java
+    $ cd arrow-java
     $ export JAVA_HOME=
     $ java --version
     $ docker compose run java
@@ -86,7 +86,7 @@ Archery
 
 .. code-block::
 
-    $ cd arrow/java
+    $ cd arrow-java
     $ export JAVA_HOME=
     $ java --version
     $ archery docker run java
@@ -111,7 +111,7 @@ Maven
 
   .. code-block:: text
 
-      $ cd arrow/java
+      $ cd arrow-java
       $ export JAVA_HOME=
       $ java --version
       $ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
@@ -122,7 +122,7 @@ Maven
 
   .. code-block::
 
-      $ cd arrow/java
+      $ cd arrow-java
       $ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
       $ dir "../java-dist/bin"
       |__ arrow_cdata_jni/
@@ -131,7 +131,7 @@ Maven
 
   .. code-block:: text
 
-      $ cd arrow/java
+      $ cd arrow-java
       $ export JAVA_HOME=
       $ java --version
       $ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
@@ -144,7 +144,7 @@ Maven
 
   .. code-block::
 
-      $ cd arrow/java
+      $ cd arrow-java
       $ mvn generate-resources -Pgenerate-libs-jni-windows -N
       $ dir "../java-dist/bin"
       |__ arrow_dataset_jni/
@@ -156,10 +156,9 @@ CMake
 
   .. code-block:: text
 
-      $ cd arrow
+      $ cd arrow-java
       $ mkdir -p java-dist java-cdata
       $ cmake \
-          -S java \
           -B java-cdata \
           -DARROW_JAVA_JNI_ENABLE_C=ON \
           -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
@@ -174,10 +173,9 @@ CMake
 
   .. code-block::
 
-      $ cd arrow
+      $ cd arrow-java
       $ mkdir java-dist, java-cdata
       $ cmake ^
-          -S java ^
           -B java-cdata ^
           -DARROW_JAVA_JNI_ENABLE_C=ON ^
           -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
@@ -192,14 +190,19 @@ CMake
 
   .. code-block::
 
-      $ cd arrow
+      $ git clone --recurse-submodules https://github.com/apache/arrow.git
+      $ git clone --recurse-submodules https://github.com/apache/arrow-java.git
+      # Both arrow and arrow-java repos must be present
+      $ ARROW_PATH=$PWD/arrow
+      $ ARROW_JAVA_PATH=$PWD/arrow-java
+      $ mkdir -p $ARROW_JAVA_PATH/java-dist $ARROW_PATH/cpp-jni
+      $ cd $ARROW_PATH
       $ brew bundle --file=cpp/Brewfile
       # Homebrew Bundle complete! 25 Brewfile dependencies now installed.
       $ brew uninstall aws-sdk-cpp
       # (We can't use aws-sdk-cpp installed by Homebrew because it has
       # an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
       $ export JAVA_HOME=
-      $ mkdir -p java-dist cpp-jni
       $ cmake \
           -S cpp \
           -B cpp-jni \
@@ -218,11 +221,12 @@ CMake
           -DARROW_SUBSTRAIT=ON \
           -DARROW_USE_CCACHE=ON \
           -DCMAKE_BUILD_TYPE=Release \
-          -DCMAKE_INSTALL_PREFIX=java-dist \
+          -DCMAKE_INSTALL_PREFIX=$ARROW_JAVA_PATH/java-dist \
           -DCMAKE_UNITY_BUILD=ON
+      # Install artifacts to java-dist/ in arrow-java
       $ cmake --build cpp-jni --target install --config Release
+      $ cd $ARROW_JAVA_PATH
       $ cmake \
-          -S java \
           -B java-jni \
           -DARROW_JAVA_JNI_ENABLE_C=OFF \
           -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
@@ -230,7 +234,7 @@ CMake
           -DCMAKE_BUILD_TYPE=Release \
           -DCMAKE_INSTALL_PREFIX=java-dist \
           -DCMAKE_PREFIX_PATH=$PWD/java-dist \
-          -DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
+          -DProtobuf_ROOT=$ARROW_PATH/cpp-jni/protobuf_ep-install \
           -DProtobuf_USE_STATIC_LIBS=ON
       $ cmake --build java-jni --target install --config Release
       $ ls -latr java-dist/lib/
@@ -242,8 +246,12 @@ CMake
 
   .. code-block::
 
-      $ cd arrow
-      $ mkdir java-dist, cpp-jni
+      $ git clone --recurse-submodules https://github.com/apache/arrow.git
+      $ git clone --recurse-submodules https://github.com/apache/arrow-java.git
+      $ ARROW_PATH=$PWD/arrow
+      $ ARROW_JAVA_PATH=$PWD/arrow-java
+      $ mkdir $ARROW_JAVA_PATH/java-dist $ARROW_PATH/cpp-jni
+      $ cd $ARROW_PATH
       $ cmake ^
           -S cpp ^
           -B cpp-jni ^
@@ -265,14 +273,13 @@ CMake
           -DARROW_WITH_ZLIB=ON ^
           -DARROW_WITH_ZSTD=ON ^
           -DCMAKE_BUILD_TYPE=Release ^
-          -DCMAKE_INSTALL_PREFIX=java-dist ^
+          -DCMAKE_INSTALL_PREFIX=$ARROW_JAVA_PATH/java-dist ^
           -DCMAKE_UNITY_BUILD=ON ^
           -GNinja
       $ cd cpp-jni
       $ ninja install
-      $ cd ../
+      $ cd $ARROW_JAVA_PATH
       $ cmake ^
-          -S java ^
           -B java-jni ^
           -DARROW_JAVA_JNI_ENABLE_C=OFF ^
           -DARROW_JAVA_JNI_ENABLE_DATASET=ON ^
@@ -293,7 +300,7 @@ Archery
 
   .. code-block:: text
 
-      $ cd arrow
+      $ cd arrow-java
       $ archery docker run java-jni-manylinux-2014
       $ ls -latr java-dist
       |__ arrow_cdata_jni/
@@ -308,14 +315,14 @@ Building Java JNI Modules
 
   .. code-block::
 
-      $ cd arrow/java
+      $ cd arrow-java
       $ mvn -Darrow.c.jni.dist.dir=/java-dist/lib -Parrow-c-data clean install
 
 - To compile the JNI bindings for ORC / Gandiva / Dataset, use the ``arrow-jni`` Maven profile:
 
   .. code-block::
 
-      $ cd arrow/java
+      $ cd arrow-java
       $ mvn \
           -Darrow.cpp.build.dir=/java-dist/lib/ \
           -Darrow.c.jni.dist.dir=/java-dist/lib/ \
@@ -366,7 +373,7 @@ For example, to run Arrow tests with JDK 17, use the following snippet:
 
 .. code-block::
 
-    $ cd arrow/java
+    $ cd arrow-java
     $ mvn -Darrow.test.jdk-version=17 clean verify
 
 IDE Configuration
@@ -376,8 +383,8 @@ IntelliJ
 --------
 
 To start working on Arrow in IntelliJ: build the project once from the command
-line using ``mvn clean install``. Then open the ``java/`` subdirectory of the
-Arrow repository, and update the following settings:
+line using ``mvn clean install``. Then open the project root of the arrow-java repository,
+and update the following settings:
 
 * In the Files tool window, find the path ``vector/target/generated-sources``,
   right click the directory, and select Mark Directory as > Generated Sources

From 26a6cf832c4c1356020c711effe9dfe4cfaa0d96 Mon Sep 17 00:00:00 2001
From: parthchonkar
Date: Wed, 29 Jan 2025 20:58:33 -0500
Subject: [PATCH 4/4] Checkout building.rst back to main

---
 docs/source/developers/java/building.rst | 65 +++++++++++-------------
 1 file changed, 29 insertions(+), 36 deletions(-)

diff --git a/docs/source/developers/java/building.rst b/docs/source/developers/java/building.rst
index 3b3f3524d3a..372d44045f0 100644
--- a/docs/source/developers/java/building.rst
+++ b/docs/source/developers/java/building.rst
@@ -46,8 +46,8 @@ repository:
 
 .. code-block::
 
-    $ git clone https://github.com/apache/arrow-java.git
-    $ cd arrow-java
+    $ git clone https://github.com/apache/arrow.git
+    $ cd arrow
     $ git submodule update --init --recursive
 
 These are the options available to compile Arrow Java modules with:
@@ -66,7 +66,7 @@ Maven
 
 .. code-block::
 
-    $ cd arrow-java
+    $ cd arrow/java
     $ export JAVA_HOME=
     $ java --version
     $ mvn clean install
@@ -76,7 +76,7 @@ Docker compose
 
 .. code-block::
 
-    $ cd arrow-java
+    $ cd arrow/java
     $ export JAVA_HOME=
     $ java --version
     $ docker compose run java
@@ -86,7 +86,7 @@ Archery
 
 .. code-block::
 
-    $ cd arrow-java
+    $ cd arrow/java
     $ export JAVA_HOME=
     $ java --version
     $ archery docker run java
@@ -111,7 +111,7 @@ Maven
 
   .. code-block:: text
 
-      $ cd arrow-java
+      $ cd arrow/java
       $ export JAVA_HOME=
       $ java --version
       $ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
@@ -122,7 +122,7 @@ Maven
 
   .. code-block::
 
-      $ cd arrow-java
+      $ cd arrow/java
       $ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
       $ dir "../java-dist/bin"
       |__ arrow_cdata_jni/
@@ -131,7 +131,7 @@ Maven
 
   .. code-block:: text
 
-      $ cd arrow-java
+      $ cd arrow/java
       $ export JAVA_HOME=
       $ java --version
       $ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
@@ -144,7 +144,7 @@ Maven
 
   .. code-block::
 
-      $ cd arrow-java
+      $ cd arrow/java
       $ mvn generate-resources -Pgenerate-libs-jni-windows -N
       $ dir "../java-dist/bin"
       |__ arrow_dataset_jni/
@@ -156,9 +156,10 @@ CMake
 
   .. code-block:: text
 
-      $ cd arrow-java
+      $ cd arrow
       $ mkdir -p java-dist java-cdata
       $ cmake \
+          -S java \
           -B java-cdata \
           -DARROW_JAVA_JNI_ENABLE_C=ON \
           -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
@@ -173,9 +174,10 @@ CMake
 
   .. code-block::
 
-      $ cd arrow-java
+      $ cd arrow
       $ mkdir java-dist, java-cdata
       $ cmake ^
+          -S java ^
           -B java-cdata ^
           -DARROW_JAVA_JNI_ENABLE_C=ON ^
           -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
@@ -190,19 +192,14 @@ CMake
 
   .. code-block::
 
-      $ git clone --recurse-submodules https://github.com/apache/arrow.git
-      $ git clone --recurse-submodules https://github.com/apache/arrow-java.git
-      # Both arrow and arrow-java repos must be present
-      $ ARROW_PATH=$PWD/arrow
-      $ ARROW_JAVA_PATH=$PWD/arrow-java
-      $ mkdir -p $ARROW_JAVA_PATH/java-dist $ARROW_PATH/cpp-jni
-      $ cd $ARROW_PATH
+      $ cd arrow
       $ brew bundle --file=cpp/Brewfile
       # Homebrew Bundle complete! 25 Brewfile dependencies now installed.
       $ brew uninstall aws-sdk-cpp
       # (We can't use aws-sdk-cpp installed by Homebrew because it has
       # an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
       $ export JAVA_HOME=
+      $ mkdir -p java-dist cpp-jni
       $ cmake \
           -S cpp \
           -B cpp-jni \
@@ -221,12 +218,11 @@ CMake
           -DARROW_SUBSTRAIT=ON \
           -DARROW_USE_CCACHE=ON \
           -DCMAKE_BUILD_TYPE=Release \
-          -DCMAKE_INSTALL_PREFIX=$ARROW_JAVA_PATH/java-dist \
+          -DCMAKE_INSTALL_PREFIX=java-dist \
           -DCMAKE_UNITY_BUILD=ON
-      # Install artifacts to java-dist/ in arrow-java
       $ cmake --build cpp-jni --target install --config Release
-      $ cd $ARROW_JAVA_PATH
       $ cmake \
+          -S java \
           -B java-jni \
           -DARROW_JAVA_JNI_ENABLE_C=OFF \
           -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
@@ -234,7 +230,7 @@ CMake
           -DCMAKE_BUILD_TYPE=Release \
           -DCMAKE_INSTALL_PREFIX=java-dist \
           -DCMAKE_PREFIX_PATH=$PWD/java-dist \
-          -DProtobuf_ROOT=$ARROW_PATH/cpp-jni/protobuf_ep-install \
+          -DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
           -DProtobuf_USE_STATIC_LIBS=ON
       $ cmake --build java-jni --target install --config Release
       $ ls -latr java-dist/lib/
@@ -246,12 +242,8 @@ CMake
 
   .. code-block::
 
-      $ git clone --recurse-submodules https://github.com/apache/arrow.git
-      $ git clone --recurse-submodules https://github.com/apache/arrow-java.git
-      $ ARROW_PATH=$PWD/arrow
-      $ ARROW_JAVA_PATH=$PWD/arrow-java
-      $ mkdir $ARROW_JAVA_PATH/java-dist $ARROW_PATH/cpp-jni
-      $ cd $ARROW_PATH
+      $ cd arrow
+      $ mkdir java-dist, cpp-jni
       $ cmake ^
           -S cpp ^
           -B cpp-jni ^
@@ -273,13 +265,14 @@ CMake
           -DARROW_WITH_ZLIB=ON ^
           -DARROW_WITH_ZSTD=ON ^
           -DCMAKE_BUILD_TYPE=Release ^
-          -DCMAKE_INSTALL_PREFIX=$ARROW_JAVA_PATH/java-dist ^
+          -DCMAKE_INSTALL_PREFIX=java-dist ^
           -DCMAKE_UNITY_BUILD=ON ^
           -GNinja
       $ cd cpp-jni
       $ ninja install
-      $ cd $ARROW_JAVA_PATH
+      $ cd ../
       $ cmake ^
+          -S java ^
           -B java-jni ^
           -DARROW_JAVA_JNI_ENABLE_C=OFF ^
           -DARROW_JAVA_JNI_ENABLE_DATASET=ON ^
@@ -300,7 +293,7 @@ Archery
 
   .. code-block:: text
 
-      $ cd arrow-java
+      $ cd arrow
       $ archery docker run java-jni-manylinux-2014
       $ ls -latr java-dist
       |__ arrow_cdata_jni/
@@ -315,14 +308,14 @@ Building Java JNI Modules
 
   .. code-block::
 
-      $ cd arrow-java
+      $ cd arrow/java
       $ mvn -Darrow.c.jni.dist.dir=/java-dist/lib -Parrow-c-data clean install
 
 - To compile the JNI bindings for ORC / Gandiva / Dataset, use the ``arrow-jni`` Maven profile:
 
   .. code-block::
 
-      $ cd arrow-java
+      $ cd arrow/java
       $ mvn \
           -Darrow.cpp.build.dir=/java-dist/lib/ \
           -Darrow.c.jni.dist.dir=/java-dist/lib/ \
@@ -373,7 +366,7 @@ For example, to run Arrow tests with JDK 17, use the following snippet:
 
 .. code-block::
 
-    $ cd arrow-java
+    $ cd arrow/java
     $ mvn -Darrow.test.jdk-version=17 clean verify
 
 IDE Configuration
@@ -383,8 +376,8 @@ IntelliJ
 --------
 
 To start working on Arrow in IntelliJ: build the project once from the command
-line using ``mvn clean install``. Then open the project root of the arrow-java repository,
-and update the following settings:
+line using ``mvn clean install``. Then open the ``java/`` subdirectory of the
+Arrow repository, and update the following settings:
 
 * In the Files tool window, find the path ``vector/target/generated-sources``,
   right click the directory, and select Mark Directory as > Generated Sources