Merged
70 changes: 58 additions & 12 deletions README.md
@@ -1,31 +1,77 @@
# cloud-cli

`cloud-cli` is a command-line tool designed to simplify the management and diagnosis of server-side applications. It provides an interactive menu to access various diagnostic tools, making it easier for developers and system administrators to troubleshoot processes.
`cloud-cli` is an interactive command-line tool for on-call troubleshooting of Apache Doris / SelectDB clusters. It provides a TUI menu that groups common FE/BE diagnostic workflows and writes collected artifacts into an output directory for easy archiving and sharing.

## Features

The tool is organized into two main categories:
The tool is organized into two categories: `FE` and `BE`.

### FE (Frontend/Java Applications)
### FE

- **`jstack`**: Prints Java thread stack traces for a given Java process, helping to diagnose hangs and deadlocks.
- **`jmap`**: Generates heap dumps and provides memory statistics for a Java process, useful for analyzing memory leaks.
- `fe-list`: List and select the FE target host (IP) for the current session based on `clusters.toml`.
- `jmap` (`jmap-dump`, `jmap-histo`): Java heap dump / histogram.
- `jstack`: Java thread dump.
- `fe-profiler`: Generate a flame graph using Doris `bin/profile_fe.sh` (requires async-profiler).
- `table-info`: Interactive database/table browser that collects and summarizes schema, indexes, partitions, and bucket details (supports exporting `.txt` reports).
- `routine-load`: Routine Load helper tools:
- `Get Job ID`: List and select a Routine Load job (cached locally for later analysis).
- `Performance Analysis`: Analyze per-commit rows/bytes/time from FE logs.
- `Traffic Monitor`: Aggregate per-minute `loadedRows` from FE logs.
- `fe-audit-topsql`: Parse `fe.audit.log`, normalize SQL into templates, and aggregate by CPU/time to generate a TopSQL report. Default filter is `count > 10`; if nothing matches, it falls back to showing results without the count filter.

### BE (Backend/General Processes)
### BE

- **`pstack`**: Displays the stack trace for any running process, offering insights into its execution state.
- **`get_be_vars`**: Retrieves and displays the environment variables of a running process.
- `be-list`: List and select the BE target host (IP) for the current session based on `clusters.toml` (defaults to `127.0.0.1`).
- `pstack`: Use `gdb` to dump process thread stacks (writes a `.txt` file).
- `jmap` (`jmap-dump`, `jmap-histo`): Java heap dump / histogram (only meaningful for JVM processes).
- `be-config`:
- `get-vars` (`get-be-vars`): Query BE configuration variables via HTTP `/varz`.
- `update-config` (`set-be-config`): Update configuration via HTTP `/api/update_config` (supports persist).
- `pipeline-tasks`: Fetch running pipeline tasks via HTTP `/api/running_pipeline_tasks` (auto-saves when the response is large).
- `memz`:
- `current` (`memz`): Fetch the current Jemalloc memory view via HTTP `/memz` (saves HTML and prints a summary).
- `global` (`memz-global`): Fetch the global memory view via HTTP `/memz?type=global`.
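
The `be-config`, `pipeline-tasks`, and `memz` entries wrap plain BE HTTP endpoints, so the same data can be fetched manually with `curl`. A minimal sketch, assuming a BE at `127.0.0.1` with webserver port `8040` (an assumption; check `be.conf`) and using `sys_log_level` purely as an illustrative config key:

```sh
# Assumed endpoint; the real host/port come from clusters.toml / be.conf.
BE_HOST=127.0.0.1
BE_HTTP_PORT=8040

# get-be-vars: dump BE configuration variables.
curl -s "http://${BE_HOST}:${BE_HTTP_PORT}/varz" || true

# set-be-config: update a variable at runtime; persist=true also writes it to be.conf.
curl -s -X POST "http://${BE_HOST}:${BE_HTTP_PORT}/api/update_config?sys_log_level=INFO&persist=true" || true

# memz-global: global memory view (plain /memz is the current view).
curl -s "http://${BE_HOST}:${BE_HTTP_PORT}/memz?type=global" || true
```

The `|| true` guards keep the sketch harmless when no BE is listening on that port.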

## Usage

To run the application, execute the binary. An interactive menu will appear, allowing you to select the desired diagnostic tool.
### Build and run

```sh
./cloud-cli
cargo build --release
./target/release/cloud-cli
```

### Configuration

- Persistent config: `~/.config/cloud-cli/config.toml`
- MySQL key file: `~/.config/cloud-cli/key`
- Cluster topology cache: `~/.config/cloud-cli/clusters.toml`
- First run: if an FE process is detected and MySQL credentials are missing, it will prompt you to configure and test the connection. On success, it writes `config.toml` and fetches cluster topology into `clusters.toml` (used by `fe-list`/`be-list`).
- Common environment variables (override persistent config):
- `JDK_PATH`: Set the JDK path (used by `jmap`/`jstack`).
- `OUTPUT_DIR`: Set the output directory (default: `/tmp/doris/collection`).
- `CLOUD_CLI_TIMEOUT`: External command timeout in seconds (default: `60`).
- `CLOUD_CLI_NO_PROGRESS`: Disable progress animation (`1`/`true`).
- `MYSQL_HOST` / `MYSQL_PORT`: Set Doris MySQL endpoint (defaults are derived from config; otherwise `127.0.0.1:9030`).
- `PROFILE_SECONDS`: Override `fe-profiler` collection duration in seconds.
- `CLOUD_CLI_FE_AUDIT_TOPSQL_INPUT`: Provide an explicit `fe.audit.log` path for `fe-audit-topsql` (skips interactive selection).
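
For a one-off session, these can be exported before launching the binary. A hypothetical example (the JDK path is a placeholder; substitute your own):

```sh
# Hypothetical overrides; adjust values to your environment.
export JDK_PATH=/usr/lib/jvm/java-17
export OUTPUT_DIR=/tmp/doris/collection
export CLOUD_CLI_TIMEOUT=120
export CLOUD_CLI_NO_PROGRESS=1
export MYSQL_HOST=127.0.0.1 MYSQL_PORT=9030

# Then launch the interactive menu (path assumes a release build):
# ./target/release/cloud-cli
```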

### Output

- Most tools write artifacts into the output directory (default: `/tmp/doris/collection`) and print `Output saved to: ...`.
- A few tools print primarily to stdout (e.g., `get-be-vars`) but follow the same prompt and error-message conventions.

### Runtime dependencies

The CLI invokes some system commands at runtime. Ensure these are installed and available in `PATH`:

- `mysql` (for cluster info, Routine Load, Table Info, etc.)
- `curl` (for BE HTTP tools)
- `gdb` (for `pstack`)
- `bash`
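
A quick preflight check for these dependencies might look like:

```sh
# Report which of the required external commands are on PATH.
missing=0
for cmd in mysql curl gdb bash; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
    missing=1
  fi
done
if [ "$missing" -ne 0 ]; then
  echo "install the missing tools before running cloud-cli"
fi
```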

## Releases

This project uses GitHub Actions to automatically build and release binaries for Linux (`x86_64` and `aarch64`). When a new version is tagged (e.g., `v1.0.0`), a new release is created.
This project uses GitHub Actions to build and publish Linux (`x86_64` / `aarch64`) binaries. When you create a tag (e.g., `v1.0.0`), a corresponding GitHub Release is produced.

You can download the latest pre-compiled binaries from the [GitHub Releases](https://github.com/QuakeWang/cloud-cli/releases) page.
You can download the latest prebuilt binaries from GitHub Releases: https://github.com/QuakeWang/cloud-cli/releases
6 changes: 0 additions & 6 deletions src/lib.rs
@@ -7,11 +7,9 @@ pub mod process;
pub mod tools;
pub mod ui;

use config::Config;
use config_loader::persist_configuration;
use dialoguer::Confirm;
use error::Result;
use tools::Tool;
use tools::mysql::CredentialManager;
use ui::*;

@@ -98,7 +96,3 @@ pub fn run_cli() -> Result<()> {
ui::print_goodbye();
Ok(())
}

fn execute_tool_enhanced(config: &Config, tool: &dyn Tool, service_name: &str) -> Result<()> {
ui::tool_executor::execute_tool_enhanced(config, tool, service_name)
}
4 changes: 4 additions & 0 deletions src/tools/common/fs_utils.rs
@@ -92,6 +92,10 @@ pub fn collect_fe_logs(dir: &Path) -> Result<Vec<PathBuf>> {
collect_log_files(dir, "fe.log")
}

pub fn collect_fe_audit_logs(dir: &Path) -> Result<Vec<PathBuf>> {
collect_log_files(dir, "fe.audit.log")
}

pub fn collect_be_logs(dir: &Path) -> Result<Vec<PathBuf>> {
collect_log_files(dir, "be.INFO")
}
144 changes: 144 additions & 0 deletions src/tools/fe/audit_topsql/aggregate.rs
@@ -0,0 +1,144 @@
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct TemplateStats {
pub sql_template: String,
pub table: Option<String>,
pub slowest_stmt: String,
pub slowest_query_id: Option<String>,
pub slowest_time_ms: u64,
pub count: u64,
pub total_time_ms: u64,
pub total_cpu_ms: u64,
pub max_time_ms: u64,
pub min_time_ms: u64,
}

impl TemplateStats {
pub fn avg_time_ms(&self) -> f64 {
if self.count == 0 {
return 0.0;
}
self.total_time_ms as f64 / self.count as f64
}

pub fn avg_cpu_ms(&self) -> f64 {
if self.count == 0 {
return 0.0;
}
self.total_cpu_ms as f64 / self.count as f64
}
}

#[derive(Debug, Clone)]
pub struct AnalysisResult {
pub items: Vec<TemplateStats>,
pub total_templates: u64,
pub total_executions: u64,
pub total_time_ms: u64,
pub total_cpu_ms: u64,
pub used_fallback: bool,
}

#[derive(Debug, Default)]
struct TemplateAgg {
table: Option<String>,
slowest_stmt: String,
slowest_query_id: Option<String>,
slowest_time_ms: u64,
count: u64,
total_time_ms: u64,
total_cpu_ms: u64,
max_time_ms: u64,
min_time_ms: u64,
}

pub struct TemplateAggregator {
map: HashMap<String, TemplateAgg>,
}

impl TemplateAggregator {
pub fn new() -> Self {
Self {
map: HashMap::new(),
}
}

pub fn push(
&mut self,
template: String,
table: Option<String>,
time_ms: u64,
cpu_ms: u64,
stmt: String,
query_id: Option<String>,
) {
let entry = self.map.entry(template).or_insert_with(|| TemplateAgg {
min_time_ms: u64::MAX,
..Default::default()
});

if entry.table.is_none() {
entry.table = table;
}

entry.count += 1;
entry.total_time_ms += time_ms;
entry.total_cpu_ms += cpu_ms;
entry.max_time_ms = entry.max_time_ms.max(time_ms);
entry.min_time_ms = entry.min_time_ms.min(time_ms);

if time_ms > entry.slowest_time_ms {
entry.slowest_time_ms = time_ms;
entry.slowest_stmt = stmt;
entry.slowest_query_id = query_id;
}
}

pub fn finish(self, min_count_exclusive: u64) -> AnalysisResult {
let has_threshold_matches = self.map.values().any(|x| x.count > min_count_exclusive);

let mut items: Vec<TemplateStats> = self
.map
.into_iter()
.filter(|(_, x)| !has_threshold_matches || x.count > min_count_exclusive)
.map(|(sql_template, x)| TemplateStats {
sql_template,
table: x.table,
slowest_stmt: x.slowest_stmt,
slowest_query_id: x.slowest_query_id,
slowest_time_ms: x.slowest_time_ms,
count: x.count,
total_time_ms: x.total_time_ms,
total_cpu_ms: x.total_cpu_ms,
max_time_ms: x.max_time_ms,
min_time_ms: if x.min_time_ms == u64::MAX {
0
} else {
x.min_time_ms
},
})
.collect();

items.sort_by_key(|x| std::cmp::Reverse(x.total_cpu_ms));

let total_templates = items.len() as u64;
let (total_executions, total_time_ms, total_cpu_ms) =
items.iter().fold((0u64, 0u64, 0u64), |acc, it| {
(
acc.0 + it.count,
acc.1 + it.total_time_ms,
acc.2 + it.total_cpu_ms,
)
});

AnalysisResult {
used_fallback: !has_threshold_matches && !items.is_empty(),
items,
total_templates,
total_executions,
total_time_ms,
total_cpu_ms,
}
}
}
7 changes: 7 additions & 0 deletions src/tools/fe/audit_topsql/mod.rs
@@ -0,0 +1,7 @@
mod aggregate;
mod normalize;
mod parser;
mod report;
mod tool;

pub use tool::FeAuditTopSqlTool;
74 changes: 74 additions & 0 deletions src/tools/fe/audit_topsql/normalize.rs
@@ -0,0 +1,74 @@
use once_cell::sync::Lazy;
use regex::Regex;
use std::collections::HashSet;

static RE_LITERALS: Lazy<Regex> = Lazy::new(|| Regex::new(r"'[^']*'|\b\d+(?:\.\d+)?\b").unwrap());
static RE_IN_LIST: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?i)\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)").unwrap());
static RE_NOT_IN_LIST: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?i)\bnot\s+in\s*\(\s*\?(?:\s*,\s*\?)*\s*\)").unwrap());
static RE_MULTI_SPACE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\s+").unwrap());
static RE_OPERATOR_SPACE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\s*([=(),])\s*").unwrap());
static RE_LINE_COMMENT: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?m)--[^\n]*").unwrap());
static RE_BLOCK_COMMENT: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?s)/\*.*?\*/").unwrap());
static RE_CTE: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?i)(?:^|\bwith\b|,)\s*([a-z0-9_]+)\s+as\s*\(").unwrap());
static RE_FROM: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?i)\bfrom\s+([a-z0-9_.`]+)").unwrap());

pub fn normalize_sql(sql: &str) -> String {
if sql.is_empty() {
return String::new();
}

let mut out = RE_BLOCK_COMMENT.replace_all(sql, " ").to_string();
if out.contains('\n') {
out = RE_LINE_COMMENT.replace_all(&out, " ").to_string();
}
out = out.replace(['\r', '\n', '\t'], " ");

out = RE_LITERALS.replace_all(&out, "?").to_string();
out = RE_NOT_IN_LIST.replace_all(&out, "not in (?)").to_string();
out = RE_IN_LIST.replace_all(&out, "in (?)").to_string();
out = RE_OPERATOR_SPACE.replace_all(&out, "$1").to_string();
out.make_ascii_lowercase();
out = RE_MULTI_SPACE.replace_all(&out, " ").to_string();
out.trim().to_string()
}

pub fn guess_table(normalized_sql: &str) -> Option<String> {
if normalized_sql.is_empty() {
return None;
}

let mut ctes: HashSet<String> = HashSet::new();
for cap in RE_CTE.captures_iter(normalized_sql) {
if let Some(name) = cap.get(1) {
ctes.insert(name.as_str().to_ascii_lowercase());
}
}

for cap in RE_FROM.captures_iter(normalized_sql) {
let Some(m) = cap.get(1) else { continue };
let table = m.as_str().replace('`', "");
let table_lc = table.to_ascii_lowercase();
if ctes.contains(&table_lc) {
continue;
}
if matches!(
table_lc.as_str(),
"a" | "b"
| "c"
| "d"
| "t"
| "t_index"
| "params"
| "current_data"
| "last_period_data"
) {
continue;
}
return Some(table);
}

None
}