Merged
70 changes: 58 additions & 12 deletions README.md
@@ -1,31 +1,77 @@
# cloud-cli

`cloud-cli` is a command-line tool designed to simplify the management and diagnosis of server-side applications. It provides an interactive menu to access various diagnostic tools, making it easier for developers and system administrators to troubleshoot processes.
`cloud-cli` is an interactive command-line tool for on-call troubleshooting of Apache Doris / SelectDB clusters. It provides a TUI menu that groups common FE/BE diagnostic workflows and writes collected artifacts into an output directory for easy archiving and sharing.

## Features

The tool is organized into two main categories:
The tool is organized into two categories: `FE` and `BE`.

### FE (Frontend/Java Applications)
### FE

- **`jstack`**: Prints Java thread stack traces for a given Java process, helping to diagnose hangs and deadlocks.
- **`jmap`**: Generates heap dumps and provides memory statistics for a Java process, useful for analyzing memory leaks.
- `fe-list`: List and select the FE target host (IP) for the current session based on `clusters.toml`.
- `jmap` (`jmap-dump`, `jmap-histo`): Java heap dump / histogram.
- `jstack`: Java thread dump.
- `fe-profiler`: Generate a flame graph using Doris `bin/profile_fe.sh` (requires async-profiler).
- `table-info`: Interactive database/table browser that collects and summarizes schema, indexes, partitions, and bucket details (supports exporting `.txt` reports).
- `routine-load`: Routine Load helper tools:
- `Get Job ID`: List and select a Routine Load job (cached locally for later analysis).
- `Performance Analysis`: Analyze per-commit rows/bytes/time from FE logs.
- `Traffic Monitor`: Aggregate per-minute `loadedRows` from FE logs.
- `fe-audit-topsql`: Parse `fe.audit.log`, normalize SQL into templates, and aggregate by CPU/time to generate a TopSQL report. Default filter is `count > 10`; if nothing matches, it falls back to showing results without the count filter.

### BE (Backend/General Processes)
### BE

- **`pstack`**: Displays the stack trace for any running process, offering insights into its execution state.
- **`get_be_vars`**: Retrieves and displays the environment variables of a running process.
- `be-list`: List and select the BE target host (IP) for the current session based on `clusters.toml` (defaults to `127.0.0.1`).
- `pstack`: Use `gdb` to dump process thread stacks (writes a `.txt` file).
- `jmap` (`jmap-dump`, `jmap-histo`): Java heap dump / histogram (only meaningful for JVM processes).
- `be-config`:
- `get-vars` (`get-be-vars`): Query BE configuration variables via HTTP `/varz`.
- `update-config` (`set-be-config`): Update configuration via HTTP `/api/update_config` (supports persist).
- `pipeline-tasks`: Fetch running pipeline tasks via HTTP `/api/running_pipeline_tasks` (auto-saves when the response is large).
- `memz`:
- `current` (`memz`): Fetch the current Jemalloc memory view via HTTP `/memz` (saves HTML and prints a summary).
- `global` (`memz-global`): Fetch the global memory view via HTTP `/memz?type=global`.
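
The `be-config`, `pipeline-tasks`, and `memz` entries wrap plain BE HTTP endpoints, so the same data can be fetched manually with `curl`. A minimal sketch, assuming a BE at `127.0.0.1` with webserver port `8040` (an assumption; check `be.conf`) and using `sys_log_level` purely as an illustrative config key:

```sh
# Assumed endpoint; the real host/port come from clusters.toml / be.conf.
BE_HOST=127.0.0.1
BE_HTTP_PORT=8040

# get-be-vars: dump BE configuration variables.
curl -s "http://${BE_HOST}:${BE_HTTP_PORT}/varz" || true

# set-be-config: update a variable at runtime; persist=true also writes it to be.conf.
curl -s -X POST "http://${BE_HOST}:${BE_HTTP_PORT}/api/update_config?sys_log_level=INFO&persist=true" || true

# memz-global: global memory view (plain /memz is the current view).
curl -s "http://${BE_HOST}:${BE_HTTP_PORT}/memz?type=global" || true
```

The `|| true` guards keep the sketch harmless when no BE is listening on that port.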

## Usage

To run the application, execute the binary. An interactive menu will appear, allowing you to select the desired diagnostic tool.
### Build and run

```sh
./cloud-cli
cargo build --release
./target/release/cloud-cli
```

### Configuration

- Persistent config: `~/.config/cloud-cli/config.toml`
- MySQL key file: `~/.config/cloud-cli/key`
- Cluster topology cache: `~/.config/cloud-cli/clusters.toml`
- First run: if an FE process is detected and MySQL credentials are missing, it will prompt you to configure and test the connection. On success, it writes `config.toml` and fetches cluster topology into `clusters.toml` (used by `fe-list`/`be-list`).
- Common environment variables (override persistent config):
- `JDK_PATH`: Set the JDK path (used by `jmap`/`jstack`).
- `OUTPUT_DIR`: Set the output directory (default: `/tmp/doris/collection`).
- `CLOUD_CLI_TIMEOUT`: External command timeout in seconds (default: `60`).
- `CLOUD_CLI_NO_PROGRESS`: Disable progress animation (`1`/`true`).
- `MYSQL_HOST` / `MYSQL_PORT`: Set Doris MySQL endpoint (defaults are derived from config; otherwise `127.0.0.1:9030`).
- `PROFILE_SECONDS`: Override `fe-profiler` collection duration in seconds.
- `CLOUD_CLI_FE_AUDIT_TOPSQL_INPUT`: Provide an explicit `fe.audit.log` path for `fe-audit-topsql` (skips interactive selection).
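
For a one-off session, these can be exported before launching the binary. A hypothetical example (the JDK path is a placeholder; substitute your own):

```sh
# Hypothetical overrides; adjust values to your environment.
export JDK_PATH=/usr/lib/jvm/java-17
export OUTPUT_DIR=/tmp/doris/collection
export CLOUD_CLI_TIMEOUT=120
export CLOUD_CLI_NO_PROGRESS=1
export MYSQL_HOST=127.0.0.1 MYSQL_PORT=9030

# Then launch the interactive menu (path assumes a release build):
# ./target/release/cloud-cli
```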

### Output

- Most tools write artifacts into the output directory (default: `/tmp/doris/collection`) and print `Output saved to: ...`.
- A few tools print primarily to stdout (e.g., `get-be-vars`) but follow the same prompt and error-message conventions.

### Runtime dependencies

The CLI invokes some system commands at runtime. Ensure these are installed and available in `PATH`:

- `mysql` (for cluster info, Routine Load, Table Info, etc.)
- `curl` (for BE HTTP tools)
- `gdb` (for `pstack`)
- `bash`
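
A quick preflight check for these dependencies might look like:

```sh
# Report which of the required external commands are on PATH.
missing=0
for cmd in mysql curl gdb bash; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
    missing=1
  fi
done
if [ "$missing" -ne 0 ]; then
  echo "install the missing tools before running cloud-cli"
fi
```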

## Releases

This project uses GitHub Actions to automatically build and release binaries for Linux (`x86_64` and `aarch64`). When a new version is tagged (e.g., `v1.0.0`), a new release is created.
This project uses GitHub Actions to build and publish Linux (`x86_64` / `aarch64`) binaries. When you create a tag (e.g., `v1.0.0`), a corresponding GitHub Release is produced.

You can download the latest pre-compiled binaries from the [GitHub Releases](https://github.com/QuakeWang/cloud-cli/releases) page.
You can download the latest prebuilt binaries from GitHub Releases: https://github.com/QuakeWang/cloud-cli/releases
6 changes: 0 additions & 6 deletions src/lib.rs
@@ -7,11 +7,9 @@ pub mod process;
pub mod tools;
pub mod ui;

use config::Config;
use config_loader::persist_configuration;
use dialoguer::Confirm;
use error::Result;
use tools::Tool;
use tools::mysql::CredentialManager;
use ui::*;

@@ -98,7 +96,3 @@ pub fn run_cli() -> Result<()> {
ui::print_goodbye();
Ok(())
}

fn execute_tool_enhanced(config: &Config, tool: &dyn Tool, service_name: &str) -> Result<()> {
ui::tool_executor::execute_tool_enhanced(config, tool, service_name)
}
4 changes: 4 additions & 0 deletions src/tools/common/fs_utils.rs
@@ -92,6 +92,10 @@ pub fn collect_fe_logs(dir: &Path) -> Result<Vec<PathBuf>> {
collect_log_files(dir, "fe.log")
}

pub fn collect_fe_audit_logs(dir: &Path) -> Result<Vec<PathBuf>> {
collect_log_files(dir, "fe.audit.log")
}

pub fn collect_be_logs(dir: &Path) -> Result<Vec<PathBuf>> {
collect_log_files(dir, "be.INFO")
}
144 changes: 144 additions & 0 deletions src/tools/fe/audit_topsql/aggregate.rs
@@ -0,0 +1,144 @@
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct TemplateStats {
pub sql_template: String,
pub table: Option<String>,
pub slowest_stmt: String,
pub slowest_query_id: Option<String>,
pub slowest_time_ms: u64,
pub count: u64,
pub total_time_ms: u64,
pub total_cpu_ms: u64,
pub max_time_ms: u64,
pub min_time_ms: u64,
}

impl TemplateStats {
pub fn avg_time_ms(&self) -> f64 {
if self.count == 0 {
return 0.0;
}
self.total_time_ms as f64 / self.count as f64
}

pub fn avg_cpu_ms(&self) -> f64 {
if self.count == 0 {
return 0.0;
}
self.total_cpu_ms as f64 / self.count as f64
}
}

#[derive(Debug, Clone)]
pub struct AnalysisResult {
pub items: Vec<TemplateStats>,
pub total_templates: u64,
pub total_executions: u64,
pub total_time_ms: u64,
pub total_cpu_ms: u64,
pub used_fallback: bool,
}

#[derive(Debug, Default)]
struct TemplateAgg {
table: Option<String>,
slowest_stmt: String,
slowest_query_id: Option<String>,
slowest_time_ms: u64,
count: u64,
total_time_ms: u64,
total_cpu_ms: u64,
max_time_ms: u64,
min_time_ms: u64,
}

pub struct TemplateAggregator {
map: HashMap<String, TemplateAgg>,
}

impl TemplateAggregator {
pub fn new() -> Self {
Self {
map: HashMap::new(),
}
}

pub fn push(
&mut self,
template: String,
table: Option<String>,
time_ms: u64,
cpu_ms: u64,
stmt: String,
query_id: Option<String>,
) {
let entry = self.map.entry(template).or_insert_with(|| TemplateAgg {
min_time_ms: u64::MAX,
..Default::default()
});

if entry.table.is_none() {
entry.table = table;
}

entry.count += 1;
entry.total_time_ms += time_ms;
entry.total_cpu_ms += cpu_ms;
entry.max_time_ms = entry.max_time_ms.max(time_ms);
entry.min_time_ms = entry.min_time_ms.min(time_ms);

if time_ms > entry.slowest_time_ms {
entry.slowest_time_ms = time_ms;
entry.slowest_stmt = stmt;
entry.slowest_query_id = query_id;
}
}

pub fn finish(self, min_count_exclusive: u64) -> AnalysisResult {
let has_threshold_matches = self.map.values().any(|x| x.count > min_count_exclusive);

let mut items: Vec<TemplateStats> = self
.map
.into_iter()
.filter(|(_, x)| !has_threshold_matches || x.count > min_count_exclusive)
.map(|(sql_template, x)| TemplateStats {
sql_template,
table: x.table,
slowest_stmt: x.slowest_stmt,
slowest_query_id: x.slowest_query_id,
slowest_time_ms: x.slowest_time_ms,
count: x.count,
total_time_ms: x.total_time_ms,
total_cpu_ms: x.total_cpu_ms,
max_time_ms: x.max_time_ms,
min_time_ms: if x.min_time_ms == u64::MAX {
0
} else {
x.min_time_ms
},
})
.collect();

items.sort_by_key(|x| std::cmp::Reverse(x.total_cpu_ms));

let total_templates = items.len() as u64;
let (total_executions, total_time_ms, total_cpu_ms) =
items.iter().fold((0u64, 0u64, 0u64), |acc, it| {
(
acc.0 + it.count,
acc.1 + it.total_time_ms,
acc.2 + it.total_cpu_ms,
)
});

AnalysisResult {
used_fallback: !has_threshold_matches && !items.is_empty(),
items,
total_templates,
total_executions,
total_time_ms,
total_cpu_ms,
}
}
}
7 changes: 7 additions & 0 deletions src/tools/fe/audit_topsql/mod.rs
@@ -0,0 +1,7 @@
mod aggregate;
mod normalize;
mod parser;
mod report;
mod tool;

pub use tool::FeAuditTopSqlTool;
74 changes: 74 additions & 0 deletions src/tools/fe/audit_topsql/normalize.rs
@@ -0,0 +1,74 @@
use once_cell::sync::Lazy;
use regex::Regex;
use std::collections::HashSet;

static RE_LITERALS: Lazy<Regex> = Lazy::new(|| Regex::new(r"'[^']*'|\b\d+(?:\.\d+)?\b").unwrap());
static RE_IN_LIST: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?i)\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)").unwrap());
static RE_NOT_IN_LIST: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?i)\bnot\s+in\s*\(\s*\?(?:\s*,\s*\?)*\s*\)").unwrap());
static RE_MULTI_SPACE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\s+").unwrap());
static RE_OPERATOR_SPACE: Lazy<Regex> = Lazy::new(|| Regex::new(r"\s*([=(),])\s*").unwrap());
static RE_LINE_COMMENT: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?m)--[^\n]*").unwrap());
static RE_BLOCK_COMMENT: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?s)/\*.*?\*/").unwrap());
static RE_CTE: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?i)(?:^|\bwith\b|,)\s*([a-z0-9_]+)\s+as\s*\(").unwrap());
static RE_FROM: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?i)\bfrom\s+([a-z0-9_.`]+)").unwrap());

pub fn normalize_sql(sql: &str) -> String {
if sql.is_empty() {
return String::new();
}

let mut out = RE_BLOCK_COMMENT.replace_all(sql, " ").to_string();
if out.contains('\n') {
out = RE_LINE_COMMENT.replace_all(&out, " ").to_string();
}
out = out.replace(['\r', '\n', '\t'], " ");

out = RE_LITERALS.replace_all(&out, "?").to_string();
out = RE_NOT_IN_LIST.replace_all(&out, "not in (?)").to_string();
out = RE_IN_LIST.replace_all(&out, "in (?)").to_string();
out = RE_OPERATOR_SPACE.replace_all(&out, "$1").to_string();
out.make_ascii_lowercase();
out = RE_MULTI_SPACE.replace_all(&out, " ").to_string();
out.trim().to_string()
}

pub fn guess_table(normalized_sql: &str) -> Option<String> {
if normalized_sql.is_empty() {
return None;
}

let mut ctes: HashSet<String> = HashSet::new();
for cap in RE_CTE.captures_iter(normalized_sql) {
if let Some(name) = cap.get(1) {
ctes.insert(name.as_str().to_ascii_lowercase());
}
}

for cap in RE_FROM.captures_iter(normalized_sql) {
let Some(m) = cap.get(1) else { continue };
let table = m.as_str().replace('`', "");
let table_lc = table.to_ascii_lowercase();
if ctes.contains(&table_lc) {
continue;
}
if matches!(
table_lc.as_str(),
"a" | "b"
| "c"
| "d"
| "t"
| "t_index"
| "params"
| "current_data"
| "last_period_data"
) {
continue;
}
return Some(table);
}

None
}