Merged
68 commits
76ad544
feat(bigquery): Adding BigQuery to the connections pages
rboni-dk Aug 6, 2025
b9ef2b4
feat(bigquery): Support BigQuery for Profiling
rboni-dk Aug 15, 2025
74a9569
feat(bigquery): Support BigQuery for Test Generation
rboni-dk Aug 19, 2025
8a171f2
TG-920
diogodk Sep 15, 2025
3bb17e0
TG-920
diogodk Sep 19, 2025
fa30b23
Address Review
diogodk Sep 22, 2025
79bafc2
Re-review
diogodk Sep 22, 2025
446c4ac
re-re-review
diogodk Sep 22, 2025
c70ec9e
remove obs url
diogodk Sep 22, 2025
0b7ae9d
Merge branch 'TG-920' into 'enterprise'
Sep 22, 2025
5e3aa67
Merge branch 'main' into 'enterprise'
aarthy-dk Sep 22, 2025
c60a840
fix(sorting selector): selected items not sorted correctly
aarthy-dk Sep 16, 2025
35ccbef
fix(sql): quote snowflake identifiers correctly
aarthy-dk Sep 17, 2025
cc50412
fix(tests): don't quote columns in Timeframe Combo Match
aarthy-dk Sep 17, 2025
4d1e261
fix(redshift): profiling error for timestamps with time zone
aarthy-dk Sep 18, 2025
32517ab
fix(postgres): profiling bugs on money and time data types
aarthy-dk Sep 18, 2025
77d7a66
Merge branch 'sort-fix' into 'enterprise'
Sep 22, 2025
3c7f5ea
fix: Applying sampling to the secondary profiling query
rboni-dk Sep 23, 2025
64a4488
Merge branch 'tg-947-2nd-sampl' into 'enterprise'
Sep 23, 2025
6fcca5e
feat: add support for redshift spectrum
aarthy-dk Sep 4, 2025
7ee6f51
Merge branch 'redshift-spectrum' into 'enterprise'
Sep 23, 2025
bc561cd
Merge branch 'enterprise' of gitlab.com:dkinternal/testgen/dataops-te…
rboni-dk Sep 23, 2025
ffb6376
Prof fix irregular table names, X datatypes
cbloche Sep 26, 2025
352ab1c
Identify TEXT, NTEXT as general_type X
cbloche Sep 26, 2025
be7a257
Merge branch 'chip/profile_fix_20250926' into 'enterprise'
Sep 26, 2025
53bfb66
feat(pagination): paginate grid component
aarthy-dk Sep 24, 2025
c83787d
refactor(run-pages): move filters and pagination to vanjs
aarthy-dk Sep 24, 2025
2f86195
fix(pagination): handle invalid page in query
aarthy-dk Sep 26, 2025
5d0f0c7
Merge branch 'grid-pagination' into 'enterprise'
Sep 29, 2025
730c1b6
misc: Addressing code review feedback
rboni-dk Sep 29, 2025
7dba164
Merge remote-tracking branch 'origin/enterprise' into bigquery-2
rboni-dk Sep 29, 2025
48d0fd3
Merge branch 'bigquery-2' into 'enterprise'
Sep 29, 2025
56ec66d
fix: Updating services images to point to bitnamis legacy repo
rboni-dk Oct 1, 2025
6ef553a
Merge branch 'fix-bitnami' into 'enterprise'
Oct 1, 2025
2c35f44
fix(source-data): fix lookup queries for Valid_US_Zip3 test
aarthy-dk Sep 28, 2025
1f58efb
feat(project-dashboard): display data point counts
aarthy-dk Sep 28, 2025
919eb3e
fix(source-data): set max height for sql query
aarthy-dk Sep 28, 2025
ca67e56
fix(table freshness): arithmetic overflow in mssql
aarthy-dk Sep 29, 2025
b67008d
fix(test-results): add details text to bottom section
aarthy-dk Sep 30, 2025
4b57cd7
feat(schedules): support pausing/resuming job schedules
aarthy-dk Sep 30, 2025
d060b0e
feat(hygiene-issues): add summary counts component
aarthy-dk Oct 1, 2025
e7452bf
Merge branch 'misc-fixes' into 'enterprise'
Oct 1, 2025
fc71112
fix(project-dashboard): improve css responsiveness
aarthy-dk Oct 1, 2025
8cd4849
fix(connection-form): validation on file fields, clear db fields
aarthy-dk Oct 1, 2025
5bf641c
Merge branch 'connection-form-fixes' into 'enterprise'
Oct 2, 2025
f07de29
feat(monitor): add data structure log
diogodk Oct 1, 2025
b092836
Merge branch 'TG-940' into 'enterprise'
Oct 3, 2025
a38046d
feat(data-type): track database data type for columns
aarthy-dk Oct 3, 2025
1f6ad69
Merge branch 'db-data-type' into 'enterprise'
Oct 3, 2025
2fe741a
fix(test-validation): detect missing columns in Dupe_Rows test
aarthy-dk Oct 7, 2025
62252a9
fix(logs): suppress noisy logs during setup and upgrade
aarthy-dk Oct 7, 2025
a1500cd
fix(mssql): arithmetic overflow error in Incr_Avg_Shift test
aarthy-dk Oct 7, 2025
5fdeeda
Merge branch 'aarthy/test-validation' into 'enterprise'
Oct 8, 2025
5edee1e
fix: add QUOTE to test and lookup queries to support non-standard ide…
cbloche Oct 8, 2025
d4076a1
fix(hygiene-issues): numeric value out of range error
aarthy-dk Oct 9, 2025
b5af463
fix(test-validation): handle commas in column names
aarthy-dk Oct 9, 2025
360d473
Merge branch 'chip/weird_table_name_handler' into 'enterprise'
Oct 9, 2025
af22198
fix: miscellaneous qa fixes
aarthy-dk Oct 9, 2025
151598a
Merge branch 'aarthy/qa-fixes' into 'enterprise'
Oct 10, 2025
007a17a
fix(sampling): trim quotes in estimate query
aarthy-dk Oct 10, 2025
8fd8cc8
fix(profiling-results): make filters consistent with hygiene issues
aarthy-dk Oct 10, 2025
5d0f4d0
fix(grid): remove unpaginated filter and sort
aarthy-dk Oct 10, 2025
1f540de
fix(connection): unnecessary rerender of file input
aarthy-dk Oct 10, 2025
e8145cb
Merge branch 'aarthy/qa-fixes' into 'enterprise'
Oct 10, 2025
7b4d688
fix(hygiene-issues): include potential pii in possible count
aarthy-dk Oct 10, 2025
d2e2246
fix: clear cache on page refresh
aarthy-dk Oct 10, 2025
6016ff4
Merge branch 'aarthy/qa-fixes' into 'enterprise'
Oct 10, 2025
a630c1b
release: 4.26.1 -> 4.32.5
aarthy-dk Oct 13, 2025
2 changes: 1 addition & 1 deletion deploy/charts/testgen-services/Chart.yaml
@@ -20,7 +20,7 @@ dependencies:
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.1.0
+version: 0.1.1

 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
6 changes: 6 additions & 0 deletions deploy/charts/testgen-services/values.yaml
@@ -6,3 +6,9 @@ postgresql:
   fullnameOverride: postgresql
   auth:
     database: "datakitchen"
+  image:
+    repository: bitnamilegacy/postgresql
+
+global:
+  security:
+    allowInsecureImages: true
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -8,7 +8,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "dataops-testgen"
-version = "4.26.1"
+version = "4.32.5"
 description = "DataKitchen's Data Quality DataOps TestGen"
 authors = [
     { "name" = "DataKitchen, Inc.", "email" = "info@datakitchen.io" },
@@ -33,6 +33,7 @@ dependencies = [
     "sqlalchemy==1.4.46",
     "databricks-sql-connector==2.9.3",
     "snowflake-sqlalchemy==1.6.1",
+    "sqlalchemy-bigquery==1.14.1",
     "pyodbc==5.0.0",
     "psycopg2-binary==2.9.9",
     "pycryptodome==3.21",
20 changes: 20 additions & 0 deletions testgen/__main__.py
@@ -31,6 +31,7 @@
 from testgen.commands.run_observability_exporter import run_observability_exporter
 from testgen.commands.run_profiling_bridge import run_profiling_queries
 from testgen.commands.run_quick_start import run_quick_start, run_quick_start_increment
+from testgen.commands.run_test_metadata_exporter import run_test_metadata_exporter
 from testgen.commands.run_upgrade_db_config import get_schema_revision, is_db_revision_up_to_date, run_upgrade_db_config
 from testgen.common import (
     configure_logging,
@@ -503,6 +504,25 @@ def export_data(configuration: Configuration, project_key: str, test_suite_key:
     click.echo("\nexport-observability completed successfully.\n")


+@click.option(
+    "--path",
+    help="Path to the templates folder. Defaults to path from project root.",
+    required=False,
+    default="testgen/template",
+)
+@cli.command("export-test-metadata", help="Exports current test metadata records to yaml files.")
+@pass_configuration
+def export_test_metadata(configuration: Configuration, path: str):
+    click.echo("export-test-metadata")
+    LOG.info("CurrentStep: Main Program - Test Metadata Export")
+    if not os.path.isdir(path):
+        LOG.error(f"Provided path {path} is not a directory. Please correct the --path option.")
+        return
+    run_test_metadata_exporter(path)
+    LOG.info("CurrentStep: Main Program - Test Metadata Export - DONE")
+    click.echo("\nexport-test-metadata completed successfully.\n")
+
+
 @cli.command("list-test-types", help="Lists all available TestGen test types.")
 @click.option("-d", "--display", help="Show command output in the terminal.", is_flag=True, default=False)
 @pass_configuration
13 changes: 6 additions & 7 deletions testgen/commands/queries/execute_cat_tests_query.py
@@ -17,7 +17,6 @@ class CATTestParams(TypedDict):
 class CCATExecutionSQL:
     project_code = ""
     flavor = ""
-    concat_operator = ""
     test_suite = ""
     run_date = ""
     test_run_id = ""
@@ -35,8 +34,7 @@ def __init__(self, strProjectCode, strTestSuiteId, strTestSuite, strSQLFlavor, m
         self.test_suite_id = strTestSuiteId
         self.test_suite = strTestSuite
         self.project_code = strProjectCode
-        flavor_service = get_flavor_service(strSQLFlavor)
-        self.concat_operator = flavor_service.get_concat_operator()
+        self.flavor_service = get_flavor_service(strSQLFlavor)
         self.flavor = strSQLFlavor
         self.max_query_chars = max_query_chars
         self.today = date_service.get_now_as_string_with_offset(minutes_offset)
@@ -47,7 +45,7 @@ def _get_rollup_scores_sql(self) -> CRollupScoresSQL:
             self._rollup_scores_sql = CRollupScoresSQL(self.test_run_id, self.table_groups_id)

         return self._rollup_scores_sql
-
+
     def _get_query(self, template_file_name: str, sub_directory: str | None = "exec_cat_tests", no_bind: bool = False) -> tuple[str, dict | None]:
         query = read_template_sql_file(template_file_name, sub_directory)
         params = {
@@ -58,8 +56,9 @@ def _get_query(self, template_file_name: str, sub_directory: str | None = "exec_
             "TEST_SUITE_ID": self.test_suite_id,
             "TABLE_GROUPS_ID": self.table_groups_id,
             "SQL_FLAVOR": self.flavor,
-            "ID_SEPARATOR": "`" if self.flavor == "databricks" else '"',
-            "CONCAT_OPERATOR": self.concat_operator,
+            "QUOTE": self.flavor_service.quote_character,
+            "VARCHAR_TYPE": self.flavor_service.varchar_type,
+            "CONCAT_OPERATOR": self.flavor_service.concat_operator,
             "SCHEMA_NAME": self.target_schema,
             "TABLE_NAME": self.target_table,
             "NOW_DATE": "GETDATE()",
@@ -73,7 +72,7 @@ def _get_query(self, template_file_name: str, sub_directory: str | None = "exec_
         query = replace_params(query, params)
         query = replace_templated_functions(query, self.flavor)

-        if no_bind and self.flavor != "databricks":
+        if no_bind:
             # Adding escape character where ':' is referenced
             query = query.replace(":", "\\:")

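The change above swaps inline `if self.flavor == "databricks"` checks for attributes read from a flavor service. A minimal sketch of that pattern follows. `FlavorSettings`, `get_flavor_settings`, and `build_params` are hypothetical names used only for illustration, not TestGen's actual classes; the databricks values come from the removed lines in this PR, and the `||` default is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlavorSettings:
    """Hypothetical stand-in for the object returned by get_flavor_service()."""

    quote_character: str = '"'
    varchar_type: str = "VARCHAR"
    concat_operator: str = "||"  # assumed default, not taken from the PR


# Only the databricks overrides are taken from the lines this diff removes.
_FLAVORS = {
    "databricks": FlavorSettings(quote_character="`", varchar_type="STRING"),
}


def get_flavor_settings(flavor: str) -> FlavorSettings:
    """Return per-flavor settings, falling back to the defaults."""
    return _FLAVORS.get(flavor, FlavorSettings())


def build_params(flavor: str) -> dict[str, str]:
    """Build the flavor-dependent template parameters without any if/else on the flavor name."""
    settings = get_flavor_settings(flavor)
    return {
        "SQL_FLAVOR": flavor,
        "QUOTE": settings.quote_character,
        "VARCHAR_TYPE": settings.varchar_type,
        "CONCAT_OPERATOR": settings.concat_operator,
    }
```

With this shape, supporting another database means adding one entry to the lookup table rather than touching every query builder.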
17 changes: 11 additions & 6 deletions testgen/commands/queries/execute_tests_query.py
@@ -1,7 +1,8 @@
 from typing import ClassVar, TypedDict

-from testgen.common import AddQuotesToIdentifierCSV, CleanSQL, ConcatColumnList, date_service, read_template_sql_file
-from testgen.common.database.database_service import replace_params
+from testgen.common import date_service, read_template_sql_file
+from testgen.common.clean_sql import CleanSQL, ConcatColumnList, quote_identifiers
+from testgen.common.database.database_service import get_flavor_service, replace_params


 class TestParams(TypedDict):
@@ -54,6 +55,7 @@ class CTestExecutionSQL:
     def __init__(self, strProjectCode, strFlavor, strTestSuiteId, strTestSuite, minutes_offset=0):
         self.project_code = strProjectCode
         self.flavor = strFlavor
+        self.flavor_service = get_flavor_service(strFlavor)
         self.test_suite_id = strTestSuiteId
         self.test_suite = strTestSuite
         self.today = date_service.get_now_as_string_with_offset(minutes_offset)
@@ -100,20 +102,21 @@ def _get_query(
             "TEST_SUITE_ID": self.test_suite_id,
             "TEST_SUITE": self.test_suite,
             "SQL_FLAVOR": self.flavor,
+            "QUOTE": self.flavor_service.quote_character,
             "TEST_RUN_ID": self.test_run_id,
             "INPUT_PARAMETERS": self._get_input_parameters(),
             "RUN_DATE": self.run_date,
             "EXCEPTION_MESSAGE": self.exception_message,
             "START_TIME": self.today,
             "PROCESS_ID": self.process_id,
-            "VARCHAR_TYPE": "STRING" if self.flavor == "databricks" else "VARCHAR",
+            "VARCHAR_TYPE": self.flavor_service.varchar_type,
             "NOW_TIMESTAMP": date_service.get_now_as_string_with_offset(self.minutes_offset),
             **{key.upper(): value or "" for key, value in self.test_params.items()},
         }

         if self.test_params:
             column_name = self.test_params["column_name"]
-            params["COLUMN_NAME"] = AddQuotesToIdentifierCSV(column_name) if column_name else ""
+            params["COLUMN_NAME"] = quote_identifiers(column_name, self.flavor) if column_name else ""
             # Shows contents without double-quotes for display and aggregate expressions
             params["COLUMN_NAME_NO_QUOTES"] = column_name or ""
             # Concatenates column list into single expression for relative entropy
@@ -126,11 +129,13 @@ def _get_query(
             )

             subset_condition = self.test_params["subset_condition"]
-            params["SUBSET_DISPLAY"] = subset_condition.replace("'", "''") if subset_condition else ""
+            params["SUBSET_DISPLAY"] = subset_condition.replace(
+                "'", self.flavor_service.escaped_single_quote
+            ) if subset_condition else ""

         query = replace_params(query, params)

-        if no_bind and self.flavor != "databricks":
+        if no_bind:
             # Adding escape character where ':' is referenced
             query = query.replace(":", "\\:")

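Two small string transformations appear in this file: quoting a comma-separated column list with the flavor's quote character, and backslash-escaping `:` so that SQLAlchemy's `text()` does not read `:name` as a bind parameter. The sketch below illustrates both under stated assumptions. `quote_identifier_csv` and `escape_bind_colons` are hypothetical helpers, not the `quote_identifiers` function imported above (whose implementation is not shown in this diff), and doubling an embedded quote character is an assumption that does not hold for every database.

```python
def quote_identifier_csv(identifiers: str, quote: str = '"') -> str:
    """Quote each name in a comma-separated list.

    Assumption: an embedded quote character is escaped by doubling it, and
    names never contain commas themselves.
    """
    quoted = []
    for name in identifiers.split(","):
        name = name.strip()
        if name:
            quoted.append(f"{quote}{name.replace(quote, quote * 2)}{quote}")
    return ", ".join(quoted)


def escape_bind_colons(query: str) -> str:
    """Backslash-escape every ':' so the text is passed through verbatim."""
    return query.replace(":", "\\:")
```

For example, `quote_identifier_csv("id, order date")` yields `"id", "order date"`, and passing a backtick as `quote` produces backtick-quoted names.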
9 changes: 6 additions & 3 deletions testgen/commands/queries/generate_tests_query.py
@@ -2,7 +2,7 @@
 from typing import ClassVar, TypedDict

 from testgen.common import CleanSQL, date_service, read_template_sql_file
-from testgen.common.database.database_service import replace_params
+from testgen.common.database.database_service import get_flavor_service, replace_params
 from testgen.common.read_file import get_template_files

 LOG = logging.getLogger("testgen")
@@ -29,7 +29,10 @@ class CDeriveTestsSQL:

     _use_clean = False

-    def __init__(self):
+    def __init__(self, flavor):
+        self.sql_flavor = flavor
+        self.flavor_service = get_flavor_service(flavor)
+
         today = date_service.get_now_as_string()
         self.run_date = today
         self.as_of_date = today
@@ -47,7 +50,7 @@ def _get_params(self) -> dict:
             "GENERATION_SET": self.generation_set,
             "AS_OF_DATE": self.as_of_date,
             "DATA_SCHEMA": self.data_schema,
-            "ID_SEPARATOR": "`" if self.sql_flavor == "databricks" else '"',
+            "QUOTE": self.flavor_service.quote_character,
         }

     def _get_query(self, template_file_name: str, sub_directory: str | None = "generation") -> tuple[str, dict]:
41 changes: 39 additions & 2 deletions testgen/commands/queries/profiling_query.py
@@ -1,9 +1,10 @@
+import re
 import typing

 from testgen.commands.queries.refresh_data_chars_query import CRefreshDataCharsSQL
 from testgen.commands.queries.rollup_scores_query import CRollupScoresSQL
 from testgen.common import date_service, read_template_sql_file, read_template_yaml_file
-from testgen.common.database.database_service import replace_params
+from testgen.common.database.database_service import get_flavor_service, replace_params
 from testgen.common.read_file import replace_templated_functions


@@ -21,6 +22,7 @@ class CProfilingSQL:
     col_name = ""
     col_gen_type = ""
     col_type = ""
+    db_data_type = ""
     col_ordinal_position = "0"
     col_is_decimal = ""
     col_top_freq_update = ""
@@ -98,6 +100,7 @@ def _get_params(self) -> dict:
             "COL_NAME_SANITIZED": self.col_name.replace("'", "''"),
             "COL_GEN_TYPE": self.col_gen_type,
             "COL_TYPE": self.col_type or "",
+            "DB_DATA_TYPE": self.db_data_type or "",
             "COL_POS": self.col_ordinal_position,
             "TOP_FREQ": self.col_top_freq_update,
             "PROFILE_RUN_ID": self.profile_run_id,
@@ -118,6 +121,7 @@ def _get_params(self) -> dict:
             "CONTINGENCY_MAX_VALUES": self.contingency_max_values,
             "PROCESS_ID": self.process_id,
             "SQL_FLAVOR": self.flavor,
+            "QUOTE": get_flavor_service(self.flavor).quote_character
         }

     def _get_query(
@@ -130,6 +134,7 @@ def _get_query(
         params = {}

         if query:
+            query = self._process_conditionals(query)
             if extra_params:
                 params.update(extra_params)
             params.update(self._get_params())
@@ -139,6 +144,33 @@ def _get_query(

         return query, params

+    def _process_conditionals(self, query: str):
+        re_pattern = re.compile(r"^--\s+TG-(IF|ELSE|ENDIF)(?:\s+(\w+))?\s*$")
+        condition = None
+        updated_query = []
+        for line in query.splitlines(True):
+            if re_match := re_pattern.match(line):
+                match re_match.group(1):
+                    case "IF" if condition is None and re_match.group(2) is not None:
+                        condition = bool(getattr(self, re_match.group(2)))
+                    case "ELSE" if condition is not None:
+                        condition = not condition
+                    case "ENDIF" if condition is not None:
+                        condition = None
+                    case _:
+                        raise ValueError("Template conditional misused")
+            elif condition is not False:
+                updated_query.append(line)
+
+        if condition is not None:
+            raise ValueError("Template conditional misused")
+
+        return "".join(updated_query)
+
+    @property
+    def do_sample_bool(self):
+        return self.parm_do_sample == "Y"
+
     def GetSecondProfilingColumnsQuery(self) -> tuple[str, dict]:
         # Runs on App database
         return self._get_query("secondary_profiling_columns.sql")
@@ -260,7 +292,12 @@ def GetProfilingQuery(self) -> tuple[str, dict]:
         else:
             strQ += dctSnippetTemplate["strTemplate01_else"]

-        strQ += dctSnippetTemplate["strTemplate02_all"]
+        strQ += dctSnippetTemplate["strTemplate01_5"]
+
+        if self.col_gen_type == "X":
+            strQ += dctSnippetTemplate["strTemplate02_X"]
+        else:
+            strQ += dctSnippetTemplate["strTemplate02_else"]

         if self.col_gen_type in ["A", "D", "N"]:
             strQ += dctSnippetTemplate["strTemplate03_ADN"]
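The `_process_conditionals` method added in this file treats SQL comment lines of the form `-- TG-IF <name>`, `-- TG-ELSE`, and `-- TG-ENDIF` as directives that keep or drop the lines between them. A standalone sketch of the same state machine is shown below for illustration. It looks flags up in a plain dict rather than through `getattr(self, ...)`, and the template text (table name and sampling clause) is made up; like the method above, it does not support nested blocks.

```python
import re

# Same directive pattern as in the diff above.
_DIRECTIVE = re.compile(r"^--\s+TG-(IF|ELSE|ENDIF)(?:\s+(\w+))?\s*$")


def process_conditionals(query: str, flags: dict[str, bool]) -> str:
    """Keep or drop template lines according to TG-IF / TG-ELSE / TG-ENDIF comments."""
    condition = None  # None: outside a block; True: keeping lines; False: skipping lines
    kept = []
    for line in query.splitlines(True):  # keep line endings so the join is lossless
        found = _DIRECTIVE.match(line)
        if found:
            keyword, name = found.groups()
            if keyword == "IF" and condition is None and name is not None:
                condition = bool(flags[name])
            elif keyword == "ELSE" and condition is not None:
                condition = not condition
            elif keyword == "ENDIF" and condition is not None:
                condition = None
            else:
                raise ValueError("Template conditional misused")
        elif condition is not False:
            kept.append(line)
    if condition is not None:
        raise ValueError("Template conditional misused")
    return "".join(kept)


# Hypothetical template; the real profiling templates are not shown in this diff.
TEMPLATE = """SELECT COUNT(*)
-- TG-IF do_sample_bool
FROM orders TABLESAMPLE (10)
-- TG-ELSE
FROM orders
-- TG-ENDIF
WHERE amount > 0
"""

sampled = process_conditionals(TEMPLATE, {"do_sample_bool": True})
full = process_conditionals(TEMPLATE, {"do_sample_bool": False})
```

Because the directives are ordinary SQL comments, an unprocessed template still parses as SQL, which is one plausible reason for this design over a separate templating syntax.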
60 changes: 29 additions & 31 deletions testgen/commands/queries/refresh_data_chars_query.py
@@ -1,5 +1,5 @@
 from testgen.common import read_template_sql_file
-from testgen.common.database.database_service import replace_params
+from testgen.common.database.database_service import get_flavor_service, replace_params
 from testgen.common.database.flavor.flavor_service import SQLFlavor
 from testgen.utils import chunk_queries

Expand Down Expand Up @@ -44,43 +44,41 @@ def _get_query(self, template_file_name: str, sub_directory: str | None = "data_
query = replace_params(query, params)
return query, params

def _get_mask_query(self, mask: str, is_include: bool) -> str:
escape = ""
if self.sql_flavor.startswith("mssql"):
escaped_underscore = "[_]"
elif self.sql_flavor == "snowflake":
escaped_underscore = "\\\\_"
escape = "ESCAPE '\\\\'"
elif self.sql_flavor == "redshift":
escaped_underscore = "\\\\_"
else:
escaped_underscore = "\\_"

table_names = [ item.strip().replace("_", escaped_underscore) for item in mask.split(",") ]
sub_query = f"""
AND {"NOT" if not is_include else ""} (
{" OR ".join([ f"(c.table_name LIKE '{item}' {escape})" for item in table_names ])}
)
"""

return sub_query

def GetDDFQuery(self) -> tuple[str, dict]:
# Runs on Target database
query, params = self._get_query(f"schema_ddf_query_{self.sql_flavor}.sql", f"flavors/{self.sql_flavor}/data_chars")

def _get_table_criteria(self) -> str:
table_criteria = ""
flavor_service = get_flavor_service(self.sql_flavor)

if self.profiling_table_set:
table_criteria += f" AND c.table_name IN ({self.profiling_table_set})"
table_criteria += f" AND c.{flavor_service.ddf_table_ref} IN ({self.profiling_table_set})"

if self.profiling_include_mask:
table_criteria += self._get_mask_query(self.profiling_include_mask, is_include=True)
include_table_names = [
item.strip().replace("_", flavor_service.escaped_underscore)
for item in self.profiling_include_mask.split(",")
]
table_criteria += f"""
AND (
{" OR ".join([ f"(c.{flavor_service.ddf_table_ref} LIKE '{item}' {flavor_service.escape_clause})" for item in include_table_names ])}
)
"""

if self.profiling_exclude_mask:
table_criteria += self._get_mask_query(self.profiling_exclude_mask, is_include=False)

query = query.replace("{TABLE_CRITERIA}", table_criteria)
exclude_table_names = [
item.strip().replace("_", flavor_service.escaped_underscore)
for item in self.profiling_exclude_mask.split(",")
]
table_criteria += f"""
AND NOT (
{" OR ".join([ f"(c.{flavor_service.ddf_table_ref} LIKE '{item}' {flavor_service.escape_clause})" for item in exclude_table_names ])}
)
"""

return table_criteria

def GetDDFQuery(self) -> tuple[str, dict]:
# Runs on Target database
query, params = self._get_query(f"schema_ddf_query_{self.sql_flavor}.sql", f"flavors/{self.sql_flavor}/data_chars")
query = query.replace("{TABLE_CRITERIA}", self._get_table_criteria())
return query, params

def GetRecordCountQueries(self, schema_tables: list[str]) -> list[tuple[str, None]]:
Expand Down
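The include and exclude branches of `_get_table_criteria` build the same `LIKE` clause and differ only in the `NOT`. As a rough illustration of that clause construction, the sketch below folds both into one function. `like_criteria` and its parameters are hypothetical; the `[_]` and `\_` escapes are the values visible in the removed `_get_mask_query`, and the function assumes masks come from trusted configuration since they are interpolated directly into SQL.

```python
def like_criteria(
    mask: str,
    column: str = "c.table_name",
    escaped_underscore: str = "\\_",
    escape_clause: str = "",
    include: bool = True,
) -> str:
    """Turn a comma-separated mask such as 'stg_%, dim_%' into an AND [NOT] (... OR ...) clause.

    Underscores are escaped so LIKE matches them literally instead of as a
    single-character wildcard.
    """
    patterns = [item.strip().replace("_", escaped_underscore) for item in mask.split(",")]
    suffix = f" {escape_clause}" if escape_clause else ""
    conditions = " OR ".join(f"({column} LIKE '{pattern}'{suffix})" for pattern in patterns)
    prefix = "AND" if include else "AND NOT"
    return f"{prefix} ({conditions})"
```

A call such as `like_criteria("tmp_%", escaped_underscore="[_]", include=False)` mirrors the exclude branch for an MSSQL-style escape.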