Comparison of Full Data Pipelines from Data Ingestion to Data Science

The table below is another magically created comparison of the technologies used in full end-to-end pipelines.
I think I actually prefer this view to the overwhelming diagrams shared on social media, plastered with brand logos.
It lays out the potential stages and the optional tools and technologies at each one.
For now, it serves as a useful template for comparing the various pipeline options and for future study.


Technology Data Flow

Each stage lists the options for Path 1 (Microsoft / Fabric), Path 2 (Snowflake + dbt, cloud-agnostic) and Path 3 (Google Cloud, GCP).

Sources & Ingestion
  Microsoft / Fabric: Azure Data Factory (ADF); Fabric Dataflows Gen2; Event Hubs / IoT Hub (streaming); ADF Copy Activity (REST, ODBC/JDBC)
  Snowflake + dbt: Snowpipe (auto-ingest) + stages; Fivetran / Stitch / Airbyte; Kafka / Kinesis via connectors; AWS Glue jobs (optional)
  Google Cloud (GCP): Cloud Data Fusion (GUI ETL); Pub/Sub (streaming); Dataflow (Apache Beam) ingestion; Storage Transfer Service / BigQuery Data Transfer Service

Raw Landing / Data Lake
  Microsoft / Fabric: Azure Data Lake Storage Gen2; OneLake (Fabric); Delta/Parquet zones: /raw, /bronze
  Snowflake + dbt: external stages on S3 / Azure / GCS; internal stages (Snowflake-managed); raw files (CSV/JSON/Parquet)
  Google Cloud (GCP): Google Cloud Storage (GCS); raw landing buckets; formats: Avro/Parquet/JSON

Orchestration
  Microsoft / Fabric: ADF pipelines & triggers; Fabric pipelines; Azure Functions (event-driven); Azure DevOps / GitHub Actions (triggering runs)
  Snowflake + dbt: Airflow / Dagster / Prefect; Snowflake Tasks & Streams; dbt Cloud scheduler; CI via GitHub Actions
  Google Cloud (GCP): Cloud Composer (managed Airflow); Workflows / Cloud Scheduler; Dataform (dbt-like) scheduling

Transform (ELT / ETL)
  Microsoft / Fabric: Fabric Data Engineering (Spark); Azure Databricks (Delta Lake); T-SQL in Fabric Warehouse; Synapse SQL/Spark (legacy)
  Snowflake + dbt: dbt models (SQL + Jinja); Snowflake SQL (MERGE, Tasks); Snowpark (Python/Scala); Streams for CDC
  Google Cloud (GCP): BigQuery SQL (ELT); Dataflow (Beam) for heavy lifting; Dataproc (Spark) when needed; Dataform/dbt for modeling

Curated / Serving Warehouse
  Microsoft / Fabric: Fabric Warehouse / Lakehouse; Dedicated SQL Pools (Synapse); Delta tables (silver/gold)
  Snowflake + dbt: Snowflake databases/schemas; Time Travel, zero-copy cloning; materialized views
  Google Cloud (GCP): BigQuery datasets; partitioned & clustered tables; materialized views

Semantic Layer / Modeling
  Microsoft / Fabric: Power BI datasets (Tabular); calculation groups (Tabular Editor); row-level security (RLS); Power BI deployment pipelines
  Snowflake + dbt: dbt semantic models & metrics; headless BI (Cube / data virtualization); RLS via Snowflake roles/policies; DirectQuery/Live connections
  Google Cloud (GCP): Looker (LookML semantic layer); Looker Explores / views / models; BigQuery authorized views; row access policies / column policy tags

BI / Visualization & Analysis
  Microsoft / Fabric: Power BI (Desktop/Service); paginated reports (RDL); Excel over Power BI
  Snowflake + dbt: Power BI / Tableau / Looker Studio; Sigma / Mode (optional); embedded analytics
  Google Cloud (GCP): Looker (first-class); Looker Studio (lightweight); Data Catalog-linked exploration

Data Science / ML
  Microsoft / Fabric: Azure ML (AutoML, MLOps); Databricks ML + MLflow; SynapseML / ONNX
  Snowflake + dbt: Snowpark ML / UDFs; external: SageMaker / Databricks; feature store via Snowflake/Feast
  Google Cloud (GCP): Vertex AI (AutoML, pipelines); BigQuery ML (in-SQL models); Vertex AI Feature Store

Data Quality / Governance
  Microsoft / Fabric: Microsoft Purview (catalog/lineage); Power BI lineage & sensitivity labels; Great Expectations (optional)
  Snowflake + dbt: Snowflake RBAC, tags, masking policies; dbt tests, Great Expectations; Monte Carlo / Bigeye (observability)
  Google Cloud (GCP): Dataplex (governance); Data Catalog (metadata); data quality via Dataform assertions / Great Expectations

DevOps / CI-CD & Infra
  Microsoft / Fabric: Azure DevOps / GitHub Actions; Power BI deployment pipelines; IaC: Bicep / Terraform
  Snowflake + dbt: GitHub Actions + dbt CI; schemachange (formerly SnowChange); IaC: Terraform / Pulumi
  Google Cloud (GCP): Cloud Build / Cloud Deploy; Dataform CI, dbt CI; IaC: Terraform

Monitoring / Cost Control
  Microsoft / Fabric: Azure Monitor / Log Analytics; Fabric workspace metrics; Azure Cost Management + budgets
  Snowflake + dbt: Snowflake resource monitors; Query History, Access History; third-party cost dashboards
  Google Cloud (GCP): Cloud Monitoring & Logging; BigQuery INFORMATION_SCHEMA; budgets + alerts
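The stages above form a dependency graph, which is the kind of structure the orchestration tools (ADF pipelines, Airflow, Cloud Composer) schedule. As a minimal sketch, assuming the simplified edges below (my own reading of the flow, not something any of these tools prescribes), Python's standard-library `graphlib` (Python 3.9+) can derive a valid run order and show which stages could run in parallel. `STAGE_DEPENDENCIES`, `run_order` and `parallel_batches` are hypothetical names used only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the stages it depends on.
# Stage names follow the comparison table; the edges are an assumption.
STAGE_DEPENDENCIES = {
    "Raw Landing / Data Lake": {"Sources & Ingestion"},
    "Transform (ELT / ETL)": {"Raw Landing / Data Lake"},
    "Curated / Serving Warehouse": {"Transform (ELT / ETL)"},
    "Semantic Layer / Modeling": {"Curated / Serving Warehouse"},
    "BI / Visualization & Analysis": {"Semantic Layer / Modeling"},
    "Data Science / ML": {"Curated / Serving Warehouse"},
}


def run_order(dependencies):
    """Return one execution order in which every stage follows its upstream stages."""
    return list(TopologicalSorter(dependencies).static_order())


def parallel_batches(dependencies):
    """Group stages into batches whose members have no dependencies on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # get_ready() returns every stage whose upstream stages are all done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in parallel_batches(STAGE_DEPENDENCIES):
        print(" | ".join(batch))
```

In this sketch the semantic layer and Data Science / ML both depend only on the curated warehouse, so they land in the same batch. A real orchestrator expresses the same idea with task dependencies, for example Airflow's `>>` operator between tasks.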

Code Data Flow

Each stage lists the code and configuration artifacts for Microsoft / Fabric, Snowflake + dbt and Google Cloud (GCP).

Ingestion Code
  Microsoft / Fabric: Python ETL (requests, pyodbc); ADF / Fabric pipeline JSON; Dataflow Gen2 JSON
  Snowflake + dbt: CREATE PIPE / CREATE STAGE; Airbyte / Fivetran configs (YAML); COPY INTO options
  Google Cloud (GCP): Apache Beam (Python/Java); Cloud Data Fusion pipeline JSON; Pub/Sub schema JSON

Raw Landing Config
  Microsoft / Fabric: ADLS / OneLake folder layout; Parquet / Delta write options; access policies (JSON)
  Snowflake + dbt: stage & file format DDL; file formats: CSV / JSON / Parquet; grants & policies
  Google Cloud (GCP): GCS bucket layout; lifecycle rules JSON; BigQuery external table DDL

Orchestration Code
  Microsoft / Fabric: ADF pipeline JSON + triggers; Fabric pipeline definitions (JSON); Azure Functions (Python)
  Snowflake + dbt: Airflow DAGs (Python); Prefect flows (Python); Snowflake TASK SQL
  Google Cloud (GCP): Cloud Composer DAGs (Python); Cloud Scheduler jobs; Dataform schedules

Transform / Modeling
  Microsoft / Fabric: Databricks notebooks (Python/Spark); Delta Live Tables pipelines; T-SQL stored procedures
  Snowflake + dbt: dbt models (*.sql); dbt Jinja macros (*.sql); Snowpark (Python) UDFs
  Google Cloud (GCP): BigQuery SQL models (*.sql); Dataform *.sqlx / dbt *.sql + YAML; Dataproc Spark notebooks

CDC / Merge to Curated
  Microsoft / Fabric: MERGE INTO (T-SQL); PySpark notebook jobs; Delta OPTIMIZE/VACUUM
  Snowflake + dbt: MERGE INTO curated.* (SQL); Streams for CDC; materialized views
  Google Cloud (GCP): MERGE INTO ... USING staging; partition / cluster DDL; stored procedures

Semantic Layer
  Microsoft / Fabric: Tabular model (TMDL); calculation groups (Tabular Editor scripts); RLS DAX expressions
  Snowflake + dbt: dbt semantic models (YAML); metrics.yaml / exposures; masking policies (SQL)
  Google Cloud (GCP): LookML view/model files; Explores & joins; policy tags

BI / Report Code
  Microsoft / Fabric: Power BI PBIX / PBIT; paginated report RDL (XML); Power Query M scripts
  Snowflake + dbt: Tableau / Power BI; BI SQL views; Sigma workbooks
  Google Cloud (GCP): LookML dashboards; Looker Studio reports; BigQuery UDFs (JavaScript)

Data Science Code
  Microsoft / Fabric: Azure ML notebooks (Python); MLflow tracking code; ONNX export
  Snowflake + dbt: Snowpark ML notebooks (Python); UDF registration SQL; MLflow registry
  Google Cloud (GCP): Vertex AI notebooks (Python); BigQuery ML CREATE MODEL SQL; Vertex AI Pipelines (YAML)

Tests & Data Quality
  Microsoft / Fabric: Great Expectations suites; Power BI model tests (DAX); custom pytest checks
  Snowflake + dbt: dbt tests (schema.yml); Great Expectations suites; SQL anomaly checks
  Google Cloud (GCP): Dataform assertions; Great Expectations in Beam; INFORMATION_SCHEMA queries

CI/CD Config
  Microsoft / Fabric: GitHub Actions YAML; Power BI deployment pipelines; Bicep steps
  Snowflake + dbt: dbt Cloud job YAML; GitHub Actions for dbt; Terraform scripts
  Google Cloud (GCP): Cloud Build YAML; BigQuery deploy scripts; Terraform modules

Infra as Code
  Microsoft / Fabric: Bicep / Terraform templates; Azure DevOps variable groups
  Snowflake + dbt: Terraform (Snowflake provider); schemachange (formerly SnowChange)
  Google Cloud (GCP): Terraform (GCS, BigQuery, VPC); IAM / secrets configs
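The "CDC / Merge to Curated" stage shows up in all three paths as some form of `MERGE INTO ... USING staging`. To make those semantics concrete without tying them to one engine, here is a minimal pure-Python sketch. It assumes a hypothetical `op` flag on staging rows ("I" insert, "U" update, "D" delete), similar in spirit to the change metadata a Snowflake Stream exposes; `merge_into` is an illustrative name, not a library function.

```python
# Minimal sketch of MERGE INTO curated USING staging semantics.
# Assumption: staging rows carry a hypothetical "op" flag:
# "I" = insert, "U" = update, "D" = delete (a missing flag means upsert).
Row = dict[str, object]


def merge_into(curated: dict[object, Row], staging: list[Row], key: str = "id") -> dict[str, int]:
    """Apply staging changes to `curated` in place and return action counts.

    Mirrors:
      WHEN MATCHED AND op = 'D' THEN DELETE
      WHEN MATCHED THEN UPDATE
      WHEN NOT MATCHED AND op <> 'D' THEN INSERT
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for change in staging:
        k = change[key]
        op = change.get("op", "U")
        # Drop the CDC flag so it never lands in the curated table.
        values = {col: val for col, val in change.items() if col != "op"}
        if k in curated:
            if op == "D":
                del curated[k]
                counts["deleted"] += 1
            else:
                # Only the columns present in the change are overwritten.
                curated[k].update(values)
                counts["updated"] += 1
        elif op != "D":
            curated[k] = values
            counts["inserted"] += 1
        # A delete for a key that is not in curated matches nothing and is ignored.
    return counts


if __name__ == "__main__":
    curated = {
        1: {"id": 1, "name": "Ada", "city": "London"},
        2: {"id": 2, "name": "Alan", "city": "Wilmslow"},
    }
    staging = [
        {"id": 1, "city": "Cambridge", "op": "U"},
        {"id": 2, "op": "D"},
        {"id": 3, "name": "Grace", "city": "Arlington", "op": "I"},
    ]
    print(merge_into(curated, staging))
    print(curated)
```

Because the sketch applies changes in order, a later change to the same key simply wins. SQL engines are stricter: a MERGE where several source rows match one target row is nondeterministic and Snowflake, for example, can raise an error for it, so staging is usually deduplicated to the latest change per key before merging.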