Author: CB

DAX – Creating Multiple Measure With DAX Query Editor
DAX – Creating Multiple Measure With DAX Query Editor
Creating multiple measures in DAX Query Editor It may be useful to create templates for creating DAX models, using commonly used DAX measures. Here we create a template for commonly used sales measures, such as Sales YTD, Sales Last Year, Sales Year on Year, and Sales Year on Year %, we can then apply the…

DAX – Using the Index Function
DAX – Using the Index Function
Demonstrating the use of the Index function with the Contoso Retail Data warehouse. We can start off by building a virtual table in the DAX query editor, which we can use to apply the Index function. The table creates is a list of the first 10 customers by customer key from the DimCustomer table. If…

Building a Measure with the DAX RankX Function
Building a Measure with the DAX RankX Function
Using the RANK() Function in DAX To demonstrate the use of the DAX function, we’ll start by creating a simple sales table with the sales amount and the year. Then we’ll add additional columns to create examples of using the RANK function. In the first example (SimpleSalesRank), we’ll just create a simple ranking. The default…

SQL Sales Report
SQL Sales Report
We can create a more interesting sales report using Window functions and additional calculations in the query, including the gross margin %, total profit, and ranks for product price, sales quantity, and sales rank. The query uses a Common Table Expression (CTE) function to group the data first (as we can’t group using Windows functions).…

Simple Linear Regression
Simple Linear Regression
Creating a simple Linear regression model and preparing for multilinear regression. In this example, we use a sample of marketing spend data vs. sales and inspect the correlation between radio spend and total sales. The regression line is fitted using the ols function from statsmodels.formula.api Results return from the model.summary() method from the OLS (ordinary…

Graphing Assumptions
Graphing Assumptions
Checking for assumptions (3 in this case), with subplotted seaborn graphics.The code for creating the linear regression model can be found in this post Linearity Assumption‘Each predictor variable (x) is linearly related to the outcome of variable y.’ Checking Homeoscedacity assumption with a scatterplot y_pred are the predicted y values from a regression line.The residuals…

Classification Models: Data Process
Classification Models: Data Process
Introduction: Classification models are machine learning models that are used to predict binary outcome scenarios such as: Spam / Not SpamFraudulent Transaction / NonFraudulant Transationcustomer churn/ customer will not churncuster high value / customer low valueload approval /nonapproval The Data Process Planning Exploratory Data Analysis (Python) Model Selection (Classification) Classification Models: Naive Bayes, Logistic Regression,…

KMeans
KMeans
Supermarket Example Import libraries and read in CSV file to data frameThe data comes from kaggle here (mall data) First, we run some basic checks on the data to check for data integrity.Data includes 200 rows of data by 5 columns We can change Gender to numeric with the following Then we check for nulls…

Statistical Testing
Statistical Testing
Significance Tests (normal distribution assumed) Groups Requirements Test Tails Statistic Distribution Degrees of freedom Example 1 normal dist, n > 30 and known variance ztest 1 and 2 zscore Normal NA A sample of the population is tested for height. Do they match the known population?Population mean and standard deviation are known, sample > 30…

Pandas Notes
Pandas Notes
Notes from Getting Started Tutorial:https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html 1. Creating a Dataframe and Series 2. Create Plots with Matplotlib 3. Add new columns from existing columns 4. Summary Statistics 5. Reshape layout of tables 6. Subsets and Filtering import pandas as pd#read csv file to pandas DatafFrametitanic = pd.read_csv(“titanic.csv”)air_quality = pd.read_csv(“air_quality_no2.csv”, index_col=0, parse_dates=True) #save as excel, index=False removes…