Tag: data-analysis

  • How to Build a Survey Results Report with Power BI

    Power BI can be useful for displaying and storing survey results for your organization. In this example we build some data for the results of a fictitious Airline survey, using questions taken from a real online airport survey.


    The data was built in Excel with a sheet for questions and a sheet for answers. You can download the Excel file here.

    It’s easy to publish the report as a PDF, so you can share it with users who don’t have access to Power BI. Here is the airport survey report as PDF

    The questions are a simple list with a Question ID and section.


    The answers on the second sheet are formatted so that each row represents all the answers for each respondent.



    Steps for Building the Airline Survey Report.

    Step 1
    The Data is imported to our Power BI Desktop survey using Power Query.

    The only change to the question table is a simple column rename.


    The answer table is transformed so that each line of the table contains the respondent ID, the question, and the Value (the answer). This prepares the data so it can be joined to the question table and easily visualized.
    The applied steps are as follows:

    As you can see the first step after the import is simply to promote the headers, so the column names are the actual column names.
    The second step is just to change the column types to the correct types.
    The key third step is to unpivot the data, so the table is transformed as below a separate row for each respondent’s answer. This makes the report easy to build.


    The report only requires one measure, which is just a unique count of the respondent ID

    Answer Count = DISTINCTCOUNT(Answers[Respondent])


    Basic Graphs can be added.

    And Likert Scale style graphs using a 100% Stacked bar graph visual.

    Its a simple, but useful report.



  • Examples of using the Python Seaborn Graphs Library

    This is pretty powerful. You can spend your time creating graphs one at a time or you can create a pairs plot using the Seaborn library. The pairs plot graphs every combination of variables, to create multiple graphs. The scatter graphs are below and the simple code is further down the page. There is a big outlier for the first purchase amount that stands out. We need to fish that one out first and re-run.



    import seaborn as sns
    import matplotlib.pyplot as plt
    #from sklearn.datasets import fetch_california_housing
    import pandas as pd
    
    csvlocation = csvpath = 'outliercheck.csv'
    df = pd.read_csv(csvlocation)
    #df.columns # display the columns available
    
    pairplot = sns.pairplot(df)
    
    # Display the plot
    plt.show()

    If can sort the data frame by the ‘firstpurchasedate’ field we can have a look at the output.

    # Sort the DataFrame by the 'firstpurchasedate' column
    df_sorted = df.sort_values(by='FirstPurchaseAmount', ascending=False)
    
    # Display the first few rows of the sorted DataFrame
    print(df_sorted.head()

    The outlier with CustomerID = 1 is the big outlier. It was possibly a test, which raises another question. Is there test data that needs to be removed? Anyway. Let’s remove it.


    We can remove just this one customer using the following code. We can also remove the Age and Gender fields as they look normal from a previous look and will reduce the number of graphs in the pair plot.

    #remove row where CustomerID = 1
    df_filtered = df[df['CustomerID'] != 1]
    # Remove the 'age', 'gender' and CustomerID columns
    df_filtered = df_filtered.drop(['Age', 'Gender', 'CustomerID'], axis=1)


    The output of re-running the graphs after the major outlier is removed and I’ve removed the Age, CustomerID, and Gender columns from the output as they aren’t of interest.

    Now, we have a clearer view of the data. As expected there are more outliers to remove from the data. We can use the 3x standard deviation method to remove outliers at the top and bottom 2.5% of the data.