Conducting a Normality Test Using IBM SPSS

1. Criteria for normal distribution

The IBM SPSS software was used to generate the graphical displays and quantitative outputs for a set of property prices. Subsequent analyses assessed the central tendency and dispersion of the data in each variable. Four criteria were cross-checked against one another before a judgement was made on whether the data in each variable approximated a normal distribution. These criteria were (in no particular order): (i) Histogram, (ii) Q-Q Plot, (iii) Skewness Value, and (iv) Mean-Median comparison.

2. Case Processing Summary

A Case Processing Summary was generated to identify any missing values. All 120 cases of the Price variable were valid; there were no missing values, as seen in Table 1.1.
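The counts in a Case Processing Summary can be reproduced outside SPSS with a few lines of code. The following is a minimal sketch, assuming missing values are stored as `None` or NaN; the function name `case_processing_summary` and the sample values are hypothetical and are not the study data.

```python
import math

def case_processing_summary(values):
    """Count valid and missing cases, treating None and NaN as missing."""
    total = len(values)
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    valid = total - missing

    def pct(count):
        # Percentage of all cases; 0.0 for an empty variable.
        return 100.0 * count / total if total else 0.0

    return {
        "valid_n": valid,
        "valid_pct": pct(valid),
        "missing_n": missing,
        "missing_pct": pct(missing),
        "total_n": total,
    }

# Hypothetical prices (not the study data); one value is deliberately missing.
sample_prices = [852.0, 910.0, None, 1200.0, 640.0]
summary = case_processing_summary(sample_prices)
print(summary)
```

For the Price variable described above, the same function would be expected to report 120 valid cases and 0 missing cases.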

Table 1.1 Case Processing Summary

3. Mean, Median and Skewness with Descriptive Statistics

Table 1.2 Descriptive statistics of the Price variable

As observed in Table 1.2, the Mean Price was about 886.57 while the Min and Max values were 192.00 and 1761.00 respectively. Because the Mean lay below the midpoint of this range (976.50), the data might be concentrated somewhat towards the lower end of the distribution. The Standard Deviation of 324.95 indicated that, if the distribution were approximately normal, about 68% of the Prices would fall between approximately 561.63 and 1211.52.
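The one-standard-deviation interval quoted above is simple arithmetic on the Mean and Standard Deviation from Table 1.2, and the "about 68%" figure is a property of the normal distribution. A short sketch of both steps, using the rounded values reported in the text (the variable names are illustrative):

```python
from statistics import NormalDist

mean_price = 886.57   # Mean reported in Table 1.2
sd_price = 324.95     # Standard Deviation reported in Table 1.2

# Interval one standard deviation either side of the mean.
lower = mean_price - sd_price
upper = mean_price + sd_price

# Share of a normal distribution lying within one SD of its mean.
# This holds only if the data are in fact approximately normal.
dist = NormalDist(mu=mean_price, sigma=sd_price)
share_within_one_sd = dist.cdf(upper) - dist.cdf(lower)

print(f"interval: {lower:.2f} to {upper:.2f}")
print(f"share within one SD: {share_within_one_sd:.4f}")
```

With these rounded inputs the lower bound comes out at 561.62; the 561.63 quoted in the text presumably reflects the unrounded SPSS values.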

After the highest 5% and lowest 5% of cases were removed, the 5% Trimmed Mean was about 876.78. It differed from the original Mean by approximately 9.79, or about 1%. This suggested that the most extreme values at either end of the distribution were unlikely to have any noticeable influence on the central tendency, dispersion, and overall normality of the data distribution.
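The comparison between a trimmed mean and the ordinary mean can be illustrated as follows. This is a simplified sketch that drops whole cases from each tail; SPSS's own trimmed mean may weight boundary cases differently, so small discrepancies are possible. The `proportion` argument is the share removed from each tail and should match whatever the software actually trims. The function name and the sample data are hypothetical.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the cases from each tail.

    Simplification: only whole cases are dropped (rounded down).
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)          # cases dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

# Hypothetical data (not the study data): 19 ordinary prices and one extreme value.
prices = [500 + 20 * i for i in range(19)] + [5000]
plain = sum(prices) / len(prices)
trimmed = trimmed_mean(prices, 0.05)
print(plain, trimmed)   # a large gap signals influential extreme values

# Applied to the figures reported in Table 1.2, the gap is small.
reported_mean = 886.57
reported_trimmed = 876.7778
gap = reported_mean - reported_trimmed
relative_gap_pct = 100.0 * gap / reported_mean
print(f"gap: {gap:.2f} ({relative_gap_pct:.1f}% of the Mean)")
```

In the hypothetical data a single extreme value pulls the mean well above the trimmed mean, whereas the reported Price figures differ by only about 1%.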

The Median of 852.00 was about 24.78 lower than the 5% Trimmed Mean and 34.57 lower than the Mean, which suggested that the data might be slightly positively skewed. This slight skewness was unlikely to be substantial, since the skewness value was about 0.426, well below 0.8.
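The skewness value can also be judged against its standard error. The sketch below assumes the adjusted Fisher-Pearson sample skewness and the usual large-sample standard error formula, which depends only on the number of cases; both are assumptions about how the reported figure was produced, and the function names are illustrative. The 1.96 benchmark is the conventional two-sided 5% cut-off, not a threshold taken from the text.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def skewness_standard_error(n):
    """Standard error of the sample skewness for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Values from the text: 120 cases, skewness of about 0.426.
se = skewness_standard_error(120)
z = 0.426 / se
print(f"SE = {se:.3f}, skewness / SE = {z:.2f}")   # SE = 0.221, ratio = 1.93

# Sanity checks on small made-up samples.
print(sample_skewness([1, 2, 3, 4, 5]))    # symmetric data
print(sample_skewness([1, 2, 3, 4, 20]))   # long right tail, positive skewness
```

Under these assumptions the ratio falls just under 1.96, which is consistent with treating the positive skew as slight, although the margin is narrow.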

At this point, the Descriptive Statistics seemed to suggest that the Price variable might be presumed to be approximately normally distributed. However, more analyses would be required to validate this presumption and to assess its overall central tendency and dispersion.

4. Histogram

Figure 1.1 Histogram of the Price variable

The Price histogram above showed that the data appeared to be approximately normally distributed, with neither tail noticeably longer than the other; no outlier was observed. The slight skewness recorded in the Descriptive Statistics did not appear to have a noticeable effect on the central tendency and dispersion of the data in this histogram.

5. Q-Q Plot

Figure 1.2 Normal Q-Q Plot of Price

In the Q-Q Plot of Price above, the majority of the data points lay close to the straight line that represents an approximately normal distribution, but a noticeable number of data points at both ends lay farther from the expected values indicated by that line. This reading of the Q-Q Plot would have to be cross-validated with the other criteria for normal distribution.
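The points behind a normal Q-Q plot pair each sorted observation with the value expected at the same rank under a normal distribution. A minimal sketch follows, assuming Blom's plotting position (rank - 3/8) / (n + 1/4) and expressing the expected values on the scale of the data rather than as z-scores; both choices are assumptions, and the function name and sample data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Return (observed, expected) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    std_normal = NormalDist()
    points = []
    for rank, observed in enumerate(ordered, start=1):
        p = (rank - 0.375) / (n + 0.25)     # Blom plotting position (assumed)
        expected = mu + sigma * std_normal.inv_cdf(p)
        points.append((observed, expected))
    return points

# Hypothetical data (not the study data).
points = normal_qq_points([1, 2, 3, 4, 5])
for observed, expected in points:
    # Large gaps at the ends correspond to points far from the straight line.
    print(f"{observed:6.2f} {expected:8.3f} {observed - expected:8.3f}")
```

Listing the differences between observed and expected values makes it easier to see how far the points at each end depart from the line, rather than relying on the plot alone.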

6. Kolmogorov-Smirnov and Shapiro-Wilk tests

Table 1.3 Tests of Normality

According to Table 1.3, neither test result was significant, so the hypothesis that the data came from a normal distribution could not be rejected. This was consistent with an approximately normal distribution and agreed with the Mean-Median, Histogram, and Skewness interpretations in the earlier analyses.
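The Kolmogorov-Smirnov statistic measures the largest gap between the empirical cumulative distribution of the data and a normal distribution fitted to it. The sketch below computes only that statistic, with the mean and standard deviation estimated from the sample; it does not compute a significance value, which would require the Lilliefors correction or a statistics package. The Shapiro-Wilk test is omitted because its coefficients are not practical to reproduce in a short example. The function name and sample data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

# Hypothetical data (not the study data).
evenly_spread = list(range(1, 21))
heavily_skewed = [1] * 9 + [100]

d_even = ks_statistic_normal(evenly_spread)
d_skewed = ks_statistic_normal(heavily_skewed)
print(f"{d_even:.3f} {d_skewed:.3f}")   # the skewed sample gives the larger gap
```

A small statistic with a non-significant result means only that the test found no evidence against normality; it does not prove that the data are normal.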

7. Boxplot

Figure 1.3 Boxplot for Price

The Boxplot above further supported the presumption that the data was approximately normally distributed, as the Median line lay visually in the middle of the box that represented the middle 50% of the Price data. Only one outlier was recorded on the graph, and a single outlier was unlikely to account for any skewness in the data.
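The two features read from the boxplot, the position of the Median within the box and the number of outliers, can be checked numerically. This is a minimal sketch assuming the common 1.5 x IQR rule for flagging outliers and the "inclusive" quartile method from Python's `statistics` module; SPSS may compute its hinges slightly differently. The function name and sample data are hypothetical.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Quartiles, relative median position in the box, and 1.5 x IQR outliers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # 0.5 means the median sits exactly in the middle of the box.
        "median_position": (median - q1) / iqr if iqr else 0.5,
        "outliers": [x for x in values if x < low_fence or x > high_fence],
    }

# Hypothetical data (not the study data): nine ordinary values and one extreme value.
box = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(box)
```

A `median_position` close to 0.5 together with few outliers matches the visual reading described above.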

8. Summarised interpretation of Price variable

As observed in Table 1.2, the Mean Price was about 886.57 with a Standard Error of about 29.66 and a 95% Confidence Interval for Mean from approximately 827.84 to 945.31, while the Min and Max values were 192.00 and 1761.00 respectively.

After the highest 5% and lowest 5% of cases were removed, the 5% Trimmed Mean was about 876.78. It differed from the Mean by approximately 9.79, or about 1%. This meant that the most extreme values at either end of the distribution were unlikely to have any noticeable influence on the central tendency, dispersion, and overall normality of the data distribution.

The Median of 852.00 was about 24.78 lower than the 5% Trimmed Mean and 34.57 lower than the Mean, which suggested that the data might be slightly positively skewed. This slight skewness was unlikely to be substantial, as the skewness value was about 0.426, well below 0.8.

The Standard Deviation indicated that, if the distribution were approximately normal, about 68% of the Prices would fall between approximately 561.63 and 1211.52.

In the Q-Q Plot of Price, the majority of the data points lay close to the straight line that represents a normal distribution, but a noticeable number of data points at both ends lay farther from the expected values indicated by that line.

The results of the Kolmogorov-Smirnov and Shapiro-Wilk tests were not significant, so the hypothesis that the data came from a normal distribution could not be rejected. This was consistent with the Mean-Median, Histogram, Skewness, and Boxplot interpretations.

Therefore, this study concluded that the Price data was approximately normally distributed.