Case Study - Challenger Launch

Importing the necessary libraries

In [ ]:
# Basic libraries of python for numeric and dataframe computations
import pandas as pd
import numpy as np
In [1]:
# Mount Google Drive so that files stored there are accessible from Colab
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Loading the data

In [ ]:
data = pd.read_csv('challenger-data.csv')

Now let us look at the first five records of the data.

In [ ]:
data.head()
Out[ ]:
Observation Y X
0 1 1 53
1 2 1 53
2 3 1 53
3 4 0 53
4 5 0 53
  • X represents the temperature at the time of the rocket launch.
  • Y indicates whether an O-ring failure occurred at that temperature (1 = failure, 0 = no failure).

Let's check the info of the data.

In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Observation  120 non-null    int64
 1   Y            120 non-null    int64
 2   X            120 non-null    int64
dtypes: int64(3)
memory usage: 2.9 KB
  • The data has 120 rows and 3 columns, and no column has missing values.
In [ ]:
data.describe()
Out[ ]:
Observation Y X
count 120.000000 120.000000 120.000000
mean 60.500000 0.083333 70.000000
std 34.785054 0.277544 7.100716
min 1.000000 0.000000 53.000000
25% 30.750000 0.000000 67.000000
50% 60.500000 0.000000 70.000000
75% 90.250000 0.000000 75.250000
max 120.000000 1.000000 81.000000
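  • The mean launch temperature in the data is 70 degrees Fahrenheit, and the median is also 70.
  • The mean of Y is about 0.083, so roughly 8% of the observations record an O-ring failure.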
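Because Y takes only the values 0 and 1, its mean is the share of failures, and the count and standard deviation rows of the table can be cross-checked by hand. The sketch below uses only the numbers printed by `data.describe()`; the variable names are illustrative and not part of the original notebook.

```python
import math

# Values copied from the data.describe() output above
n = 120            # "count" row
mean_y = 0.083333  # "mean" of Y, i.e. the share of observations with a failure

# Number of observations with Y == 1
failures = round(n * mean_y)
print(failures)

# Sample standard deviation of a 0/1 variable (ddof=1, as pandas uses)
p = failures / n
std_y = math.sqrt(p * (1 - p) * n / (n - 1))
print(round(std_y, 6))  # agrees with the "std" row for Y (0.277544)
```

On the loaded frame, `data['Y'].sum()` should give the same failure count directly.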

Visualizing the Data

In [ ]:
# We will be using the Matplotlib library for plotting.
import matplotlib.pyplot as plt

# Subset the data into observations with and without an O-ring failure
failures = data.loc[data.Y == 1]
no_failures = data.loc[data.Y == 0]

# Count the observations at each temperature
failures_freq = failures.X.value_counts()
no_failures_freq = no_failures.X.value_counts()

# Plot failure counts in red; temperatures with failure-free observations are drawn at zero in blue
plt.scatter(failures_freq.index, failures_freq, c='red', s=40)
plt.scatter(no_failures_freq.index, np.zeros(len(no_failures_freq)), c='blue', s=40)
plt.xlabel('X: Temperature')
plt.ylabel('Number of Failures')
plt.legend(['Failures', 'No failures'])
plt.show()
[Figure: scatter plot of the number of O-ring failures (red) and failure-free temperatures (blue, drawn at zero) against temperature X]
  • At higher temperatures, O-ring failures are much less likely.
  • Below 55 degrees, some observations show no O-ring failure while 3 others show a failure, so the plot alone leaves it unclear whether launching at such temperatures is safe.
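The scatter plot shows counts but not how many observations were taken at each temperature, so a per-temperature table can make the comparison easier. This is a minimal sketch, assuming a frame with the same `X` and `Y` columns as `data`; the helper name `failure_rate_by_temperature` is hypothetical, and the small `sample` frame only reproduces the five rows shown by `data.head()`.

```python
import pandas as pd

def failure_rate_by_temperature(df):
    """Count observations and failures at each temperature X (hypothetical helper)."""
    summary = df.groupby("X")["Y"].agg(observations="count", failures="sum")
    # Share of observations at this temperature that record a failure
    summary["failure_rate"] = summary["failures"] / summary["observations"]
    return summary

# The five rows displayed by data.head(); all are at X = 53
sample = pd.DataFrame({
    "Observation": [1, 2, 3, 4, 5],
    "Y": [1, 1, 1, 0, 0],
    "X": [53, 53, 53, 53, 53],
})

rates = failure_rate_by_temperature(sample)
print(rates)
```

Calling `failure_rate_by_temperature(data)` on the full frame would produce one row per temperature.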

Logistic Regression

In [ ]:
# You will need to have the statsmodels library installed before proceeding:
import statsmodels.formula.api as smf

# Build the logistic regression model of Y on X
model = smf.logit(formula='Y ~ X', data=data)
result = model.fit()

# Summarize the model
print(result.summary())
Optimization terminated successfully.
         Current function value: 0.242411
         Iterations 7
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      Y   No. Observations:                  120
Model:                          Logit   Df Residuals:                      118
Method:                           MLE   Df Model:                            1
Date:                Tue, 27 Jul 2021   Pseudo R-squ.:                  0.1549
Time:                        20:25:44   Log-Likelihood:                -29.089
converged:                       True   LL-Null:                       -34.420
Covariance Type:            nonrobust   LLR p-value:                  0.001094
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      7.4049      3.041      2.435      0.015       1.445      13.365
X             -0.1466      0.047     -3.104      0.002      -0.239      -0.054
==============================================================================
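The fit statistics in the summary header can be reproduced from the two log-likelihoods it reports. The sketch below uses only the printed values; it assumes the pseudo R-squared is McFadden's (one minus the ratio of the log-likelihoods) and that the LLR p-value comes from a chi-square distribution with one degree of freedom, which matches Df Model = 1. The variable names are illustrative.

```python
import math

# Values copied from the summary above
ll_model = -29.089  # Log-Likelihood of the fitted model
ll_null = -34.420   # LL-Null, the intercept-only model

# McFadden pseudo R-squared: 1 - LL(model) / LL(null)
pseudo_r2 = 1 - ll_model / ll_null

# Likelihood-ratio statistic comparing the fitted and null models
llr_stat = 2 * (ll_model - ll_null)

# Upper tail of a chi-square with 1 degree of freedom:
# P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z
llr_p_value = math.erfc(math.sqrt(llr_stat / 2))

print(round(pseudo_r2, 4))
print(round(llr_stat, 3))
print(llr_p_value)
```

Small differences in the last digit are expected because the log-likelihoods in the summary are rounded to three decimals.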
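  • We now have the fitted model; the summary reports the intercept, the coefficient of X, their standard errors, and their p-values.
  • The negative coefficient for X (-0.1466) means the log-odds of an O-ring failure fall as temperature rises. Each 1-degree drop in temperature multiplies the odds of failure by exp(0.1466) ≈ 1.16, that is, the odds (not the probability) increase by about 16%.
  • The p-values for the intercept (0.015) and X (0.002) are both below 0.05, so both terms are statistically significant and temperature does affect the chance of an O-ring failure.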
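To turn the coefficients into quantities that are easier to read, the log-odds can be converted to odds ratios and predicted probabilities. This is a sketch that plugs in the rounded coefficients from the summary rather than the fitted `result` object, so the numbers are approximate; the function name `failure_probability` is hypothetical.

```python
import math

# Rounded coefficients copied from the summary above
intercept = 7.4049
slope = -0.1466  # change in log-odds per 1-degree rise in temperature

def failure_probability(temperature):
    """Predicted probability of an O-ring failure at the given temperature."""
    log_odds = intercept + slope * temperature
    return 1.0 / (1.0 + math.exp(-log_odds))

# Multiplicative change in the odds of failure per degree
odds_ratio_per_degree_drop = math.exp(-slope)
odds_ratio_per_degree_rise = math.exp(slope)
print(round(odds_ratio_per_degree_drop, 3), round(odds_ratio_per_degree_rise, 3))

# Predicted probabilities at the minimum, mean, and maximum temperatures in the data
for t in (53, 70, 81):
    print(t, round(failure_probability(t), 3))
```

With these coefficients the predicted failure probability is roughly 0.41 at 53 degrees and about 0.01 at 81 degrees. With the fitted model in memory, `np.exp(result.params)` gives the per-degree-rise odds ratios without retyping the coefficients.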
In [ ]:
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/My Drive/Colab Notebooks/Copy of FDS_Project_LearnerNotebook_FullCode.ipynb"