[This work is based on this course: Data Science for Business | 6 Real-world Case Studies.]
We want to automate flaw detection in steel manufacturing. Detecting flaws will help improve steel quality and reduce the waste caused by defective production.
The company has provided us with 12,600 images of steel surfaces. There are 4 different types of flaws, and for each defective image we also know where the flaws are located.
1 – Import libraries and dataset
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import zipfile
import cv2
from skimage import io
import tensorflow as tf
from tensorflow.python.keras import Sequential
from tensorflow.keras import layers, optimizers
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, LearningRateScheduler
from IPython.display import display
from tensorflow.keras import backend as K
from sklearn.preprocessing import StandardScaler, normalize
import os
– Loading data with manufacturing flaws:
defect_df = pd.read_csv('train.csv')
defect_df
[table id=51 /]
– We load the data with and without defects:
all_df = pd.read_csv('defect_and_no_defect.csv') all_df
[table id=52 /]
2 – Data Visualization
– Let’s create a new column for the mask:
defect_df['mask'] = defect_df['ClassId'].map(lambda x: 1)
defect_df.head(50)
[table id=53 /]
plt.figure(figsize=(10,10))
sns.countplot(defect_df['ClassId'])
plt.ylabel('Number of images per defect')
plt.xlabel('ClassID')
plt.title('Number of images per class')

Type 3 defect is the most common.
– Some images are classified with more than one flaw; let’s explore this point in detail:
defect_type = defect_df.groupby(['ImageId'])['mask'].sum()
defect_type
ImageId
0002cc93b.jpg    1
0007a71bf.jpg    1
000a4bcdd.jpg    1
000f6bf48.jpg    1
0014fce06.jpg    1
                ..
ffcf72ecf.jpg    1
fff02e9c5.jpg    1
fffe98443.jpg    1
ffff4eaa8.jpg    1
ffffd67df.jpg    1
Name: mask, Length: 5474, dtype: int64
defect_type.value_counts()
1    5201
2     272
3       1
Name: mask, dtype: int64
- 1 image has 3 types of flaws.
- 272 images have 2 types of flaws.
- 5,201 images have 1 type of flaw.
plt.figure(figsize=(10,10))
sns.barplot(x = defect_type.value_counts().index, y = defect_type.value_counts())
plt.xlabel('Number of defect types per image')
plt.title('Number of defects in image')

defect_df.shape
(5748, 4)
all_df.shape
(12997, 2)
– Number of images with and without flaws:
all_df.label.value_counts()
1    7095
0    5902
Name: label, dtype: int64
plt.figure(figsize=(10,10))
sns.barplot(x = all_df.label.value_counts().index, y = all_df.label.value_counts())
plt.ylabel('Number of images')
plt.xlabel('0 - Non-defect   1 - Defect')
plt.title('Defect and non-defect images')

– Let’s load and visualize the images together with their defect type labels:
train_dir = 'train_images'

for i in range(10):
    img = io.imread(os.path.join(train_dir, defect_df.ImageId[i]))
    plt.figure()
    plt.title(defect_df.ClassId[i])
    plt.imshow(img)
3 – Masks
- First, we import Utilities. This file contains the code for rle2mask, mask2rle, the custom loss function and the custom data generator.
- Since the segmentation data is provided in RLE (Run Length Encoded) format, we use the following function to convert the RLE to a mask. We can also convert the mask back to RLE to evaluate the accuracy of the model.
Source code of these functions: https://www.kaggle.com/paulorzp/rle-functions-run-lenght-encode-decode
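For readers who don’t have the utilities file at hand, here is a minimal sketch of what these two helpers typically look like. The names match the imports used below; the implementation follows the standard Kaggle RLE convention of column-major "start length" pairs, which is an assumption about this dataset:

import numpy as np

def rle2mask(rle, height, width):
    # Decode a run-length-encoded string ("start length start length ...")
    # into a 2D binary mask, using column-major (Fortran) order.
    mask = np.zeros(height * width, dtype=np.uint8)
    runs = list(map(int, rle.split()))
    for start, length in zip(runs[0::2], runs[1::2]):
        mask[start - 1:start - 1 + length] = 1
    return mask.reshape((height, width), order='F')

def mask2rle(mask):
    # Encode a 2D binary mask back into an RLE string (inverse of rle2mask).
    pixels = mask.flatten(order='F')
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[0::2]
    return ' '.join(str(x) for x in runs)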
defect_df
[table id=54 /]
Test image
– Let’s try using rle2mask in a test image (we go from encoding to mask format):
from utilities import rle2mask, mask2rle

image_index = 20  # try 20, 30, ...

# img.shape[0] = 256 rows, img.shape[1] = 1600 columns.
# rle2mask decodes the flat run-length string of 0s and 1s and rebuilds it
# into a two-dimensional mask with the same shape as the image.
mask = rle2mask(defect_df.EncodedPixels[image_index], img.shape[0], img.shape[1])
mask.shape
(256, 1600)
– Let’s see the mask:
plt.imshow(mask)

img = io.imread(os.path.join(train_dir, defect_df.ImageId[image_index]))
plt.imshow(img)
plt.show()
img.shape

(256, 1600, 3)
Real images
– We mark the defect by setting the green channel to 255:
for i in range(10):
    # Read the image and convert it to RGB channel order
    img = io.imread(os.path.join(train_dir, defect_df.ImageId[i]))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Build the mask from the RLE encoding
    mask = rle2mask(defect_df.EncodedPixels[i], img.shape[0], img.shape[1])
    # Set the pixels with value 1 (defect) to 255 (the maximum) in channel 1 (green)
    img[mask == 1, 1] = 255
    plt.figure()
    plt.imshow(img)
    plt.title(defect_df.ClassId[i])
4 – Building and training a deep learning model
all_df
[table id=55 /]
– We split the dataset into 15% for testing and 85% for training:
from sklearn.model_selection import train_test_split

train, test = train_test_split(all_df, test_size=0.15)
train.shape
(11047, 2)
test.shape
(1950, 2)
train_dir = 'train_images'
– We create an image data generator for both training and validation:
# Training = 9390, validation = 1657, testing = 1950
from keras_preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1] and reserve 15% of the training data for validation
datagen = ImageDataGenerator(rescale=1./255., validation_split=0.15)

train_generator = datagen.flow_from_dataframe(
    dataframe = train,
    directory = train_dir,
    x_col = "ImageID",
    y_col = "label",
    subset = "training",
    batch_size = 16,
    shuffle = True,
    class_mode = "other",
    target_size = (256, 256))

valid_generator = datagen.flow_from_dataframe(
    dataframe = train,
    directory = train_dir,
    x_col = "ImageID",
    y_col = "label",
    subset = "validation",
    batch_size = 16,
    shuffle = True,
    class_mode = "other",
    target_size = (256, 256))
Found 9390 validated image filenames. Found 1657 validated image filenames.
test_datagen = ImageDataGenerator(rescale=1./255.)

test_generator = test_datagen.flow_from_dataframe(
    dataframe = test,
    directory = train_dir,
    x_col = "ImageID",
    y_col = None,
    batch_size = 16,
    shuffle = False,
    class_mode = None,
    target_size = (256, 256))
Found 1950 validated image filenames.
– We load the pre-trained base model of the ‘ResNet50’ network using the ImageNet weights:
basemodel = ResNet50(weights = 'imagenet', include_top = False, input_tensor = Input(shape=(256,256,3)))
basemodel.summary()
Model: "resnet50" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) [(None, 256, 256, 3) 0 __________________________________________________________________________________________________ conv1_pad (ZeroPadding2D) (None, 262, 262, 3) 0 input_1[0][0] __________________________________________________________________________________________________ conv1_conv (Conv2D) (None, 128, 128, 64) 9472 conv1_pad[0][0] __________________________________________________________________________________________________ conv1_bn (BatchNormalization) (None, 128, 128, 64) 256 conv1_conv[0][0] __________________________________________________________________________________________________ conv1_relu (Activation) (None, 128, 128, 64) 0 conv1_bn[0][0] __________________________________________________________________________________________________ pool1_pad (ZeroPadding2D) (None, 130, 130, 64) 0 conv1_relu[0][0] __________________________________________________________________________________________________ pool1_pool (MaxPooling2D) (None, 64, 64, 64) 0 pool1_pad[0][0] __________________________________________________________________________________________________ conv2_block1_1_conv (Conv2D) (None, 64, 64, 64) 4160 pool1_pool[0][0] __________________________________________________________________________________________________ conv2_block1_1_bn (BatchNormali (None, 64, 64, 64) 256 conv2_block1_1_conv[0][0] __________________________________________________________________________________________________ conv2_block1_1_relu (Activation (None, 64, 64, 64) 0 conv2_block1_1_bn[0][0] ................................................................................................................ ................................................................................................................ ................................................................................................................ conv5_block3_2_relu (Activation (None, 8, 8, 512) 0 conv5_block3_2_bn[0][0] __________________________________________________________________________________________________ conv5_block3_3_conv (Conv2D) (None, 8, 8, 2048) 1050624 conv5_block3_2_relu[0][0] __________________________________________________________________________________________________ conv5_block3_3_bn (BatchNormali (None, 8, 8, 2048) 8192 conv5_block3_3_conv[0][0] __________________________________________________________________________________________________ conv5_block3_add (Add) (None, 8, 8, 2048) 0 conv5_block2_out[0][0] conv5_block3_3_bn[0][0] __________________________________________________________________________________________________ conv5_block3_out (Activation) (None, 8, 8, 2048) 0 conv5_block3_add[0][0] ================================================================================================== Total params: 23,587,712 Trainable params: 23,534,592 Non-trainable params: 53,120 __________________________________________________________________________________________________
– Freezing the model weights:
for layer in basemodel.layers:
    layer.trainable = False
headmodel = basemodel.output
headmodel = AveragePooling2D(pool_size = (4,4))(headmodel)
headmodel = Flatten(name = 'flatten')(headmodel)
headmodel = Dense(256, activation = "relu")(headmodel)
headmodel = Dropout(0.3)(headmodel)
headmodel = Dense(1, activation = 'sigmoid')(headmodel)

model = Model(inputs = basemodel.input, outputs = headmodel)

model.compile(loss = 'binary_crossentropy', optimizer = 'Nadam', metrics = ["accuracy"])
– We use early stopping to halt training and avoid overfitting if the validation loss does not decrease after a certain number of epochs:
earlystopping = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)

# We keep the model with the lowest validation loss
checkpointer = ModelCheckpoint(filepath="resnet-weights.hdf5", verbose=1, save_best_only=True)
# Be careful: this step takes at least 90 minutes (on our PC)
history = model.fit_generator(train_generator,
                              steps_per_epoch = train_generator.n // 16,
                              epochs = 40,
                              validation_data = valid_generator,
                              validation_steps = valid_generator.n // 16,
                              callbacks = [checkpointer, earlystopping])
– We save the architecture of the trained model for future use:
model_json = model.to_json()
with open("resnet-classifier-model.json", "w") as json_file:
    json_file.write(model_json)
5 – Evaluate the effectiveness of the model
with open('resnet-classifier-model.json', 'r') as json_file:
    json_savedModel = json_file.read()

# Loading the model
model = tf.keras.models.model_from_json(json_savedModel)
model.load_weights('weights.hdf5')
model.compile(loss = 'binary_crossentropy', optimizer = 'Nadam', metrics = ["accuracy"])
– We make the prediction:
from keras_preprocessing.image import ImageDataGenerator

test_predict = model.predict(test_generator, steps = test_generator.n // 16, verbose = 1)
- Since the final layer uses a sigmoid activation, the output contains continuous values from 0 to 1.
- The network is first used to classify whether an image is defective or not.
- The defective images are then passed through the segmentation network to obtain the location and type of defect.
- We choose a low threshold of 0.01 so that an image is only skipped (kept out of the segmentation network) when we are very confident it has no defect; if we are not sure, the image is still passed through the segmentation network.
predict = []
for i in test_predict:
    if i < 0.01:  # instead of 0.5
        predict.append(0)
    else:
        predict.append(1)

predict = np.asarray(predict)
len(predict)
1936
# The test generator yields whole batches of 16, so predictions cover only 1936 of the 1950 images
original = np.asarray(test.label)[:1936]
len(original)
1936
– We compute the accuracy of the model:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(original, predict)
accuracy
0.8693181818181818
– Confusion matrix and classification report:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(original, predict)
plt.figure(figsize = (7,7))
sns.heatmap(cm, annot=True)

– Printing classification report:
from sklearn.metrics import classification_report

report = classification_report(original, predict, labels = [0,1])
print(report)
              precision    recall  f1-score   support

           0       1.00      0.72      0.83       889
           1       0.81      1.00      0.89      1047

    accuracy                           0.87      1936
   macro avg       0.90      0.86      0.86      1936
weighted avg       0.89      0.87      0.87      1936
- We have a good precision for the defect class (0.81).
6 – Build a segmentation model with ResUNet
Source: https://github.com/nikhilroxtomar/Deep-Residual-Unet
from sklearn.model_selection import train_test_split

X_train, X_val = train_test_split(defect_df, test_size=0.2)
– Create separate lists for ImageId, ClassId and RLE to pass to the generator:
train_ids = list(X_train.ImageId)
train_class = list(X_train.ClassId)
train_rle = list(X_train.EncodedPixels)

val_ids = list(X_val.ImageId)
val_class = list(X_val.ClassId)
val_rle = list(X_val.EncodedPixels)
– Creating the image generators:
from utilities import DataGenerator

training_generator = DataGenerator(train_ids, train_class, train_rle, train_dir)
validation_generator = DataGenerator(val_ids, val_class, val_rle, train_dir)
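The DataGenerator itself lives in the utilities file and is not shown here. As a rough idea of how it can be built, below is a minimal sketch based on tf.keras.utils.Sequence; the constructor signature mirrors the calls above, while the batch size, resizing and one-hot mask layout are assumptions rather than the exact implementation:

import os
import numpy as np
import cv2
import tensorflow as tf
from skimage import io
from utilities import rle2mask

class DataGenerator(tf.keras.utils.Sequence):
    """Sketch: yields batches of (grayscale image, 4-channel defect mask) for the ResUNet."""

    def __init__(self, ids, classes, rles, image_dir, batch_size=16, img_size=(256, 256)):
        self.ids, self.classes, self.rles = ids, classes, rles
        self.image_dir, self.batch_size, self.img_size = image_dir, batch_size, img_size

    def __len__(self):
        # Number of full batches per epoch
        return len(self.ids) // self.batch_size

    def __getitem__(self, index):
        batch = slice(index * self.batch_size, (index + 1) * self.batch_size)
        X = np.zeros((self.batch_size, *self.img_size, 1), dtype=np.float32)
        y = np.zeros((self.batch_size, *self.img_size, 4), dtype=np.float32)
        for i, (img_id, class_id, rle) in enumerate(
                zip(self.ids[batch], self.classes[batch], self.rles[batch])):
            img = io.imread(os.path.join(self.image_dir, img_id), as_gray=True)
            mask = rle2mask(rle, img.shape[0], img.shape[1])
            # Resize image and mask to the model input size
            X[i, :, :, 0] = cv2.resize(img.astype(np.float32), self.img_size)
            # Place the binary mask in the channel of its defect class (ClassId 1..4)
            y[i, :, :, int(class_id) - 1] = cv2.resize(mask.astype(np.float32), self.img_size)
        return X, y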
def resblock(X, f):
    # Copy of the input (for the shortcut path)
    X_copy = X

    # Main path
    # https://medium.com/@prateekvishnu/xavier-and-he-normal-he-et-al-initialization-8e3d7a087528
    X = Conv2D(f, kernel_size = (1,1), strides = (1,1), kernel_initializer = 'he_normal')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Conv2D(f, kernel_size = (3,3), strides = (1,1), padding = 'same', kernel_initializer = 'he_normal')(X)
    X = BatchNormalization()(X)

    # Shortcut path
    # https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33
    X_copy = Conv2D(f, kernel_size = (1,1), strides = (1,1), kernel_initializer = 'he_normal')(X_copy)
    X_copy = BatchNormalization()(X_copy)

    # Add the outputs of the main and shortcut paths
    X = Add()([X, X_copy])
    X = Activation('relu')(X)

    return X
– Create an upsampling function that concatenates the upsampled values with the skip connection:
def upsample_concat(x, skip):
    x = UpSampling2D((2,2))(x)
    merge = Concatenate()([x, skip])
    return merge
# Input tensor shape
input_shape = (256,256,1)
X_input = Input(input_shape)

# Stage 1
conv1_in = Conv2D(16, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(X_input)
conv1_in = BatchNormalization()(conv1_in)
conv1_in = Conv2D(16, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1_in)
conv1_in = BatchNormalization()(conv1_in)
pool_1 = MaxPool2D(pool_size = (2,2))(conv1_in)

# Stage 2
conv2_in = resblock(pool_1, 32)
pool_2 = MaxPool2D(pool_size = (2,2))(conv2_in)

# Stage 3
conv3_in = resblock(pool_2, 64)
pool_3 = MaxPool2D(pool_size = (2,2))(conv3_in)

# Stage 4
conv4_in = resblock(pool_3, 128)
pool_4 = MaxPool2D(pool_size = (2,2))(conv4_in)

# Stage 5
conv5_in = resblock(pool_4, 256)

# Upscale stage 1
up_1 = upsample_concat(conv5_in, conv4_in)
up_1 = resblock(up_1, 128)

# Upscale stage 2
up_2 = upsample_concat(up_1, conv3_in)
up_2 = resblock(up_2, 64)

# Upscale stage 3
up_3 = upsample_concat(up_2, conv2_in)
up_3 = resblock(up_3, 32)

# Upscale stage 4
up_4 = upsample_concat(up_3, conv1_in)
up_4 = resblock(up_4, 16)

# Final output: one sigmoid channel per defect class
output = Conv2D(4, (1,1), padding = "same", activation = "sigmoid")(up_4)

model_seg = Model(inputs = X_input, outputs = output)
Loss function
Source: https://github.com/nabsabraham/focal-tversky-unet/blob/master/losses.py
– We need a custom loss function to train this ResUNet:
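The loss functions are imported from utilities in the next cell; below is a minimal sketch that roughly follows the cited repository (alpha = 0.7 and gamma = 0.75 are the values used there; treat this as an illustration rather than the exact file):

from tensorflow.keras import backend as K

def tversky(y_true, y_pred, smooth=1e-6):
    # Tversky index: a generalization of the Dice coefficient that weights
    # false negatives (alpha) more heavily than false positives (1 - alpha),
    # which helps with heavily imbalanced segmentation masks.
    y_true_pos = K.flatten(y_true)
    y_pred_pos = K.flatten(y_pred)
    true_pos = K.sum(y_true_pos * y_pred_pos)
    false_neg = K.sum(y_true_pos * (1 - y_pred_pos))
    false_pos = K.sum((1 - y_true_pos) * y_pred_pos)
    alpha = 0.7
    return (true_pos + smooth) / (true_pos + alpha * false_neg + (1 - alpha) * false_pos + smooth)

def tversky_loss(y_true, y_pred):
    return 1 - tversky(y_true, y_pred)

def focal_tversky(y_true, y_pred):
    # Focal variant: raising (1 - Tversky) to the power gamma focuses
    # training on the hard examples with a low Tversky index.
    return K.pow(1 - tversky(y_true, y_pred), 0.75)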
from utilities import focal_tversky, tversky_loss, tversky

adam = tf.keras.optimizers.Adam(lr = 0.05, epsilon = 0.1)
model_seg.compile(optimizer = adam, loss = focal_tversky, metrics = [tversky])

# Early stopping: exit training if the validation loss does not decrease after a certain number of epochs (patience)
earlystopping = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)

# Keep the model with the lowest validation loss
checkpointer = ModelCheckpoint(filepath="resunet-segmentation-weights.hdf5", verbose=1, save_best_only=True)
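The training call itself is not included in this excerpt; assuming the generators defined above are used directly, it would look roughly like this (the epoch count is an assumption):

# Hypothetical training call (not shown in the original notebook):
# fit the ResUNet on the custom generators with the callbacks defined above.
history_seg = model_seg.fit(training_generator,
                            epochs = 40,
                            validation_data = validation_generator,
                            callbacks = [checkpointer, earlystopping])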
– We save the model architecture to a .json file for future use:
model_json = model_seg.to_json()
with open("resunet-segmentation-model.json", "w") as json_file:
    json_file.write(model_json)
7 – Evaluate the effectiveness of the trained segmentation model
from utilities import focal_tversky, tversky_loss, tversky

with open('resunet-segmentation-model.json', 'r') as json_file:
    json_savedModel = json_file.read()

# Load the model
model_seg = tf.keras.models.model_from_json(json_savedModel)
model_seg.load_weights('weights_seg.hdf5')

adam = tf.keras.optimizers.Adam(lr = 0.05, epsilon = 0.1)
model_seg.compile(optimizer = adam, loss = focal_tversky, metrics = [tversky])
– Test data for the segmentation task:
test_df = pd.read_csv('test.csv')
test_df
[table id=56 /]
test_df.ImageId
0      0ca915b9f.jpg
1      7773445b7.jpg
2      5e0744d4b.jpg
3      6ccde604d.jpg
4      16aabaf79.jpg
           ...
633    a4334d7da.jpg
634    418e47222.jpg
635    817a545aa.jpg
636    caad490a5.jpg
637    a5e9195b6.jpg
Name: ImageId, Length: 638, dtype: object
– Prediction:
from utilities import prediction

image_id, defect_type, mask = prediction(test_df, model, model_seg)
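The prediction helper also comes from utilities. The sketch below illustrates the two-stage idea described earlier (classify first, segment only the probable defects); the resizing, the 0.5 mask threshold and the choice of class by largest predicted area are assumptions, not the exact implementation:

import os
import numpy as np
import cv2
from skimage import io
from utilities import mask2rle

def prediction(test_df, model, model_seg, image_dir='train_images', threshold=0.01):
    # Hypothetical two-stage pipeline: the ResNet classifier filters out images
    # that are almost certainly defect-free, the ResUNet segments the rest.
    image_ids, defect_classes, rles = [], [], []
    for img_id in test_df.ImageId:
        original = io.imread(os.path.join(image_dir, img_id))
        h, w = original.shape[:2]
        # Stage 1: defect / no-defect classification on a 256x256 RGB input
        rgb = cv2.resize(original, (256, 256)) / 255.
        if model.predict(rgb[np.newaxis, ...])[0, 0] < threshold:
            continue  # confidently defect-free, skip segmentation
        # Stage 2: per-class segmentation on a 256x256 grayscale input
        gray = cv2.resize(cv2.cvtColor(original, cv2.COLOR_RGB2GRAY), (256, 256)) / 255.
        masks = model_seg.predict(gray[np.newaxis, ..., np.newaxis])[0]
        # Keep the class whose predicted mask covers the largest area
        class_id = int(np.argmax(masks.sum(axis=(0, 1)))) + 1
        mask = (masks[..., class_id - 1] > 0.5).astype(np.uint8)
        # Resize the mask back to the original image size before re-encoding as RLE
        mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
        image_ids.append(img_id)
        defect_classes.append(class_id)
        rles.append(mask2rle(mask))
    return image_ids, defect_classes, rles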
– We create the dataframe for the result:
df_pred = pd.DataFrame({'ImageId': image_id, 'EncodedPixels': mask, 'ClassId': defect_type})
df_pred.head()
[table id=57 /]
– We are going to show the images together with their original masks (ground truth):
# We show the images together with their original masks (ground truth)
for i in range(10):
    # Read the image and convert it to RGB channel order
    img = io.imread(os.path.join(train_dir, test_df.ImageId[i]))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Build the ground-truth mask from the RLE encoding
    mask = rle2mask(test_df.EncodedPixels[i], img.shape[0], img.shape[1])
    img[mask == 1, 1] = 255
    plt.figure()
    plt.title(test_df.ClassId[i])
    plt.imshow(img)
– Visualize the results (model predictions):
directory = "train_images" for i in range(10): # read the images using opencv and convert them to rgb format img = io.imread(os.path.join(directory,df_pred.ImageId[i])) img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Get mask for rle image mask = rle2mask(df_pred.EncodedPixels[i],img.shape[0],img.shape[1]) img[mask == 1,0] = 255 plt.figure() plt.title(df_pred.ClassId[i]) plt.imshow(img)