How to build Sign Language Recognition Using CNN and OpenCV?

In this blog, we will build a sign language recognition model based on convolutional neural networks (CNNs). If you want to read more about CNNs, read this blog.

To build an SLR (Sign Language Recognition) system, we will need three things:

Dataset
Model (In this case we will use a CNN)
Platform to apply our model (We are going to use OpenCV)

Training a deep neural network usually requires a powerful GPU. This project is small enough that we will not need one, but it is still more convenient to use an online platform like Google Colab. It’s free!

Looking for something advanced in Computer Vision? Check out these projects (stars represent perceived complexity):

★★☆☆☆ Multi-task Learning: GitHub Repo
★★☆☆☆ YOLO-NAS for any object detection: GitHub Repo
★★★★☆ 3D Image segmentation: GitHub Repo
★★★★★ 2D to 3D Human Pose estimation: GitHub Repo, Blog
★★★★★ Images to 3D Render (NeRF): GitHub Repo

1) Dataset

We will use the Sign Language MNIST dataset, which follows the format of the original MNIST (Modified National Institute of Standards and Technology) dataset of handwritten digits.

You can download the dataset here.

Our dataset consists of many images of 24 letters of the American Sign Language alphabet (all except J and Z, whose signs involve motion). Each image is 28×28 pixels, for a total of 784 pixels per image.

Loading the dataset to colab

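To load the dataset into Colab, upload the two CSV files and read them with pandas:

import pandas as pd  # the file names below are assumed from the dataset download; adjust the paths to your upload
train, test = pd.read_csv('sign_mnist_train.csv'), pd.read_csv('sign_mnist_test.csv')
X_train, y_train = train.drop(columns=['label']), train['label']
X_test, y_test = test.drop(columns=['label']), test['label']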

Our dataset is in CSV (comma-separated values) format. X_train and X_test contain the values of each pixel. y_train and y_test contain the label of each image. You can use the following code to inspect the dataset:


# info() prints its summary directly and returns None, so it does not need display()
X_train.info()
X_test.info()
display(X_train.head(n = 2))
display(X_test.head(n = 2))

Preprocessing

X_train and X_test each hold one row of 784 pixel values per image. We have to create an image from these values. Our image size is 28×28, so we have to reshape each row into a 28×28 grid of pixels. To do that we will use the following code:

import numpy as np
# turn each row of 784 pixel values into a 28x28 image
X_train = np.array(X_train.iloc[:,:])
X_train = np.array([np.reshape(i, (28,28)) for i in X_train])
X_test = np.array(X_test.iloc[:,:])
X_test = np.array([np.reshape(i, (28,28)) for i in X_test])
# one-hot encode the integer labels
num_classes = 26
y_train = np.array(y_train).reshape(-1)
y_test = np.array(y_test).reshape(-1)
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]
# add the single grayscale channel expected by Conv2D
X_train = X_train.reshape((27455, 28, 28, 1))
X_test = X_test.reshape((7172, 28, 28, 1))

Now we can use this dataset to train our model.
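
As a minimal sketch of what the preprocessing does, the snippet below applies the same reshape and np.eye one-hot steps to a tiny made-up array instead of the real dataset. The names fake_pixels and fake_labels are illustrative assumptions, not part of the project.

```python
import numpy as np

num_classes = 26

# Three fake "images", each a flat row of 784 pixel values (as in the CSV)
fake_pixels = np.arange(3 * 784).reshape(3, 784) % 256
# Three fake integer labels
fake_labels = np.array([0, 2, 24])

# Flat rows -> 28x28 images with one grayscale channel
images = fake_pixels.reshape((-1, 28, 28, 1))

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[fake_labels]

print(images.shape)   # (3, 28, 28, 1)
print(one_hot.shape)  # (3, 26)
```

Using -1 for the first dimension lets NumPy infer the number of images, which avoids hard-coding the dataset size.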

2) Build and Train the Model

We will use a CNN (Convolutional Neural Network) to recognise the letters. We are going to build it with Keras.

If you are building this project, you should know how a CNN works. If you are not familiar with CNNs, I would highly recommend Andrew Ng’s Convolutional Neural Networks course on Coursera. Or you can follow my own blog from here.

Convolutional Neural Nets (CNNs) for Non Data Scientists

Here’s our model:


from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten

classifier = Sequential()
classifier.add(Conv2D(filters=8, kernel_size=(3,3),strides=(1,1),padding='same',input_shape=(28,28,1),activation='relu', data_format='channels_last'))
classifier.add(MaxPooling2D(pool_size=(2,2)))
classifier.add(Conv2D(filters=16, kernel_size=(3,3),strides=(1,1),padding='same',activation='relu'))
classifier.add(Dropout(0.5))
classifier.add(MaxPooling2D(pool_size=(4,4)))
# Dense placed before Flatten is applied along the channel axis at each spatial position
classifier.add(Dense(128, activation='relu'))
classifier.add(Flatten())
classifier.add(Dense(26, activation='softmax'))

As you can observe, like any other CNN, our model consists of a couple of Conv2D and MaxPooling layers followed by some fully connected layers (Dense).

The first Conv2D (convolutional) layer takes an input image of shape (28,28,1). The last fully connected layer gives us one output for each of the 26 letters.
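
To see how the (28,28,1) input turns into the final outputs, here is a rough sketch that traces the layer output shapes by hand. It assumes Keras' default behaviour: 'same' padding with stride 1 keeps height and width, and max pooling uses a stride equal to the pool size and rounds down. The helper names conv_same and max_pool are made up for illustration.

```python
def conv_same(shape, filters):
    # padding='same' with stride 1 keeps height and width; channels become `filters`
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, pool):
    # default stride equals the pool size; incomplete windows are dropped
    h, w, c = shape
    return (h // pool, w // pool, c)

shape = (28, 28, 1)
shape = conv_same(shape, 8)     # (28, 28, 8)
shape = max_pool(shape, 2)      # (14, 14, 8)
shape = conv_same(shape, 16)    # (14, 14, 16); Dropout does not change the shape
shape = max_pool(shape, 4)      # (3, 3, 16)
shape = shape[:2] + (128,)      # Dense(128) acts on the last axis: (3, 3, 128)
flat = shape[0] * shape[1] * shape[2]  # Flatten: 1152 values feed the final Dense(26)

print(shape, flat)  # (3, 3, 128) 1152
```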

We are using Dropout after the 2nd Conv2D layer to regularise our training.

We are using the softmax activation function in the final layer, which gives us a probability for each letter as output.
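
For intuition, softmax can be sketched in a few lines of plain Python. This is an illustration with three made-up scores rather than the 26 outputs of the real model, and it is not how Keras implements it internally.

```python
import math

def softmax(scores):
    # subtracting the maximum keeps exp() numerically stable and does not change the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three made-up raw scores, to keep the example short
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The outputs are all positive and sum to 1, and the largest score gets the largest probability.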

At the end, our model looks like this (you can print the layer-by-layer summary with classifier.summary()):

We have to compile and fit our model. To do that we will use this:


classifier.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, epochs=50, batch_size=100)

We are using the SGD optimiser to compile our model. You may decrease the number of epochs to 25 to shorten training.
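
The categorical_crossentropy loss passed to compile compares the one-hot label with the softmax output. A small sketch of that calculation for a single example, using made-up probabilities and three classes instead of 26:

```python
import math

def categorical_crossentropy(one_hot, probs):
    # only the true class contributes, because every other entry of one_hot is 0
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

target = [0.0, 1.0, 0.0]         # true class is index 1
confident = [0.05, 0.90, 0.05]   # most probability on the true class
unsure = [0.40, 0.30, 0.30]      # probability spread across classes

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident)  # about 0.105
print(loss_unsure)     # about 1.204
```

The loss shrinks as the model puts more probability on the correct letter, which is what the optimiser pushes towards during training.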

Finally, to check the accuracy we will use this:


accuracy = classifier.evaluate(x=X_test,y=y_test,batch_size=32)
print("Accuracy: ",accuracy[1])
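
evaluate returns the loss followed by the metrics given to compile, which is why accuracy[1] is the accuracy. What that number means can be sketched by hand with a few made-up predictions (four images, three classes); the argmax helper below is written out only for illustration.

```python
def argmax(values):
    # index of the largest value
    return max(range(len(values)), key=lambda i: values[i])

# Made-up softmax outputs for four images over three classes
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
true_classes = [0, 1, 2, 1]

# A prediction counts as correct when the most probable class matches the label
correct = sum(argmax(p) == t for p, t in zip(predictions, true_classes))
manual_accuracy = correct / len(true_classes)
print(manual_accuracy)  # 0.75
```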

Now, to save the trained model so that we can download it to our PC, we can use this:


# `drive` is assumed to be an authenticated PyDrive GoogleDrive object (its setup is not shown here)
classifier.save('CNNmodel.h5')
weights_file = drive.CreateFile({'title' : 'CNNmodel.h5'})
weights_file.SetContentFile('CNNmodel.h5')
weights_file.Upload()
drive.CreateFile({'id': weights_file.get('id')})

It will save the trained model to your Google Drive.

3) OpenCV

Create a Window.

We have to create a window to take the input from our webcam. The input image should be a 28×28 grayscale image, because we trained our model on 28×28 images.

To create the window, use this code:

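import cv2
import numpy as np
from keras.models import load_model

model = load_model('CNNmodel.h5')  # the model saved in the previous section

def crop_image(image, x, y, width, height):
    # assumed helper (its definition is not shown in this post): cut out the region of interest
    return image[y:y + height, x:x + width]

def main():
    # open the webcam once, before the loop
    cam_capture = cv2.VideoCapture(0)
    while True:
        # capturing the image from webcam
        ret, image_frame = cam_capture.read()
        if not ret:
            break

        # to crop required part
        im2 = crop_image(image_frame, 300, 300, 300, 300)
        # convert to grayscale
        image_grayscale = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
        # blurring the image
        image_grayscale_blurred = cv2.GaussianBlur(image_grayscale, (15,15), 0)
        # resize the image to 28x28
        im3 = cv2.resize(image_grayscale_blurred, (28,28), interpolation = cv2.INTER_AREA)

        # expand the dimensions from 28x28 to 1x28x28x1
        im4 = np.resize(im3, (28, 28, 1))
        im5 = np.expand_dims(im4, axis=0)

        # predict the letter; keras_predict is defined in the Prediction section below
        pred_probab, pred_class = keras_predict(model, im5)
        print(pred_class, pred_probab)

        # show the cropped region in a window; press q to quit
        cv2.imshow("Sign Language Recognition", im2)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cam_capture.release()
    cv2.destroyAllWindows()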
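
The dimension-expansion step in the loop above can be checked in isolation. This sketch uses an all-zero array as a stand-in for the 28×28 grayscale frame that cv2.resize produces, so it runs without a webcam.

```python
import numpy as np

# Stand-in for the 28x28 grayscale image produced by cv2.resize
im3 = np.zeros((28, 28), dtype=np.uint8)

im4 = np.resize(im3, (28, 28, 1))   # add the channel axis
im5 = np.expand_dims(im4, axis=0)   # add the batch axis

print(im3.shape, im4.shape, im5.shape)  # (28, 28) (28, 28, 1) (1, 28, 28, 1)
```

The model expects a batch of images, so even a single frame needs the leading batch dimension of size 1.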

Prediction

Now we have to predict the letter from the input image. Our model outputs integers rather than letters, because the labels are given as integers (0 for A, 1 for B, 2 for C and so on).

# model is our classifier and image is the input image we are passing
def keras_predict(model, image):
    data = np.asarray(image, dtype="int32")
    pred_probab = model.predict(data)[0]

    # softmax gives a probability for every letter, so we pick the class with the highest probability
    pred_class = int(np.argmax(pred_probab))
    return max(pred_probab), pred_class
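
To show a letter instead of an integer, the predicted class can be mapped back to the alphabet. This is a small sketch that assumes the class indices are zero-based (0 for A up to 25 for Z), as the one-hot encoding with np.eye(26) in the preprocessing step implies; class_to_letter is a hypothetical helper name.

```python
import string

def class_to_letter(pred_class):
    # assumes 0 is A, 1 is B, ..., 25 is Z (J and Z never appear in the training data)
    return string.ascii_uppercase[pred_class]

print(class_to_letter(0))   # A
print(class_to_letter(24))  # Y
```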

Our model’s accuracy is 94%, so it should recognise letters well against a plain background and in decent lighting.

Done!

The entire project is available at:

https://github.com/Arshad221b/Sign-Language-Recognition-

Do you have any questions or suggestions? Feel free to reach out to me on [email protected] or hit me anytime on LinkedIn or Twitter!

4 Comments on “How to build Sign Language Recognition Using CNN and OpenCV?”

yash p says:

April 29, 2024 at 11:47 am

hello arshad i am in last year of bca and its the time of project submission
i would like to use your project and use it for my last year project
will you help me to run ,install library .. of your project ???

- Arshad Kazi says:
  
  May 7, 2024 at 6:14 pm
  
  Hope this not too late,
  
  Go ahead!! I have also uploaded PyTorch Version which is latest version on the repo. Please check the GitHub repo. 🙂
  
Disha Sareen says:

June 6, 2024 at 11:41 pm

Hi Arshad,
Thanks for making the project.

I downloaded your CNNModel.h5 and also the py script.
Unfortunately the accuracy in the model shows 94% but when I run it, it is more like 10% accuracy.
I would love for you to guide me on my project.

Kindly give me a call back at ###########.

Awaiting your reply!

- Arshad Kazi says:
  
  June 7, 2024 at 7:35 pm
  
  Ohh that’s strange. Though I would highly recommend training the model from scratch instead of using those weights. Also use PyTorch version as a reference (updated version). You can find that in the repo here: https://github.com/Arshad221b/Sign-Language-Recognition/blob/master/sign_language_pytorch.ipynb
  
  If you using custom data, you will need to fine-tune those parameters 🙂
  Please contact me on [email protected] if you have further doubts!
