You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The dataset for this project is taken from the Kaggle dataset website. Here is the link for the dataset: https://www.kaggle.com/datasets/adityajn105/flickr8k
3
+
4
+
### Steps to work with dataset
5
+
* If you want to work on the project locally then download and extract the dataset under these folder.
6
+
* If you want to use any other platform like google colab or kaggle follow the appropriate guidlines to import the dataset.
The encoder-decoder image captioning system would encode the image, using a pre-trained Convolutional Neural Network that would produce a hidden state. Then, it would decode this hidden state by using LSTM mechanism and generate a caption.
4
16
5
17
But RNNs tend to be computationally expensive to train and evaluate, so in practice, memory is limited to just a few elements. Attention models can help address this problem by selecting the most relevant elements from an input image.
@@ -11,6 +23,17 @@ The image is first divided into n parts,when the RNN is generating a new word, t
11
23
### Dataset
12
24
The dataset for this project is taken from the Kaggle dataset website. Here is the link for the dataset: https://www.kaggle.com/datasets/adityajn105/flickr8k
13
25
26
+
### MODELS USED
27
+
ResNet-50 is a deep neural network architecture that incorporates residual blocks, introducing shortcut connections to facilitate the training of very deep networks. With 50 layers, ResNet-50 employs a bottleneck design, utilizing 1x1, 3x3, and 1x1 convolutions in sequence. Frequently pretrained on large-scale image classification tasks like ImageNet. ResNet-50 serves as a powerful feature extractor and is widely used for transfer learning in diverse computer vision applications. Its availability in popular deep learning frameworks, along with model zoos housing pretrained versions, enables researchers and practitioners to easily leverage its capabilities for tasks such as image classification, object detection, and segmentation. ResNet-50's impact extends beyond its architecture, influencing the design of subsequent deep neural networks and contributing to the success of deep learning in computer vision.
28
+
29
+
### ACCURACIES
30
+
The accuracies of the models used in this project are as follows:
0 commit comments