Commit adbb48c

changed readme file
1 parent 0fd6579 commit adbb48c

File tree

5 files changed: +33 -1 lines changed
@@ -0,0 +1,7 @@
+### Dataset
+The dataset for this project is taken from the Kaggle dataset website. Here is the link for the dataset: https://www.kaggle.com/datasets/adityajn105/flickr8k
+
+### Steps to work with the dataset
+* If you want to work on the project locally, download and extract the dataset into this folder (a loading sketch follows below).
+* If you want to use another platform such as Google Colab or Kaggle, follow that platform's guidelines to import the dataset.
+
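A minimal loading sketch in Python, assuming the extracted Kaggle archive provides an `Images/` folder and a `captions.txt` CSV with an `image,caption` header; the `DATA_DIR` path is a placeholder to adjust for local, Colab, or Kaggle setups:

```python
import csv
import os
from collections import defaultdict

# Assumed layout after extracting the Kaggle archive (adjust if different):
#   Images/        -> the Flickr8k .jpg files
#   captions.txt   -> CSV with an "image,caption" header, several captions per image
DATA_DIR = "."  # e.g. "/kaggle/input/flickr8k" on Kaggle
IMAGE_DIR = os.path.join(DATA_DIR, "Images")
CAPTION_FILE = os.path.join(DATA_DIR, "captions.txt")


def load_captions(caption_file=CAPTION_FILE):
    """Return a dict mapping image filename -> list of its captions."""
    captions = defaultdict(list)
    with open(caption_file, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # expects header: image,caption
        for row in reader:
            captions[row["image"]].append(row["caption"].strip())
    return captions


if __name__ == "__main__":
    caps = load_captions()
    print(f"{len(caps)} images, {sum(len(v) for v in caps.values())} captions")
    first = next(iter(caps))
    print(first, "->", caps[first][0])
    print("Image file exists:", os.path.exists(os.path.join(IMAGE_DIR, first)))
```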

Image Caption Generator/Readme.md (+26 -1)
@@ -1,5 +1,17 @@
+# Image Caption Generator
+## Codepeak-2023 /World-of-ML/DL-Simplified#361

-### Approach to the problem statement
+
+Full name: Onkar Viralekar
+
+GitHub Profile Link: https://github.com/onkar-1432
+
+
+
+
+What is your participant role? Codepeak23
+
+**Approach to the problem statement**:
The encoder-decoder image captioning system encodes the image with a pre-trained Convolutional Neural Network that produces a hidden state. It then decodes this hidden state with an LSTM to generate a caption.

But RNNs tend to be computationally expensive to train and evaluate, so in practice memory is limited to just a few elements. Attention models can help address this problem by selecting the most relevant elements from an input image.
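To make this concrete, below is a minimal sketch of one attention-based decoding step in TensorFlow/Keras: the decoder scores the n image regions against its hidden state, forms a context vector, and feeds it together with the previous word into an LSTM cell. The sizes (`vocab_size`, `units`, 49 regions of 2048-dimensional features) and class names are illustrative assumptions, not the repository's actual code.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes, not the project's actual configuration.
vocab_size, embed_dim, units, n_regions, feat_dim = 5000, 256, 512, 49, 2048


class BahdanauAttention(layers.Layer):
    """Score the n image regions against the decoder state and
    return their weighted sum (the context vector)."""

    def __init__(self, units):
        super().__init__()
        self.W_feat = layers.Dense(units)
        self.W_hidden = layers.Dense(units)
        self.V = layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, n_regions, feat_dim); hidden: (batch, units)
        hidden = tf.expand_dims(hidden, 1)                               # (batch, 1, units)
        scores = self.V(tf.nn.tanh(self.W_feat(features) + self.W_hidden(hidden)))
        weights = tf.nn.softmax(scores, axis=1)                          # (batch, n_regions, 1)
        context = tf.reduce_sum(weights * features, axis=1)              # (batch, feat_dim)
        return context, weights


class DecoderStep(tf.keras.Model):
    """One LSTM decoding step: previous word + attended image context -> next-word logits."""

    def __init__(self):
        super().__init__()
        self.embedding = layers.Embedding(vocab_size, embed_dim)
        self.attention = BahdanauAttention(units)
        self.lstm_cell = layers.LSTMCell(units)
        self.fc = layers.Dense(vocab_size)

    def call(self, word_ids, features, state):
        context, weights = self.attention(features, state[0])
        x = tf.concat([self.embedding(word_ids), context], axis=-1)
        output, state = self.lstm_cell(x, state)
        return self.fc(output), state, weights


# Smoke test with random region features (as would come from a CNN encoder).
decoder = DecoderStep()
features = tf.random.normal((2, n_regions, feat_dim))
state = [tf.zeros((2, units)), tf.zeros((2, units))]
logits, state, weights = decoder(tf.constant([1, 2]), features, state)
print(logits.shape, weights.shape)  # (2, 5000) (2, 49, 1)
```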
@@ -11,6 +23,17 @@ The image is first divided into n parts, when the RNN is generating a new word, t
### Dataset
The dataset for this project is taken from the Kaggle dataset website. Here is the link for the dataset: https://www.kaggle.com/datasets/adityajn105/flickr8k

+### MODELS USED
+ResNet-50 is a deep neural network architecture that incorporates residual blocks, whose shortcut connections make very deep networks easier to train. With 50 layers, ResNet-50 employs a bottleneck design, using 1x1, 3x3, and 1x1 convolutions in sequence. Frequently pretrained on large-scale image classification tasks such as ImageNet, ResNet-50 serves as a powerful feature extractor and is widely used for transfer learning in diverse computer vision applications. Its availability in popular deep learning frameworks, along with model zoos of pretrained weights, lets researchers and practitioners easily leverage it for tasks such as image classification, object detection, and segmentation. ResNet-50's impact extends beyond its own architecture, influencing the design of subsequent deep neural networks and contributing to the success of deep learning in computer vision.
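As a sketch of the feature-extractor role described above, the snippet below loads an ImageNet-pretrained ResNet-50 in Keras without its classification head and reshapes its 7x7 spatial output into 49 region vectors that an attention decoder could attend over. The input size, preprocessing, and the choice of spatial (rather than pooled) features are assumptions, not necessarily the project's exact setup.

```python
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# Pretrained on ImageNet, classification head removed; used only as a
# frozen feature extractor (transfer learning).
backbone = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False


def extract_features(image_batch):
    """image_batch: float tensor (batch, 224, 224, 3) with raw pixel values in 0-255.
    Returns (batch, 49, 2048): the 7x7 spatial grid flattened into 49 regions."""
    x = preprocess_input(image_batch)        # ResNet-50's own input normalization
    fmap = backbone(x, training=False)       # (batch, 7, 7, 2048)
    batch = tf.shape(fmap)[0]
    return tf.reshape(fmap, (batch, -1, fmap.shape[-1]))


# Smoke test with a random image batch.
features = extract_features(tf.random.uniform((2, 224, 224, 3), 0, 255))
print(features.shape)  # (2, 49, 2048)
```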
+
+### ACCURACIES
+The accuracies of the models used in this project are as follows:
+
+- CNN-RNN attention mechanism:
+  - Training Accuracy: 96.00%
+![Loss Plot](Image/Screenshot%20from%202023-12-24%2015-59-25.png)
+![Accuracy Plot](Image/Screenshot%20from%202023-12-24%2014-59-36.png)
+

### Libraries used:
1. numpy.
@@ -25,5 +48,7 @@ The dataset for this project is taken from the Kaggle dataset website. Here is t
10. tqdm

### Author
+
+Full Name: Onkar Viralekar
Code contributed by: **Onkar Viralekar**
- Check out my GitHub Profile: https://github.com/onkar-1432
