abhisheks008
diff --git a/‎Image Caption Generator/Dataset/Readme.md
+7 b/‎Image Caption Generator/Dataset/Readme.md
+7
diff --git a/‎Image Caption Generator/Image/Screenshot from 2023-12-24 14-59-36.png
268 KB b/‎Image Caption Generator/Image/Screenshot from 2023-12-24 14-59-36.png
268 KB
diff --git a/‎Image Caption Generator/Image/Screenshot from 2023-12-24 15-59-25.png
17.5 KB b/‎Image Caption Generator/Image/Screenshot from 2023-12-24 15-59-25.png
17.5 KB
diff --git a/‎Image Caption Generator/Model/image-caption-gen.ipynb renamed to ‎Image Caption Generator/Model/image_caption_generator.ipynb b/‎Image Caption Generator/Model/image-caption-gen.ipynb renamed to ‎Image Caption Generator/Model/image_caption_generator.ipynb
diff --git a/‎Image Caption Generator/Readme.md
+26-1 b/‎Image Caption Generator/Readme.md
+26-1
@@ -0,0 +1,7 @@
+### Dataset
+The dataset for this project is taken from the Kaggle dataset website. Here is the link for the dataset:  https://www.kaggle.com/datasets/adityajn105/flickr8k
+
+### Steps to work with dataset
+* If you want to work on the project locally then download and extract the dataset under these folder.
+* If you want to use any other platform like google colab or kaggle follow the appropriate guidlines to import the dataset.
+
@@ -1,5 +1,17 @@
+# Image Caption Generator 
+## Codepeak-2023 /World-of-ML/DL-Simplified#361
 
-###  Approach to the problem statement
+
+Full name : Onkar Viralekar
+
+GitHub Profile Link : https://github.com/onkar-1432
+
+Email ID : [email protected]
+
+
+What is your participant role? Codepeak23
+
+**Approach to the problem statement** :
 The encoder-decoder image captioning system would encode the image, using a pre-trained Convolutional Neural Network that would produce a hidden state. Then, it would decode this hidden state by using  LSTM mechanism and generate a caption.
 
 But RNNs tend to be computationally expensive to train and evaluate, so in practice, memory is limited to just a few elements. Attention models can help address this problem by selecting the most relevant elements from an input image.
@@ -11,6 +23,17 @@ The image is first divided into n parts,when the RNN is generating a new word, t
 ### Dataset
 The dataset for this project is taken from the Kaggle dataset website. Here is the link for the dataset:  https://www.kaggle.com/datasets/adityajn105/flickr8k
 
+### MODELS USED
+ResNet-50 is a deep neural network architecture that incorporates residual blocks, introducing shortcut connections to facilitate the training of very deep networks. With 50 layers, ResNet-50 employs a bottleneck design, utilizing 1x1, 3x3, and 1x1 convolutions in sequence. Frequently pretrained on large-scale image classification tasks like ImageNet. ResNet-50 serves as a powerful feature extractor and is widely used for transfer learning in diverse computer vision applications. Its availability in popular deep learning frameworks, along with model zoos housing pretrained versions, enables researchers and practitioners to easily leverage its capabilities for tasks such as image classification, object detection, and segmentation. ResNet-50's impact extends beyond its architecture, influencing the design of subsequent deep neural networks and contributing to the success of deep learning in computer vision.
+
+### ACCURACIES
+The accuracies of the models used in this project are as follows:
+
+- CNN- RNN attention mechanism:
+  - Training Accuracy: 96.00%
+![Loss Plot]( Image/Screenshot%20from%202023-12-24%2015-59-25.png)
+![Accuracy Plot](Image/Screenshot%20from%202023-12-24%2014-59-36.png)
+
 
 ### Libraries used:
 1. numpy.
@@ -25,5 +48,7 @@ The dataset for this project is taken from the Kaggle dataset website. Here is t
 10. tqdm
 
 ### Author
+
+Full Name : Onkar Viralekar
 Code contributed by: **Onkar Viralekar**
 - Check out my GitHub Profile: https://github.com/onkar-1432