Commit 9948f50

Update README.md
1 parent 2bc6816 commit 9948f50

1 file changed: +45 -0 lines changed

README.md

# Simple-Search-Engine

Simple Search Engine is a lightweight text search tool. It lets users search a corpus of documents and retrieve the most relevant results.

The Vector Space Model (VSM) is a core concept in Natural Language Processing (NLP): it represents text numerically as vectors in a high-dimensional space. This project implements a search engine using the VSM approach, so that documents in a given corpus can be ranked against a query.

## Description

The search engine is built in several steps:

### Step 0: Importing Corpus

The first step reads the text corpus from the local machine. The Python script then uses the NLTK library for further processing.
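
As a rough illustration of this step, the sketch below loads every `.txt` file in a folder into a dictionary. The function name `load_corpus`, the one-file-per-document layout, and the UTF-8 encoding are assumptions, not details taken from the project's script.

```python
from pathlib import Path
import tempfile


def load_corpus(directory):
    """Return {file name: text} for every .txt file directly inside `directory`."""
    corpus = {}
    for path in sorted(Path(directory).glob("*.txt")):
        corpus[path.name] = path.read_text(encoding="utf-8")
    return corpus


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Cats chase mice.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Dogs chase cats.", encoding="utf-8")
    corpus = load_corpus(tmp)

print(corpus)
```

In practice you would call `load_corpus` with the path to your own corpus folder instead of the temporary directory.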

### Step 1: Preprocessing & Tokenizing

The text is preprocessed to remove unnecessary tokens and simplify later calculations. The NLTK library handles tokenization, lemmatization, and stop-word removal.
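
To make the idea concrete, here is a minimal standard-library sketch of tokenization and stop-word removal. It is not the project's code: the project relies on NLTK, while this version uses a regular expression and a tiny hand-written stop-word list (both assumptions) so that it runs without downloading NLTK data. Lemmatization is left out for the same reason.

```python
import re

# Tiny illustrative stop-word list (an assumption); NLTK ships a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "in", "on", "to"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The cat is on the mat, and the dog is in the yard.")
print(tokens)  # ['cat', 'mat', 'dog', 'yard']
```

With NLTK installed and its data downloaded, `nltk.word_tokenize`, `nltk.corpus.stopwords.words("english")`, and `nltk.stem.WordNetLemmatizer` would be the usual replacements for the regular expression and the hand-written list.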

### Step 2: Creating our Dataset

The preprocessed data is written to a CSV file, giving a structured dataset for the following steps.
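
One possible layout for such a dataset is one row per document. The sketch below uses Python's built-in `csv` module rather than pandas to stay self-contained; the column names `document` and `tokens` and the space-joined token format are hypothetical, not taken from the project.

```python
import csv
import io


def write_dataset(docs, fileobj):
    """Write one row per document: its name and its space-joined tokens."""
    writer = csv.writer(fileobj)
    writer.writerow(["document", "tokens"])  # hypothetical column names
    for name, tokens in docs.items():
        writer.writerow([name, " ".join(tokens)])


def read_dataset(fileobj):
    """Read the CSV back into {document name: list of tokens}."""
    reader = csv.DictReader(fileobj)
    return {row["document"]: row["tokens"].split() for row in reader}


docs = {"a.txt": ["cat", "chase", "mouse"], "b.txt": ["dog", "chase", "cat"]}
buffer = io.StringIO()  # stands in for a file opened with newline=""
write_dataset(docs, buffer)
buffer.seek(0)
restored = read_dataset(buffer)
print(restored)
```

The same file could be loaded with `pandas.read_csv` if a DataFrame is more convenient for the later steps.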

### Step 3: Creating our Matrix

A term-document matrix is generated from the dataset. Each entry holds the frequency of a term in a document.
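
A small sketch of how such a matrix can be built from token lists is shown below. It assumes raw term counts, with terms as rows and documents as columns; the function name and the return shape are illustrative choices, not the project's API.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, doc_names, matrix); matrix[i][j] counts terms[i] in doc_names[j]."""
    doc_names = list(docs)
    counts = [Counter(docs[name]) for name in doc_names]
    terms = sorted(set().union(*docs.values()))  # vocabulary in alphabetical order
    matrix = [[count[term] for count in counts] for term in terms]
    return terms, doc_names, matrix


docs = {
    "a.txt": ["cat", "chase", "mouse"],
    "b.txt": ["dog", "chase", "cat", "cat"],
}
terms, doc_names, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(term, row)
```

Each column of the matrix is the vector that represents one document in the Vector Space Model.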

### Step 5: Calculating Cosine Similarity

Cosine similarity is computed between the input query and each document in the corpus. The documents are then ranked by their similarity score.
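
The ranking can be sketched as follows. Cosine similarity is the dot product of two vectors divided by the product of their lengths. This sketch assumes raw term counts as vector entries (the README does not say whether the project applies any weighting), and the names `cosine_similarity` and `rank` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs):
    """Return (document name, score) pairs, most similar to the query first."""
    terms = sorted(set(query_tokens).union(*docs.values()))

    def vectorize(tokens):
        counts = Counter(tokens)
        return [counts[term] for term in terms]

    query_vector = vectorize(query_tokens)
    scores = {
        name: cosine_similarity(query_vector, vectorize(tokens))
        for name, tokens in docs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a.txt": ["cat", "chase", "mouse"],
    "b.txt": ["dog", "chase", "cat"],
    "c.txt": ["bird", "sing"],
}
ranking = rank(["cat", "mouse"], docs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Here `a.txt` ranks first because it shares both query terms, `b.txt` shares one, and `c.txt` shares none and scores zero.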

## How to Use

1. Clone the repository to your local machine.
2. Install the necessary dependencies (NLTK, pandas).
3. Run the Python script to build the search engine.
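
Before running the script, a quick check like the one below can confirm that the dependencies from step 2 are importable. It is only a convenience sketch; the package names `nltk` and `pandas` are the standard import names for the libraries listed in this README.

```python
import importlib.util

REQUIRED = ["nltk", "pandas"]  # import names of the dependencies listed in this README

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```

Note that NLTK may also need its data packages (such as stop-word lists) downloaded separately with `nltk.download`.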

## Dependencies

- NLTK
- pandas

## Author

Kiarash Rahmani

## License

This project is licensed under the [MIT License](LICENSE).
