
vit-mini-explicit-content 🖼️🔍


Welcome to the vit-mini-explicit-content repository! This project provides an image classification model fine-tuned from the pre-trained Vision Transformer checkpoint vit-base-patch16-224-in21k. The model performs a single-label classification task: categorizing images by their level of explicitness. It uses the ViTForImageClassification architecture from Hugging Face Transformers.

Table of Contents

  1. Features
  2. Installation
  3. Usage
  4. Model Architecture
  5. Datasets
  6. Training
  7. Evaluation
  8. Contributing
  9. License
  10. Links

Features

  • Image Classification: Classifies images based on explicit content.
  • Fine-tuned Model: Built on a pre-trained ViT model for improved accuracy.
  • Single-label Classification: Each image is categorized into one explicitness class.
  • Open Source: Freely available for modification and distribution.

Installation

To get started with the vit-mini-explicit-content model, clone the repository and install the required packages. Use the following commands:

git clone https://github.com/onyemason/vit-mini-explicit-content.git
cd vit-mini-explicit-content
pip install -r requirements.txt
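
The pinned dependencies live in requirements.txt. If you only want to run the usage example below, a minimal environment roughly needs PyTorch, Transformers, Pillow, and Requests (this package list is an assumption, not the repository's pinned set):

pip install torch transformers pillow requests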

Usage

To use the model, load it and provide the image you want to classify. Here’s a simple example:

from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import requests
import torch

# Load the model and the matching image processor
model = ViTForImageClassification.from_pretrained('onyemason/vit-mini-explicit-content')
processor = ViTImageProcessor.from_pretrained('onyemason/vit-mini-explicit-content')

# Load an image from a URL
url = 'https://example.com/image.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert('RGB')

# Preprocess the image into the pixel values the model expects
inputs = processor(images=image, return_tensors="pt")

# Perform classification (no gradient tracking needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
predicted_id = outputs.logits.argmax(-1).item()
print(f'Predicted class: {model.config.id2label[predicted_id]}')

Model Architecture

The vit-mini-explicit-content model is based on the Vision Transformer (ViT) architecture. It uses self-attention mechanisms to process images, allowing the model to capture complex patterns and features. The architecture includes:

  • Patch Embedding: Divides the image into patches and projects them into a feature space.
  • Transformer Encoder: A stack of transformer layers that processes the embedded patches.
  • Classification Head: A linear layer on the final [CLS] token representation that outputs one logit per explicitness class (a softmax turns these logits into probabilities); see the shape sketch below.
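
As a rough, shape-level illustration of how these stages fit together (a sketch assuming the checkpoint keeps the ViT-Base defaults of its base model: 224×224 inputs, 16×16 patches, 768-dimensional hidden states), the intermediate tensors can be inspected like this:

import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained('onyemason/vit-mini-explicit-content')

# A 224x224 RGB image is cut into (224/16)^2 = 196 patches, each projected to a
# hidden vector; a [CLS] token is prepended, giving 197 tokens per image.
pixel_values = torch.randn(1, 3, 224, 224)  # dummy batch with one image
with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

print(outputs.hidden_states[-1].shape)  # e.g. (1, 197, 768): encoder output tokens
print(outputs.logits.shape)             # (1, num_labels): one logit per class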

Datasets

The model is trained on a curated dataset that includes various explicitness categories. The dataset is designed to ensure diverse and representative samples, enhancing the model's ability to generalize across different contexts.

If you wish to contribute to the dataset, please ensure that the images are appropriate and comply with ethical guidelines.

Training

To train the model, follow these steps:

  1. Prepare your dataset in the required format.
  2. Use the provided training script to initiate training.

Example command:

python train.py --dataset path/to/dataset --epochs 10 --batch_size 32
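
The repository's train.py defines the actual training pipeline, and the command above is the supported entry point. As a hedged sketch only (the dataset layout, label handling, and hyperparameters below are assumptions, not the repository's script), fine-tuning a ViT checkpoint for this kind of single-label task with the Hugging Face Trainer typically looks like this:

# Hypothetical sketch, not the repository's train.py.
# Assumes an 'imagefolder'-style dataset: one sub-directory per explicitness class.
import torch
from datasets import load_dataset
from transformers import (Trainer, TrainingArguments,
                          ViTForImageClassification, ViTImageProcessor)

processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
ds = load_dataset('imagefolder', data_dir='path/to/dataset')
labels = ds['train'].features['label'].names

def transform(batch):
    # Convert PIL images to pixel values; keep the integer labels.
    inputs = processor([img.convert('RGB') for img in batch['image']], return_tensors='pt')
    inputs['labels'] = batch['label']
    return inputs

ds = ds.with_transform(transform)

def collate_fn(examples):
    return {
        'pixel_values': torch.stack([e['pixel_values'] for e in examples]),
        'labels': torch.tensor([e['labels'] for e in examples]),
    }

model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224-in21k',
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='./vit-explicit-content',
                           num_train_epochs=10,
                           per_device_train_batch_size=32,
                           remove_unused_columns=False),
    data_collator=collate_fn,
    train_dataset=ds['train'],
)
trainer.train()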

Evaluation

After training, evaluate the model's performance using the validation set. You can use the evaluation script provided in the repository.

Example command:

python evaluate.py --model path/to/saved/model --validation_data path/to/validation_data
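
evaluate.py is the repository's own script; as a minimal, hedged sketch of what such an evaluation can look like (the paths and the 'imagefolder' layout below are assumptions), plain accuracy over a labelled validation folder could be computed like this:

# Hypothetical accuracy check, not the repository's evaluate.py.
# Assumes the validation data is an 'imagefolder': one sub-directory per class.
import torch
from datasets import load_dataset
from transformers import ViTForImageClassification, ViTImageProcessor

model = ViTForImageClassification.from_pretrained('path/to/saved/model').eval()
processor = ViTImageProcessor.from_pretrained('path/to/saved/model')
val_ds = load_dataset('imagefolder', data_dir='path/to/validation_data', split='train')

correct = 0
for example in val_ds:
    inputs = processor(example['image'].convert('RGB'), return_tensors='pt')
    with torch.no_grad():
        predicted_id = model(**inputs).logits.argmax(-1).item()
    correct += int(predicted_id == example['label'])

print(f'Accuracy: {correct / len(val_ds):.3f}')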

Contributing

We welcome contributions to the vit-mini-explicit-content project. If you have ideas for improvements or new features, please open an issue or submit a pull request. Ensure that your code follows the existing style and includes tests where applicable.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Links

For the latest model versions and updates, see the Releases section of this repository; you can download the most recent release from there and run it as needed.

Thank you for your interest in the vit-mini-explicit-content project! We hope you find it useful for your image classification tasks.
