This tool analyzes DNA sequences from FASTA or FASTQ files, calculating and plotting nucleotide composition statistics. It's designed to work with yeast genome sequences but can be used with any DNA sequence file.
-
Make sure you have Python 3 installed on your computer. You can download it from python.org.
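- On a Mac, Python 3 may already be installed; check by running python3 --version in a terminal. On Mac and Linux the command is usually python3 rather than python, so use python3 in the commands below if python is not found.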
-
Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux).
-
Create a new folder for this project and copy these files into it (if you already have all of the project files in one folder, e.g. because you unzipped a file your fiancé sent you, you can skip this step):
analyze_sequence.py
requirements.txt
-
Navigate to your project folder in the terminal:
cd path/to/your/folder
-
Install the required packages:
pip install -r requirements.txt
-
Optionally, make the script executable so you can run it as ./analyze_sequence.py (Mac/Linux only):
chmod +x analyze_sequence.py
The script takes a DNA sequence file (FASTA or FASTQ format) and analyzes it using a sliding window approach.
python analyze_sequence.py -f your_sequence.fastq
Options:
- -f or --file: Your sequence file (required)
- -w or --window: Window size (default: 300)
- -s or --step: Step size (default: 5)
- --start: Start position for analysis (optional)
- --end: End position for analysis (optional)
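As a rough illustration of how -w, -s, --start and --end interact, the sketch below lists the window start positions for a sequence of a given length. The function name window_starts is hypothetical, and it assumes 0-based positions, an exclusive --end, and that only full-length windows are kept; the real script may differ on each of these points.

```python
def window_starts(seq_len, window=300, step=5, start=None, end=None):
    """List 0-based start positions of full windows (hypothetical helper)."""
    # Assumption: positions are 0-based and `end` is exclusive, like Python slices.
    lo = 0 if start is None else start
    hi = seq_len if end is None else min(end, seq_len)
    # Keep only windows that fit entirely inside [lo, hi).
    return list(range(lo, hi - window + 1, step))


if __name__ == "__main__":
    # Same settings as the "-w 200 -s 50" example, on a 1,000-base sequence.
    starts = window_starts(1000, window=200, step=50)
    print(len(starts), starts[:3], starts[-1])  # 17 [0, 50, 100] 800
```

With the defaults (-w 300 -s 5), consecutive windows overlap by 295 bases, so neighbouring points are highly correlated; a larger step gives fewer, less overlapping points.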
-
Basic analysis with default settings:
python analyze_sequence.py -f sequence.fastq
-
Set the window size to 200 and the step size to 50:
python analyze_sequence.py -f sequence.fastq -w 200 -s 50
-
Analyze only positions 1000 to 2000:
python analyze_sequence.py -f sequence.fastq --start 1000 --end 2000
The script creates two files:
- A PNG file with the plot
- A CSV file with the raw data
The output files will be named based on your input parameters (window size, step size, and position range).
The plot shows two measurements:
- Black dots: The proportion of G+T bases in each window
- Red triangles: The ratio of G to T bases in each window
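The two plotted measurements can be sketched per window as follows. The names gt_stats and sliding_gt are hypothetical; the sketch assumes the G+T proportion is taken over every character in the window (including any N), and it returns NaN for the G/T ratio when a window contains no T. The actual script may treat ambiguous bases or zero counts differently.

```python
def gt_stats(window_seq):
    """Return (G+T proportion, G/T ratio) for one window (assumed definitions)."""
    s = window_seq.upper()
    g = s.count("G")
    t = s.count("T")
    # Assumption: the proportion is over all characters in the window, including N.
    gt_fraction = (g + t) / len(s) if s else float("nan")
    # Assumption: report NaN instead of dividing by zero when a window has no T.
    g_to_t = g / t if t else float("nan")
    return gt_fraction, g_to_t


def sliding_gt(seq, window=300, step=5):
    """Yield (start, G+T proportion, G/T ratio) for each full window of seq."""
    for i in range(0, len(seq) - window + 1, step):
        frac, ratio = gt_stats(seq[i:i + window])
        yield i, frac, ratio


if __name__ == "__main__":
    for row in sliding_gt("GGTTAACCGGGT", window=4, step=4):
        print(row)
    # (0, 1.0, 1.0)
    # (4, 0.0, nan)
    # (8, 1.0, 3.0)
```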
-
If you get "command not found":
- Make sure you're in the correct directory
- Try using python analyze_sequence.py (or python3 analyze_sequence.py) instead of ./analyze_sequence.py
-
If you get import errors:
- Make sure you've installed the requirements:
pip install -r requirements.txt
-
If your file isn't found:
- Make sure you're using the correct path to your sequence file
- Check that the file exists and has the correct permissions
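If it is unclear which of these is the problem, a small standalone snippet can tell the usual causes apart. check_input is a hypothetical helper, not part of analyze_sequence.py, and uses only Python's standard os module.

```python
import os
import sys


def check_input(path):
    """Return a short problem description for path, or None if it looks usable."""
    if not os.path.exists(path):
        return f"{path}: no such file (paths are relative to {os.getcwd()})"
    if not os.path.isfile(path):
        return f"{path}: exists but is not a regular file"
    if not os.access(path, os.R_OK):
        return f"{path}: you do not have permission to read this file"
    return None


if __name__ == "__main__":
    # Check every path given on the command line.
    for path in sys.argv[1:]:
        print(check_input(path) or f"{path}: file looks readable")
```

Saved as, say, check_input.py, it can be run as python check_input.py path/to/sequence.fastq.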
If you encounter any other issues:
- Check that your sequence file is in FASTA or FASTQ format
- Verify that all command line arguments are correct
- Make sure you've followed all setup instructions
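A quick way to check the format is to look at the first non-blank line: FASTA records begin with a ">" header line and FASTQ records with an "@" header line. The sketch below is a heuristic only, and guess_format is a hypothetical helper rather than part of analyze_sequence.py. It does not decompress files, so a compressed file such as a .fastq.gz comes back as not recognised.

```python
import sys


def guess_format(path):
    """Guess 'fasta' or 'fastq' from the first non-blank line (heuristic only)."""
    # errors="replace" keeps binary files from raising UnicodeDecodeError.
    with open(path, errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"  # FASTA header line
            if line.startswith("@"):
                return "fastq"  # FASTQ header line
            return None  # first real line matches neither format
    return None  # empty file, or only blank lines


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            fmt = guess_format(path)
        except OSError as exc:
            fmt = f"could not open ({exc.strerror})"
        print(f"{path}: {fmt or 'not recognised as FASTA or FASTQ'}")
```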