Introduction to the NCBI SRA Toolkit

SRA Toolkit

The Sequence Read Archive (SRA) stores raw sequence data from “next-generation” sequencing technologies, including 454, IonTorrent, Illumina, SOLiD, Helicos, and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.

Availability and Restrictions

The following versions of SRA Toolkit are available on OSC clusters:

Version Owens Pitzer Cardinal Note
2.6.3 X These versions no longer support downloading SRA data** but can still be used to process local data.
2.9.0 X
2.9.1 X
2.9.6 X* X*
2.10.7 X X
2.11.2 X X
3.0.2 X X X*
* Current default version
** NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials (recommended) to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

You can use module spider sratoolkit to view the modules available on a given machine. Contact OSC Help if you need other versions for your work.
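The 2.10 threshold from the note above can be checked in a script before attempting a download. The following is a minimal sketch, not part of the OSC documentation: the function names version_ge and can_download are hypothetical, and the comparison assumes GNU sort with the -V (version sort) option.

```shell
#!/bin/bash
# Return success (0) if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a given SRA Toolkit version can download SRA cloud data.
# The 2.10 threshold comes from the note above.
can_download() {
    if version_ge "$1" "2.10"; then
        echo "$1: download supported"
    else
        echo "$1: local processing only"
    fi
}

# Versions listed in the availability table.
for v in 2.6.3 2.9.6 2.10.7 2.11.2 3.0.2; do
    can_download "$v"
done
```

A plain string comparison would rank 2.9.6 above 2.10.7, which is why the sketch relies on version sort instead.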

Access

SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

National Center for Biotechnology Information, Freeware

Usage

Usage on Pitzer and Owens

Download SRA Data

NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials (recommended) to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

Set up the credentials (recommended)

Once you have obtained an AWS or GCP credential file, you can set the credentials by following these steps:

module load sratoolkit/2.11.2
vdb-config --report-cloud-identity yes 

# For GCP credentials
vdb-config --set-gcp-credentials /path/to/gcp/credential/file

# For AWS credentials
vdb-config --set-aws-credentials /path/to/aws/credential/file
Each version of the toolkit comes with its own set of configuration options. To modify the defaults, run vdb-config -i to access the interactive configuration. For additional information, please visit the following link: https://github.com/ncbi/sra-tools/wiki/03.-Quick-Toolkit-Configuration.
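A mistyped credential path is easy to miss, so it may help to check the file before passing it to vdb-config. The sketch below wraps the two vdb-config calls from the steps above in a hypothetical helper named set_cloud_credentials; the helper, its argument order, and its return codes are assumptions made for illustration, and the sratoolkit module is assumed to be loaded already.

```shell
#!/bin/bash
# Hypothetical helper: validate a credential file, then hand it to vdb-config.
# Usage: set_cloud_credentials <aws|gcp> <credential-file>
# Returns 2 for an unknown provider and 1 for an unreadable file.
set_cloud_credentials() {
    local provider="$1" cred_file="$2"

    case "$provider" in
        aws|gcp) ;;
        *) echo "unknown provider: $provider (expected aws or gcp)" >&2
           return 2 ;;
    esac

    if [ ! -r "$cred_file" ]; then
        echo "credential file not readable: $cred_file" >&2
        return 1
    fi

    # Same vdb-config options as in the steps above.
    vdb-config --report-cloud-identity yes &&
    vdb-config "--set-${provider}-credentials" "$cred_file"
}

# Example call (the path is a placeholder):
# set_cloud_credentials gcp /path/to/gcp/credential/file
```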

You can now download SRA data using prefetch:

prefetch SRR390728

By default, prefetch downloads to ~/ncbi in your home directory. For instance, the SRA file SRR390728.sra is stored in ~/ncbi/sra, and the resource files are stored in ~/ncbi/refseq. You can use srapath to verify that the SRA accession is accessible in the download path:

$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/sra/SRR390728.sra

You can now run other SRA tools, such as fastq-dump, on computing nodes. Here is an example job script:

#!/bin/bash
#SBATCH --job-name=use_fastq_dump
#SBATCH --time=0:10:0
#SBATCH --ntasks-per-node=1

module load sratoolkit/2.11.2
module list
fastq-dump -X 5 -Z SRR390728
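The job script above handles a single accession. A minimal sketch of how the same steps might be applied to several runs follows. The helper names run and dump_accession, the DRY_RUN switch, and the ./fastq output directory are illustrative assumptions, not part of the OSC documentation; prefetch and fastq-dump with --split-files and --outdir are existing SRA Toolkit commands and options.

```shell
#!/bin/bash
# Sketch: download several accessions and convert each one to FASTQ.
# DRY_RUN=1 (the default here) only prints the commands. Set DRY_RUN=0 inside
# a job script, after `module load sratoolkit/2.11.2`, to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Download one accession, then write its reads as FASTQ files into $2.
dump_accession() {
    local acc="$1" outdir="$2"
    run prefetch "$acc" &&
    run fastq-dump --split-files --outdir "$outdir" "$acc"
}

# Hypothetical accession list; SRR390728 is the run used in this document.
for acc in SRR390728; do
    dump_accession "$acc" ./fastq
done
```

Printing the commands first makes it possible to review the plan on a login node before submitting the real job.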

The home directory file system is not optimized for heavy computation. If the SRA file is large, change the default download path for SRA data to the scratch file system using one of the following two approaches. Both examples use the /fs/scratch/PAS1234/johndoe/ncbi directory.

Change the prefetch directory using vdb-config

module load sratoolkit/2.11.2
vdb-config -s /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728

You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra

Download to the current directory (available for version 2.10 or later)

module load sratoolkit/2.11.2
vdb-config --prefetch-to-cwd
mkdir -p /fs/scratch/PAS1234/johndoe/ncbi
cd /fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728

You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/SRR390728/SRR390728.sra
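The two approaches place the file in different locations, which can be confusing when checking a download from a script. The sketch below computes the expected path for each approach and confirms that the file exists. The function names expected_sra_path and check_download and the layout labels root and cwd are hypothetical; the two path patterns are taken from the examples above.

```shell
#!/bin/bash
# Print where an accession should land for each approach described above.
#   layout "root": download root set with vdb-config -s -> <dir>/sra/<acc>.sra
#   layout "cwd" : vdb-config --prefetch-to-cwd         -> <dir>/<acc>/<acc>.sra
expected_sra_path() {
    local layout="$1" dir="$2" acc="$3"
    case "$layout" in
        root) echo "$dir/sra/$acc.sra" ;;
        cwd)  echo "$dir/$acc/$acc.sra" ;;
        *)    echo "unknown layout: $layout" >&2
              return 1 ;;
    esac
}

# Succeed only if the downloaded file exists and is not empty.
check_download() {
    local path
    path="$(expected_sra_path "$@")" || return 1
    if [ -s "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path" >&2
        return 1
    fi
}

# Example (placeholder project and user from the text above):
# check_download cwd /fs/scratch/PAS1234/johndoe/ncbi SRR390728
```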

Known Issues

Error when downloading SRA data

NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials. Older versions can still be used to process local SRA data.

Further Reading

If you repost this article, please cite the source: https://www.ouq.net/3373.html
