SARS-CoV-2 Sequencing Resources

This document repository is meant to serve as the start of a crowd-sourced collection of information, documentation, protocols and other resources for public health laboratories intending to sequence SARS-CoV-2 coronavirus samples in the coming weeks. This is admittedly a limited first draft, but it will continue to collate useful information as additional protocols, tools, and resources are added, and as best practices are identified. While some of the resources here are directed specifically to US state and local public health laboratories in support of diagnostic testing, sequencing and response, we hope that this is a useful resource for the global laboratory community as we respond to this pandemic threat.

This collection is maintained and curated by Duncan MacCannell from the Office of Advanced Molecular Detection (AMD) at the Centers for Disease Control and Prevention (CDC). Please feel free to suggest additions, edits, clarifications and corrections, either by posting an issue, filing a pull request or by contacting me directly by email or Twitter. In the meantime, I'll continue to add and mirror useful resources here as they become available.

INDEX

Bioinformatic Tools, Scripts and Workflows
Submitting to Public Sequence Repositories
Linking Sequence Accessions
Other Useful References and Resources
Notices and Disclaimers

Disclaimer

The findings and conclusions in this document and the attendant repository are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention. Use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services.

Sequencing Protocols

1. OXFORD NANOPORE

The following sequencing protocols, checklists and job-aids are primarily designed for the Oxford Nanopore MinION, and have been kindly shared by research groups throughout the world (please see individual protocols for attribution and citing purposes). Even so, most of these protocols should scale to larger ONT instruments without significant modifications.

a) CDC NCIRD/DVD ONT Sequencing Protocol

This protocol was developed, tuned and validated by the Viral Discovery laboratory at CDC/NCIRD, where it was used to generate the first 16 SARS-CoV-2 genome sequences from the United States. In practice, it has been used for situations with a relatively low or predictable volume of samples, and is often used in conjunction with Sanger-based tiling to resolve any potential sequencing or assembly issues.

Release pending CDC clearance.

b) ARTIC Network nCoV-2019 Sequencing Protocol

This protocol was developed and released by the fine folks at ARTIC Network, and was subsequently refined based on comments from Itokawa et al., which identified potential issues and proposed an alternate L18 primer.

Sequencing protocol / Single sample sequencing protocol
Stepwise simplified protocol from ONT*
Primer schemes: V1 / V2 (ref)
Integrated bioinformatics (RAMPART) - documentation below under bioinformatics methods.

c) Doherty Institute VIDRL Sequencing Protocols

The Victorian Infectious Diseases Reference Laboratory (VIDRL) at the Peter Doherty Institute for Infection and Immunity released two protocols for the ONT MinION, which they successfully used to sequence early Australian SARS-CoV-2 samples.

2. ILLUMINA

a) Illumina Nextera Flex Enrichment Sequencing Protocol

Illumina's Research and Development group has recently developed and validated a custom, research use only (RUO) enrichment sequencing strategy based on their Nextera Flex chemistry.

Release imminent.

b) SARS-CoV-2 Enrichment Sequencing by Spiked Primer MSSPE (NIAID/UCSF/CZBioHub)

The NIAID laboratory team in Cambodia, in collaboration with UCSF, CZBioHub and IPC, has released a metagenomic sequencing with spiked primer enrichment (MSSPE) protocol for SARS-CoV-2. The protocol is available on protocols.io.

MSSPE Protocol

c) Illumina Shotgun Metagenomics Sequencing Protocol

Illumina's technical note on sequencing coronavirus samples using a comprehensive metagenomic sequencing approach was one of the earlier protocols released for SARS-CoV-2, and remains an effective option for shotgun sequencing.

Application Note

d) SARS-CoV-2 and related virus sequencing with capture enrichment (Broad Institute)

The Sabeti lab, at the Broad Institute, released a probe set for comprehensive whole-genome capture of SARS-CoV-2 and respiratory-related viruses (human-infecting coronaviruses, HRSV, HMPV, HPIVs, Human mastadenovirus A-G, Enterovirus A-E, Rhinovirus A/B/C, influenza A/B/C). The probe set is available as V-Respiratory on the probe designs page of the CATCH repository. It was initially released in January, 2020 and most recently updated in March, 2020. Probes can be ordered from Twist Bioscience; we have used the protocol for Twist custom panels with slight modifications for low input Nextera XT libraries.

3. TANDEM

a) SARS-CoV-2 Parallel Sequencing by Illumina and ONT (UWMadison ZEST)

Staff and students from Thomas Friedrich and Dave O'Connor's laboratories at UWMadison have put together a tandem sequencing protocol and bioinformatic workflow that incorporates Illumina and ONT sequence. While this may be overkill for routine or high-throughput public health purposes, the necessary protocols, scripts and documentation are available here.

4. SANGER

a) CDC NCIRD/DVD Sanger Tiling

An elegant approach from a more civilized age. The Viral Discovery laboratory team in CDC/NCIRD/DVD has used conventional Sanger sequencing to refine and complement betacoronavirus sequencing on next generation platforms.

Release pending CDC clearance.

Bioinformatic Tools, Scripts and Workflows

1. CDC NCIRD/DVD Bioinformatics SOPs

This section describes the basic bioinformatic workflow that the Viral Discovery laboratory in NCIRD and other teams at CDC use for quality assessment, assembly and comparison of coronavirus sequences. IRMA, the Iterative Refinement Meta-Assembler developed by CDC's Influenza Division for routine influenza surveillance, has recently been updated to support both ebolavirus and coronavirus assembly tasks. While IRMA isn't used for all SARS-CoV-2 assemblies at CDC, it is a powerful tool for complex or problematic samples and datasets.

Release of scripts and other tools is pending CDC clearance.
IRMA: Iterative Refinement Meta-Assembler is available here.

2. CLCbio Genomics Workbench

QIAGEN has released example workflows and tutorials for analyzing Illumina and Oxford Nanopore SARS-CoV-2 sequence data using CLC Genomics Workbench v20.0.3. Note that these workflows are "Research Use Only" (RUO), and may need to be modified to fit upstream protocols. Free temporary licenses for CLC GWB and IPA are available, as is a series of webinars and tutorials to familiarize users with the workflows. Jonathan Jacobs and Leif Schauser are available for user support and specific questions.

3. ARTIC Network Bioinformatics

The ARTIC Network has released detailed instructions on how to set up and configure the conda environment needed to run their analysis pipelines. These are complete bioinformatic workflows, including runtime visualization, basecalling, mapping/assembly and reporting in a single, portable environment. The artic-nCoV2019 repo includes source code and build instructions for a custom RAMPART configuration. Additional instructions and documentation are available below.

4. One Codex

One Codex has added support to its analysis platform for analyzing SARS-CoV-2 samples. This analysis (example) will be automatically run on any samples with SARS-CoV-2 reads. One Codex is making analysis of SARS-CoV-2 samples available free of charge for all users sharing their results and data publicly. Additional information and documentation are available below:

5. Broad viral-ngs tools

The Broad Institute's viral genomics analysis tools can assist with assembly, metagenomics, QC, and NCBI submission prep, for Illumina-generated data on viral genomes. It is available in the following forms:

The Terra cloud platform (workspace including example SARS-CoV-2 data from SRA, blog post, getting started)
The DNAnexus cloud platform (workflows)
The Dockstore tool repository service, which integrates with several cloud platforms, or download to run on-prem (workflows)
GitHub (workflows)

The tools include:

de novo and reference-based assembly
short read alignment and coverage plots
krakenuniq metagenomic classification
NCBI: SRA download, Genbank annotation download, Genbank submission prep
multiple alignment of genomes w/MAFFT
Illumina basecalling & demux, metrics, fastQC, ERCC spike-in counter

6. Genome Detective Virus tool

Genome Detective virus tool does QC, assembly and identification of SARS-CoV-2 from a wide range of sequencing protocols (metagenomic or targeted sequencing).

Raw sequence read files (FASTQ) can be uploaded directly in this web-based tool, and consensus sequences can be subsequently analyzed by the Coronavirus Typing Tool.

7. CosmosID

CosmosID has recently posted a blog entry on their site, describing how to use their web-based analysis platform to analyze SARS-CoV-2 data.

Detection of SARS-CoV-2 Coronavirus using CosmosID

8. ARTIC on Illumina Bioinformatic Workflow

Erin Young and Kelly Oakeson at the Utah Department of Health have outlined their bioinformatics approach for SARS-CoV-2 sequences using ARTIC primers, sequenced on Illumina.

Quality Management

This section will describe best practices for laboratory and bioinformatic quality assurance, including preflight checks for sequence and metadata submission to public repositories.

To do: I'd love for people to help describe their actual QC processes.
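As a starting point for discussion, a preflight check on a consensus sequence might look something like the sketch below. It is only an illustration: the function names `read_fasta` and `consensus_stats` are hypothetical, and no acceptance thresholds are given because none have been agreed on here. It simply reports the length and the count and fraction of ambiguous `N` bases for each record, which each laboratory could then compare against its own criteria.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {header: sequence}.

    Hypothetical helper for illustration; the full header line
    (minus the leading '>') is used as the record name.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def consensus_stats(sequence: str) -> dict:
    """Return length, count of N bases and fraction of N bases."""
    seq = sequence.upper()
    length = len(seq)
    n_count = seq.count("N")
    n_fraction = n_count / length if length else 0.0
    return {"length": length, "n_count": n_count, "n_fraction": n_fraction}


if __name__ == "__main__":
    example = ">USA/UT-01/2020\nACGTNN\nNNAC\n"
    for header, seq in read_fasta(example).items():
        print(header, consensus_stats(seq))
```

What counts as an acceptable length or proportion of ambiguous bases is deliberately left open; those criteria belong to the best practices this section is meant to collect.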

Submitting to Public Sequence Repositories

Sequence naming conventions for public repositories

We are proposing simplified naming conventions for sequences submitted to GISAID and NCBI from US public health and clinical laboratories.

COUNTRY	/ STATE-LAB-SAMPLE /	YEAR
`USA`	`/ CA-CDPH-S1 /`	`2020`
`USA`	`/ UT-01 /`	`2020`
`USA`	`/ AZ-1045 /`	`2020`
`USA`	`/ WA-UW-316 /`	`2020`

The proposed convention has three parts, separated by slashes: 1) the country (USA); 2) a middle sample identification cell that includes the two-letter state (e.g., CA), an abbreviated identifier for the submitting lab (e.g., CDPH), as desired, and a unique sequence identifier (e.g., 01, S01, 454, ...), with all three terms separated by hyphens; and 3) the four-digit year (e.g., 2020).

For states with only one submitting laboratory (which should be most), the identifier for the submitting laboratory may be omitted, resulting in a simple, state-level identifier such as USA/UT-573/2020.

These recommendations are roughly compatible with existing submissions to GISAID and NCBI, but are completely open for debate.
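The proposed convention can be sketched in a few lines of Python. This is an illustration only, not an official validator: the helper names `build_sequence_name` and `parse_sequence_name` are hypothetical, and the character sets allowed in the regular expression (letters only for the lab identifier, letters and digits for the sample identifier) are assumptions made for the example.

```python
import re
from typing import Optional

# COUNTRY/STATE[-LAB]-SAMPLE/YEAR, as proposed above.
# The permitted characters in each term are assumptions for illustration.
NAME_PATTERN = re.compile(
    r"^(?P<country>[A-Za-z]+)/"
    r"(?P<state>[A-Z]{2})-"
    r"(?:(?P<lab>[A-Za-z]+)-)?"
    r"(?P<sample>[A-Za-z0-9]+)/"
    r"(?P<year>\d{4})$"
)


def build_sequence_name(state: str, sample: str, year: int,
                        lab: Optional[str] = None,
                        country: str = "USA") -> str:
    """Assemble a name; the lab identifier is omitted when not given."""
    middle = "-".join(part for part in (state.upper(), lab, sample) if part)
    return f"{country}/{middle}/{year}"


def parse_sequence_name(name: str) -> Optional[dict]:
    """Split a name into its terms, or return None if it does not match."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(build_sequence_name("CA", "S1", 2020, lab="CDPH"))
    print(parse_sequence_name("USA/WA-UW-316/2020"))
    print(parse_sequence_name("USA/UT-573/2020"))
```

Because the lab term is optional, a purely numeric sample identifier such as `573` is parsed as the sample with no lab, which matches the single-laboratory case described above.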

Recommended formatting and criteria for sample metadata

NCBI SARS-CoV-2 Genbank/SRA

The National Center for Biotechnology Information (NCBI) has established a custom landing page for SARS-CoV-2 sequences and data, and is working to develop streamlined submission processes for Genbank and SRA. For the time being, we suggest basing metadata and submission formatting on GISAID EpiCoV, which tends to be more comprehensive and structured. We will develop specific guidance for NCBI submissions. In the meantime, here are some general resources to help with NCBI data submission and metadata management.

1. NCBI Submission Portal

Individual sequences can be submitted to NCBI using the following web form. Create an NCBI user account, and select "SARS-CoV-2 (through BankIt)".

Genbank Submission Portal

2. NCBI Batch Submissions

NCBI has indicated that they plan to develop a specific rapid submission process for SARS-CoV-2 sequences. In the meantime, I believe you should be able to follow the FDA/CFSAN submission protocol below, which includes links to appropriate interfaces and templates (with obvious changes for pathogen and project information).

3. FDA/CFSAN NCBI Submission and Data Curation Protocols

The FDA Center for Food Safety and Applied Nutrition (CFSAN) has released a number of protocols as part of their GenomeTrakr Network that may be useful for NCBI sequence submission and metadata curation. While they are written specifically for laboratories that are conducting routine sequencing of foodborne bacterial pathogens, these protocols provide an overview of sequence submission to the NCBI pathogen portal, metadata and preflight data checks.

NCBI submission protocol for microbial pathogen surveillance
Populating the NCBI pathogen metadata template
NCBI Data Curation - Pending release

GISAID EpiCoV

The GISAID EpiCoV Public Access repository is based on existing submission processes and data structures for large-scale influenza surveillance (GISAID EpiFlu). As such, submitters to EpiCoV will discover that several of the required metadata submission fields may be problematic. Nonetheless, a number of laboratories have been submitting sequences with the following:

METADATA FIELDS (GISAID)	GUIDANCE
`Virus name`	USA/FL-103/2020 (see above)
`Accession ID`
`Type`	betacoronavirus
`Collection date`	YYYY-MM-DD
`Location`	USA / State / County?
`Additional location information`
`Host`	Human
`Additional host information`
`Gender`	(no guidance)
`Patient age`	(no guidance, could be binned)
`Patient status`	(no guidance)
`Specimen source`	(free text)
`Outbreak detail`	omit
`Last vaccinated`	omit
`Treatment`	omit

At a minimum, we suggest that samples be submitted with `collection date`, `location` and `host` information attached. `location`, `host`, `gender` and `patient age` are all required fields, and several of them likely constitute personally-identifiable information. While they cannot be left blank for submission, you can submit the record successfully (in either single or batch mode) by entering "unknown".

Note that for GISAID submissions, users must register for an account, and must successfully submit a single submission before being granted access to the bulk submission template and interface.

A copy of the current bulk submission template is available here.
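The guidance above can be sketched as a small helper that builds one metadata record and fills required-but-unavailable fields with "unknown". This is an illustration under assumptions: `make_gisaid_record` is a hypothetical name, the keys are taken from the guidance table above rather than from the actual bulk submission template (whose column headers may differ), and nothing here talks to GISAID.

```python
import re
from datetime import date

# Fields described above as required; "unknown" is the suggested placeholder.
REQUIRED_WITH_PLACEHOLDER = ("Location", "Host", "Gender", "Patient age")


def make_gisaid_record(virus_name, collection_date, fields=None):
    """Build one metadata record as a dict (hypothetical helper).

    Keys follow the guidance table above, not the real template headers.
    """
    # Enforce the YYYY-MM-DD shape, then let date.fromisoformat reject
    # impossible dates such as 2020-02-30 (both raise ValueError).
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", collection_date):
        raise ValueError(f"collection date must be YYYY-MM-DD: {collection_date!r}")
    date.fromisoformat(collection_date)

    record = {
        "Virus name": virus_name,
        "Type": "betacoronavirus",
        "Collection date": collection_date,
    }
    record.update(fields or {})
    # Required fields cannot be blank, so fall back to "unknown".
    for key in REQUIRED_WITH_PLACEHOLDER:
        if not record.get(key):
            record[key] = "unknown"
    return record


if __name__ == "__main__":
    rec = make_gisaid_record(
        "USA/FL-103/2020", "2020-03-15",
        {"Host": "Human", "Location": "USA / Florida"},
    )
    for key, value in rec.items():
        print(f"{key}\t{value}")
```

Defaulting to "unknown" only when a value is missing keeps the decision about what to share with the submitter, while still producing a record that passes the required-field check.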

Linking Sequence Accessions

For data linkage, we are proposing the following template, as a simple, lightweight line list of tab-separated values. If this consensus recommendation for data linkage is acceptable, a preformatted .TSV will be made available. We recognize that not all samples sent for sequencing have a PUID associated.

SEQUENCE_NAME	GISAID_ID	GENBANK_ID	COLLECTION_DATE	PUID/COVID-ID
USA/CA-CDPH-999/2020	EPI_ISL_999999	MT99999999	2020-04-01	99999

In this simple proposed schema, GISAID ID or GENBANK ID and COLLECTION DATE are required fields, and our hope is to maximize PUID completion. All accession numbers, including PUID, should be entered without any superfluous text or annotation.
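A minimal sketch of writing and checking such a line list with Python's standard `csv` module follows. The column names come from the template above; the helper names `check_row` and `write_line_list` and the wording of the problem messages are hypothetical, and the whitespace check is just one simple reading of "no superfluous text".

```python
import csv
import io

COLUMNS = ["SEQUENCE_NAME", "GISAID_ID", "GENBANK_ID",
           "COLLECTION_DATE", "PUID/COVID-ID"]


def check_row(row: dict) -> list:
    """Return a list of problems for one row (empty list means OK)."""
    problems = []
    # At least one accession (GISAID or GenBank) is required.
    if not (row.get("GISAID_ID") or row.get("GENBANK_ID")):
        problems.append("need GISAID_ID or GENBANK_ID")
    if not row.get("COLLECTION_DATE"):
        problems.append("COLLECTION_DATE is required")
    # Flag stray leading or trailing whitespace around any value.
    for col in COLUMNS:
        value = row.get(col) or ""
        if value != value.strip():
            problems.append(f"{col} has surrounding whitespace")
    return problems


def write_line_list(rows, handle) -> None:
    """Write rows as tab-separated values with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    example = {
        "SEQUENCE_NAME": "USA/CA-CDPH-999/2020",
        "GISAID_ID": "EPI_ISL_999999",
        "GENBANK_ID": "MT99999999",
        "COLLECTION_DATE": "2020-04-01",
        "PUID/COVID-ID": "99999",
    }
    print(check_row(example))
    buffer = io.StringIO()
    write_line_list([example], buffer)
    print(buffer.getvalue(), end="")
```

Rows without a PUID are still written, with that column left empty, since not all samples sent for sequencing have one.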

Other Useful References and Resources

Slides and Presentations

StaPH-B SARS-CoV-2 Sequencing Seminar (Kevin Libuit, Virginia DCLS - 20200320) (Recording)

Visualization and Phylogenetics

Diagnostic Resources

SARS-CoV-2 PCR Primer List (Grubaugh Lab, Yale SPH)

Notices and Disclaimers

This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

License

Unless otherwise specified, the repository utilizes code licensed under the terms of the Apache Software License and therefore is licensed under ASL v2 or later.

The source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.

The source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.

You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html

Any source code forked from other open source projects will inherit its license.

Privacy

This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Surveillance Platform Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/privacy.html.

Contributing

Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.

All comments, messages, pull requests, and other submissions received through CDC, including this GitHub page, are subject to the Presidential Records Act and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.

Records

This repository is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.

Updated: 20200321 @dmaccannell
