This bucket contains the following genomics datasets:

1. Google Brain dataset (prefix /google-brain): a copy of the raw FASTQ sequence files that are publicly available in Google Cloud Storage at gs://brain-genomics-public/research/sequencing/fastq. The data include HiSeq X, NovaSeq, and PacBio HiFi sequencing runs from Novogene (https://en.novogene.com/). All WGS and exome runs used 151-bp paired-end reads.

  The authoritative copy of this dataset is at gs://brain-genomics-public/research/sequencing/, which also contains the BAM and VCF files described in Baid et al. (2020).

  All data are released under a CC0 license.

  Reference: Gunjan Baid, Maria Nattestad, Alexey Kolesnikov, Sidharth Goel, Howard Yang, Pi-Chuan Chang, and Andrew Carroll (2020). An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development. https://doi.org/10.1101/2020.12.11.422022
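Because the Google Brain files are raw paired-end FASTQ (151-bp reads for the WGS and exome runs), a quick sanity check after downloading is to walk the R1 and R2 files together and confirm that mates stay in sync and read lengths are as expected. The following is a minimal Python sketch, not a tool shipped with this dataset. It assumes standard unwrapped 4-line FASTQ records and that both mates share the same first header token (optionally with a legacy /1 or /2 suffix). The names read_fastq, mate_id, and check_pairs are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (assumes 4 lines per record, no wrapping)."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\n"), seq, qual)


def mate_id(name: str) -> str:
    """ID shared by both mates (assumption): first token, minus any /1 or /2 suffix."""
    parts = name.split()
    token = parts[0] if parts else ""
    return token[:-2] if token.endswith(("/1", "/2")) else token


def check_pairs(r1: TextIO, r2: TextIO, expected_len: int = 151) -> Tuple[int, int]:
    """Walk R1 and R2 in lockstep.

    Returns (pairs seen, reads whose length differs from expected_len).
    Raises ValueError if the files have different read counts or mates diverge.
    """
    pairs = off_length = 0
    for a, b in zip_longest(read_fastq(r1), read_fastq(r2)):
        if a is None or b is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if mate_id(a.name) != mate_id(b.name):
            raise ValueError(f"mates out of sync: {a.name!r} vs {b.name!r}")
        pairs += 1
        off_length += int(len(a.seq) != expected_len) + int(len(b.seq) != expected_len)
    return pairs, off_length
```

If the downloaded files are gzip-compressed, Python's gzip.open(path, "rt") returns a text handle that can be passed straight to check_pairs. Reading both files as streams keeps memory use constant regardless of file size.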