Snip
Hacking on 23&Me and DbSNP
What is in DbSNP that we care about?
-
Single-nucleotide Polymorphisms
A change (deletion, insertion or replacement) in a single nucleotide base pair, which is found in at least 1% of the population.
-
GWAS Allele Frequencies
Genome-wide Association Studies give the percentage of the studied population who have the SNP, i.e. how rare it is.
-
Clinical Significance of SNP
How likely the SNP is to cause a disease, based on a standardized set of classifications.
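To make the allele-frequency idea above concrete, here is a minimal sketch in Python. The function name and the sample counts are hypothetical and chosen only for illustration; the 1% cut-off is the one quoted in the SNP definition above.

```python
def allele_frequency(allele_count: int, total_alleles: int) -> float:
    """Return the fraction of sampled alleles that carry the variant."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return allele_count / total_alleles


# Hypothetical numbers: 150 copies of the variant among 5000 sampled alleles.
freq = allele_frequency(150, 5000)
print(f"{freq:.1%}")  # 3.0%

# The "at least 1% of the population" threshold from the definition above.
is_common = freq >= 0.01
```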
What does 23&Me provide us with?
Our Genotype information at various 'ref SNPs', specific locations on the genome that someone has deemed significant.
rsid          chromosome  position  genotype
# Typical 'rsid' ID that starts with 'rs'
rs12564807    1           734462    AA
rs3131972     1           752721    AG
rs148828841   1           760998    CC
rs12124819    1           776546    AA
rs115093905   1           787173    --
rs11240777    1           798959    GG
rs7538305     1           824398    --
rs4970383     1           838555    CC
rs4475691     1           846808    CT
rs7537756     1           854250    AA
rs13302982    1           861808    AG
rs55678698    1           864490    CC
# 23&Me's internal, experimental ref_snps, starts with 'i'
i6019299      1           871267    CC
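A file laid out like the listing above could be read with a few lines of Python. This is only a sketch, not the implementation from the talk: the `Call` type and `parse_raw_data` function are hypothetical names, and it assumes whitespace-separated columns, `#` comment lines, and that a `--` genotype means no call was made at that position.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_data(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data row, skipping comments and the header."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4 or fields[0] == "rsid":
            continue  # header row or malformed line
        rsid, chromosome, position, genotype = fields
        yield Call(rsid, chromosome, int(position), genotype)


# A few rows copied from the listing above.
sample = io.StringIO("""\
rsid\tchromosome\tposition\tgenotype
# Typical 'rsid' ID that starts with 'rs'
rs12564807\t1\t734462\tAA
rs3131972\t1\t752721\tAG
rs115093905\t1\t787173\t--
i6019299\t1\t871267\tCC
""")

calls = list(parse_raw_data(sample))
# Drop no-calls (assumed meaning of '--'), then keep only 'rs' IDs,
# since the internal 'i' IDs will not match anything in DbSNP.
called = [c for c in calls if c.genotype != "--"]
ref_snps = [c for c in called if c.rsid.startswith("rs")]
```

Keeping `chromosome` as a string rather than an integer is deliberate, so that non-numeric chromosome labels would not break the parser.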
Clinical Significance of Alleles
We probably only care about:
-
Pathogenic
-
Likely Pathogenic
-
Drug Response
-
Protective
-
Risk Factor
snip=# SELECT Count(*) cnt, clinical_significance_csv
       FROM ref_snp_allele_clin_diseases
       GROUP BY clinical_significance_csv
       HAVING COUNT(*) > 25;
  cnt   | clinical_significance_csv
--------+-----------------------------------------------------------
    119 | affects
    181 | association
  53646 | benign
  14252 | conflicting-interpretations-of-pathogenicity
    543 | drug-response
 101029 | likely-benign
  24631 | likely-pathogenic
  13958 | not-provided
    293 | not-provided,conflicting-interpretations-of-pathogenicity
   2173 | other
  63192 | pathogenic
     66 | protective
    824 | risk-factor
 186296 | uncertain-significance
(14 rows)
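The five categories listed above could be picked out of `clinical_significance_csv` values along these lines. This is a sketch under assumptions: `INTERESTING` and `is_interesting` are hypothetical names, and the column is assumed to hold a comma-separated list of labels, as the `not-provided,conflicting-interpretations-of-pathogenicity` row suggests.

```python
# Labels as they appear in the query output above.
INTERESTING = frozenset({
    "pathogenic",
    "likely-pathogenic",
    "drug-response",
    "protective",
    "risk-factor",
})


def is_interesting(clinical_significance_csv: str) -> bool:
    """True if any label in the comma-separated value is one we care about."""
    labels = {label.strip() for label in clinical_significance_csv.split(",")}
    return not INTERESTING.isdisjoint(labels)


# A subset of the counts from the query output above.
counts = {
    "benign": 53646,
    "conflicting-interpretations-of-pathogenicity": 14252,
    "drug-response": 543,
    "likely-pathogenic": 24631,
    "not-provided,conflicting-interpretations-of-pathogenicity": 293,
    "pathogenic": 63192,
    "protective": 66,
    "risk-factor": 824,
}

kept = {label: n for label, n in counts.items() if is_interesting(label)}
total_kept = sum(kept.values())
```

Splitting on commas and comparing whole labels, rather than doing a substring search, keeps `conflicting-interpretations-of-pathogenicity` from being mistaken for `pathogenic`.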
What's Next??
-
OpenSpace tomorrow (Sat.) at 4:00PM, Room 19
Come check out the actual implementation! Walk away with a JSON file with your results. Looking for contributors.
-
JSON API Service
Offer a JSON API service that returns research and information pertaining to your individual genome.
-
Single-page App Interface
We need a front-end to interface with this data.
Thanks!
Slides available at: http://thelaziestprogrammer.com/talks/snip-pycon-2018.html