
Great post and thanks for sharing your learnings.

A couple of quick questions:

Was the 25TB raw data gathered from a single human genome?

What would be the size in bytes of a unique genomic fingerprint once raw data is all fully processed into high confidence base values? (including non-coding regions)

If we just look at coding regions and further compress by only looking at SNPs, how many bytes is that?

Considering that each base carries ~2 bits of information (it is one of four possible bases), it would be super interesting to know how much space it takes to describe our uniqueness!
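To make the "2 bits per base" point concrete, here is a minimal sketch of packing four bases into each byte. The A/C/G/T bit assignment and the function names `pack`/`unpack` are arbitrary choices for illustration. This only handles the four canonical bases; real formats also need a way to mark unknown bases (N), which this sketch does not support.

```python
# Map each nucleotide to a 2-bit code (an arbitrary, assumed assignment).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index matches the codes above


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n))


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 7 bases -> 2 bytes
assert unpack(packed, len(seq)) == seq
```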



Q: "Was the 25TB raw data gathered from a single human genome?"

Essay: "Each row contained a data for a single SNP for a single person." ... "There were ~2.5 million SNPS and ~60 thousand people"
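A quick back-of-envelope check using the essay's numbers, assuming "25TB" means decimal terabytes and one row per (SNP, person) pair as quoted. It suggests each row carries far more than the genotype itself, which needs at most a couple of bits.

```python
SNPS = 2_500_000         # "~2.5 million SNPs" (from the essay)
PEOPLE = 60_000          # "~60 thousand people" (from the essay)
RAW_BYTES = 25 * 10**12  # 25 TB, assuming decimal terabytes

rows = SNPS * PEOPLE
bytes_per_row = RAW_BYTES / rows
print(f"{rows:,} rows")                       # 150,000,000,000 rows
print(f"~{bytes_per_row:.0f} bytes per row")  # ~167 bytes per row
```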

Statement: "it would be super interesting to know how much space it takes to describe our uniqueness"

Genome size is "3,234.83 Mb (Mega-basepairs) per haploid genome" says https://en.wikipedia.org/wiki/Human_genome .
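Combining that figure with 2 bits per base gives a rough size for a naive, uncompressed encoding. This is only a sketch of the arithmetic: it ignores unknown bases, alignment metadata, and any compression.

```python
BASES_PER_HAPLOID = 3234.83e6  # 3,234.83 Mb, per the Wikipedia figure above
BITS_PER_BASE = 2              # 4 possible bases -> log2(4) = 2 bits

haploid_bytes = BASES_PER_HAPLOID * BITS_PER_BASE / 8
diploid_bytes = 2 * haploid_bytes  # two copies of each chromosome
print(f"haploid: ~{haploid_bytes / 1e6:,.0f} MB")  # haploid: ~809 MB
print(f"diploid: ~{diploid_bytes / 1e6:,.0f} MB")  # diploid: ~1,617 MB
```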

However, the question of uniqueness depends on your model. We are all unique.

If you have many human genomes, use a reference template and list only the positions that differ. E.g., https://en.wikipedia.org/wiki/Compression_of_Genomic_Sequenc... .
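The reference-template idea can be sketched roughly like this. The helper names are hypothetical, and the sketch assumes the two sequences are already aligned and equal in length, so it captures only substitutions (SNP-like changes). Real reference-based compressors also handle insertions, deletions, and structural variants.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Return (position, base) for every position where sample differs.

    Simplification: both sequences must be aligned and equal in length,
    so only substitutions are recorded.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, aligned sequences")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def apply_diff(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the sample from the reference plus its stored differences."""
    seq = list(reference)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)


ref = "ACGTACGTAC"
sample = "ACGAACGTTC"
d = diff_against_reference(ref, sample)
print(d)  # [(3, 'A'), (8, 'T')]
assert apply_diff(ref, d) == sample
```

Since any two people share the vast majority of their genome, the list of differences is much smaller than the full sequence.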


To uniquely identify someone, you don't need much space! A recent paper puts it at 50 SNPs (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5785835/).
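A rough sense of how small that is, assuming each of the 50 SNPs is biallelic so that a genotype is just the count of alternate alleles (0, 1, or 2). That encoding is an assumption for this sketch, not something taken from the paper.

```python
import math

N_SNPS = 50  # figure quoted from the linked paper

# Assumption: biallelic SNPs, genotype = alternate-allele count 0/1/2,
# i.e. 3 states, which needs 2 bits when stored one genotype at a time.
GENOTYPE_STATES = 3
BITS_PER_GENOTYPE = math.ceil(math.log2(GENOTYPE_STATES))  # 2

bits = N_SNPS * BITS_PER_GENOTYPE
print(bits, "bits =", math.ceil(bits / 8), "bytes")  # 100 bits = 13 bytes

# Naive upper bound on distinct profiles; it ignores allele frequencies
# and linkage, which make real profiles far less evenly spread.
print(f"{GENOTYPE_STATES ** N_SNPS:.2e} possible genotype combinations")
```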

Describing your total unique genetic profile would obviously require a lot more space, and wouldn't be constant across individuals/ancestral backgrounds (e.g. there's more genetic diversity in people of African descent).


Sorry, I saw this article and thought it was pretty interesting. This is NOT my article, but I'd like to know what others would do in this situation.

BTW, I'm not sure: is it OK to post other people's articles here? Maybe I should add a short commentary to the title.


The default assumption is that it's not your article, unless you prepend "Show HN" (or there's something obvious like your username matching the domain name).


Of course! Glad you posted it. I didn't realize that many people would find it interesting.


I think it's fine. The author might show up; otherwise, others with an interest in the area might reply.


You don't even need to say it's not yours; most of what's posted here probably isn't from the author.



