Who is an excel whiz?

BTMO

Overlord of the Underthings
Joined
May 1, 2009
Messages
2,521
I have a string of text - it is a DNA sequence - and I want to import it into excel to do some calculations on nucleotide frequencies.

What I would like to do is (essentially) import a 16,000 character word into excel, but I would like each character in a separate cell of a column or row.

I can get it into excel, and know a fairly manual way of getting each character into a separate cell, but given the length of the string involved, would like a smart, easy way to do it.

Help...

:D
 
What seperates the characters in the text string, is there a space, comma or anything?
 
If there is no seperator, then import the entire string into excel and record a macro for one manual operation.Then replay the macro the required number of times.
 
I`ve dealt with a similar situation in OpenOffice Calc by

1) in a text editor, search&replace-ing replacing each character with that character followed by a semicolon (with 4 bases that should be easy)
2) copying that string
3) special pasting the string into Calc, chosing semicolon as separator

That puts each character into a separate column. To get into separate rows, copy that and paste with transposition.

Of course, 16000 columns might be stretching it ... not sure Excel can do that!
 
What I´d probably do with that huge a string is write a tiny QBasic program to turn it into 16000 individual lines of 1 character, then import that. Sorry, not very helpful, I know :)

(Although I'd be happy to do the transformation for you and send you back the file - should only take 5mins. Let me know.)
 
Ok - got it.

My partner, bless her little socks, is also a data analyst from way back, and after we had dinner, we spent two minutes talking about it, and trying to figure out the smartest way to do it. Asking the question is often the way to solve these things...

What we did was copied the text from Mega4 (the genetics programme my sequence was in) into Word, then used... copy and replace.

We copied "A" and replaced it with "A^p" - ie, we replaced it with an A, and a paragraph character. We then repeated this for T, G and C.

We then just imported it straight into excel, I did my calcs and hey presto! I am done. Overall, it took me about 30 seconds to do the transformation, and since it is in 16,000-odd rows, there were no issues importing it into excel.

Oh, and Skwinty - sorry - I should have mentioned that there were no separators.

The string of characters was the mitochondrial sequence of a fish - Oplegnathus fasciatus - that I wanted to perform some comparisons with another species.

Just for fun, I will paste the entire sequence below. Welcome to my world...

;)

Thanks for your help folks - I seriously appreciate it!

Cheers,

Brenton

--sequence follows--

#O. fasciatus tRNA-Phe+s-rRNA+tRNA-Val...
Oplegnathus fasciatus mitochondrion, complete genome.
190349398
1
GCTAGCGTAGCTTACTTAAAGCATAACACTGAAGATGTTAAGACGAGCCCTAGAAAGCTCCGCAAGCACAAAGGCTTGGTCCTGACTTTACTATCAGCTCTAGCCAAACTTACACATGCAAGTATCCGCACCCCCGTGAGAATGCCCCAACAGTTCCCTGCCCGGGAACAAGGAGCTGGTATCAGGCACACCCCATTTAAGCCCACGACACCTTGCTTAGCCACACCCTCAAGGGAATTCAGCAGTGATAGACATTAAGCTATAAGTGAAAACTTGACTTAGTCAAGGCAAAGAGGGCCGGTAAAACTCGTGCCAGCCACCGCGGTTATACGAGAGGCCCAAGTTGATAGACTCCGGCGTAAAGCGTGGTTAAGACAAATTTTAAACTAAAGCCGAACGCCCTCAGAGCTGTTATACGCTCCCGAGGGTAAGAAGCCCAATCACGAAAGTGGCTTTATACCAACTGAACCCACGAAAGCTATGACACAAACTGGGATTAGATACCCCACTGTGCTTAGCCATAAACATTGATAGTATACTACAACTACCATCCGCCCGGGAACTACGAGCACTAGCTTAAAACCCAAAGGACTTGGCGGTGCTTTAGATCCACCTAGAGGAGCCTGTTCTAGAACCGATAACCCCCGTTCAACCTCACCCTTCCTTGTCCTCCCCGCCTATATACCGCCGTCGTCAGCTTACCCTGTGAAGGCCTAATAGTAAGCAAAATTGGCATGGCCCAAAACGTCAGGTCGACGTGTAGCGTATGGAGGGGGAAGAAATGGGCTACATTCACTAAGACAGTGAACACGAACGATGTACTGAAACGTATGTCCGAAGGAGGATTTAGCAGTAAGCAGAAAATAGAGCGTTCTGCTGAAATTGGCCCTGAAGCGCGCACACACCGCCCGTCACTCTCCCCGAGCATAGCAGTTACCTGTAACTAAAACCTCAAGATATTGTAAAGGGGAGGCAAGTCGTAACATGGTAAGTGTACCGGAAGGTGCACTTGGAAAAATCAAAGTGTAGCTAAGCCAGCAAAGCGTCTCCCTTACACTGAGAAGTCATCCGTGCAATTCGGATCACCTTGATGCCCACCAGCTAGCCCCCAGACCAAAAACAACCCATCAATATTTATACCCCCAAACACACGAATACTTACGTTAAACAAACCATTTTTCCCCCCTAGTATGGGCGACAGAAAAGGGACTACGATGGAGCGATAGAGAAAGTACCGCAAGGGAACGCTGAAAGAGCAAATGAAATAACCCAGTGAAGCCTTAAAAAGCAGAGATTCAACCTCGTACCTTTTGCATCATGATTTAGTTAGTAATAATCAAGCAAAGAGTTCTTTAGTTTGACCCCCCGAAACTAGGTGAGCTACTCCAAGACAGCCTGTTAGTAGGGCAAACCCGTCCCTGTTGCAAAAGGGTGGGAAGAGCTTTGAGTAGAGGTGACAGACCTATCGAACCTAGTTATAGCTGGTTGCCTGAGAATTGGATAGAAGTTCAGCCTCCCGGTTTCTCTAGTCACCTTCGTCCAACCCACTCAGACACCTAAAGAAAACCCGCGAGAGTTAGTCAAAGGGGGAACAGCCCCTTTGAAACAAGACACAACTTTACCAGGAGGGTAAAGATCATAATAAACTAAAGGTAAAATGTTCTGGTGGGCCTAAAAGCAGCCACCCCAACAGAAAGCGTTAAAGCTCGGACATACCACACACCCTCTAATCCTGACAATTAAATCCTAGCCCCCTAATCCTACCAGGCCGCCCCATGCAAACATGGGAGTGACCATGCTAATATGAGTAATAAGAGAGGCCCCGCCTCTCTCCTTGCACACGTGTAAATCGAAATGGACCCGCCACCGAATCTTAACGGCCCCAAACAAAGAGGGAATTGAACAGCAAACCAAACAACCAGAAAACCACCCATACAACGACCGTTAACCCCACACTGGTGTGCACCCAAGGAAAGACTAAAAGAAAAAGAAGGAACTCGGCAAACACAAAGCCTCGCCTGTTTACCAAAAACATCGCCTCTTGCAAAACTGAGGAATAAGAGGTCCCGCCTGCCCTGTGACTATATGTTTAACGGCCGCGGTATTTTGACCGTGCAAAGGTAGCGCAATCACTTGTCTTTTAAATGGAGACCCGTATGAATGGCATAACGAGGGCTTAACTGTCTCCTTTTTCAAGTCAATGAAATTGATCTCCCCGTGCAGAAGCGGGGATAAAACCATAAGACGAGAAGACCCTATGGAGCTTTAGACACCAAGGCAGATCATGTTAAGCACCCCAAATTAAAGGGCCAAACCAGAGGAGCCCTGCCCTAATGTCTTTGGTTGGGGCGACCGTGGGGAAACAAAAAACCCCCATGTGGAATGGGAAAACCCTTTCCTAAAACTAAGAGCCCCCGCTCTAATAAACAGAATTTCTGACCAATCAGATCCGGCAACGCCGATCAACGGACCGAGTTACCCTAGGGATAACAGCGCAATCCTCTTTTAGAGCCCATATCGACAAGAGGGTTTACGACCTCGGATGTGGATCAGGACATCCCAATGGTGCAGCCGCTATTAAGGGTTCGTTTGTTCAACGATTAAAGTCCTACGTGATCTGAGTTCAGACCGGAGTAATCCAGGTCAGTTTCTATCTATGATATGATCTTTTCTAGTACGAAAGGACCGAAAAGAAGAGGCCCATATCCAAGATACGCCTCACCCCTACCTAATGAAAACAACTAAAATAGCCAAAAGGGCATATCCCATCCCCGCCTGAGAAAACGGCATGTTAAGGTGGCAGAGCCCGGTAACTGCAAAAGACCTAAGCCCTTTCCACAGAGGTTCAAGTCCTCTCCCTAACTATGATCTCAACACTCATCACCCATATTATTAACCCCCTAGCCTTCATCGTGCCCGTCTTACTAGCCGTTGCCTTCCTAACCCTACTCGAACGCAAAGTGCTTGGCTATATACAACTACGAAAAGGCCCCAACATTGTTGGCCCTTACGGCCTCCTACAGCCCATCGCCGACGGAGTAAAACTTTTCATTAAAGAACCAGTACGCCCTTCCACTTCCTCTCCCGTCCTATTTTTATTAACCCCCATACTAGCACTAACCCTTGCTCTTACACTCTGAGCCCCAATACCTTTCCCTTACCCTGTAGTTGACTTAAATCTTGGCATTCTATTTATCCTAGCCCTTTCCAGCCTTGCAGTCTATTCGATCCTCGGATCAGGCTGAGCCTCAAATTCAAAATATGCCCTAATCGGGGCCTTACGAGCCGTAGCCCAAACCATTTCATATGAAGTTAGCCTCGGACTCATTCTCTTAAACGCTATTATCTTTACAGGAGGCTTCACATTACAAACCTTTAATATTGCCCAAGAAAGCATCTGACTAATCTTACCAGCTTGACCCCTAGCTGCAATATGATACATCTCAACCTTAGCAGAAACTAACCGTGCCCCCTTCGACTTAACTGAAGGCGAATCCGAACTAGTTTCAGGCTTCAACGTAGAATACGCAGGAGGCCCTTTCGCCCTATTCTTTCTAGCAGAATACGCCAATATTTTACTTATAAACACACTCTCCGCAACACTATTTCTAGGAGCCTCCCACATCCCCACCTTTCCAGAACTTACCGCTATAAACCTTATAATCAAAGCAGCCCTCCTCTCAGTTGTCTTCCTATGAGTTCGAGCCTCCTACCCTCGATTCCGATATGACCAACTCATACACCTTATTTGAAAAAACTTTCTTCCACTGACACTGGCCCTAGTCATCTGACACCTGGCCCTCCCCATTGCATTCGCAGGGCTACCCCCACACCTATAAACCAGGAGCTGTGCCTGAAGTAAAGGGCCACTTTGATAGAGTGAATTATGGGGGTTAAAGTCCCCCCAACTCCTTAGAAAGAAGGGGTTCGAACCCTACCCGGAGAGATCAAAACTCTCAGTGCTTCCACTACACCACTTCCTAGTAAAGTCAGCTAACTAAGCTCTTGGGCCCATACCCCAAACATGTAGGTTAAATTCCTTCCTTTACTAATGAACCCCTACCTCTTAGCCACTTTGCTATTTGGACTCGGCCTAGGAACCACAATTACATTCACAAGCTCCCACTGACTCCTTGCTTGAATAGGACTTGAGATTAATACCCTCGCCATCATTCCCCTTATGGTACAACACCACCACCCCCGAGCTGTTGAAGCAACCACTAAATATTTTTTAGTCCAAGCCACCGCAGCAGCTATATTACTATTTGCAAGCATAACCAACGCCTGACTCACAGGACAATGAGACATCCAACAAATATCACACCCTCTCCCTATCACAATAATTACCCTTGCTCTTGCACTAAAAACAGGCCTCGCCCCAGTCCATGCTTGACTGCCAGAAGTCCTTCAAGGACTCGATCTTACTACCGGCCTCCTTCTGTCTACCTGACAAAAACTGGCTCCCTTTGCCCTACTCCTGCAAATTCAACCCACCAACTCAACACTCCTTATTTTATTAGGCTTATCATCCACCCTTGTAGGAGGATGAGGAGGCTTAAACCAAACCCAACTGCGTAAAATCCTCGCCTATTCCTCAATTGCACACCTCGGCTGAATGATCCTAATTCTGCAATTTTCTCCCTCTCTCACTCTCTTAACCCTCCTAATATACTTTATAATGACAAGTTCAACCTTCCTCACATTTAAACTAAATAACTCAACCAACATCAACACACTTGCCACTTCGTGGGCAAAAGCCCCCGTAATCACCACCCTCGCCCCCCTCATCCTCCTCTCCCTAGGAGGACTGCCCCCACTTACCGGCTTCGTACCCAAATGACTAATCCTCCAAGAACTAACAAAACAAGACCTCGCACCCACTGCCACCCTAGTGGCCCTAACCGCCCTACTAGGTTTATACTTCTACCTCCGACTCTCGTACGCAATGACACTTACTATGTCCCCCAACAACCTAGCCGGAACCACCCCATGACGCCTCCGCCCCTCGCAATTCACAATACCTATTGCCCTTTCAACCATAACCACCCTATCACTACTACCCCTAACCCCAGCCATAATTGCACTGCTCACCCAGTAAGAGACTTAGGATAGCACAAGACCAAGAGCCTTCAAAGCCCTAAGCGGGAGTGAAAATCTCCCAGTCCCTGATAAGACCTGCGGGACACTACCCCACATCTCCTGCATGCAAAACAGACACTTTAATTAAGATAAGGCCTTCCTAGACAGGCAGGCCTCGATCCTACAAACTCTTAGTTAACAGCTAAGCGCTCAAACCAGCGAGCATCCGTCTAAACCCTTTCCCCCGCCTATCCAGGCGGGCCAAGGCGGGGGAAAGCCCCGGCAGGAACTACCCTGCTACTTCAGATTTGCAATCTAATATGTTAACACCTCAGAGCTTGGTAAGAAGAGGACTCAAACCTCTGTCTATGGGGCTACAATCCACCGCTTAAAACTCAGCCATCCTACCTGTGGCAATCACACGTTGATTTTACTCGACTAATCACAAAGACATCGGCACCCTCTATCTAGTATTTGGTGCCCGAGCCGGCATAGTAGGCACGGCCCTAAGCTTACTCATCCGAGCAGAACTAAGCCAACCAGGCGCTTTCCTCGGAGACGACCAGATCTATAATGTAATTGTTACAGCACATGCCTTCGTAATAATCTTCTTTATAGTAATGCCAATTATGATTGGAGGTTTTGGAAACTGACTCATCCCCCTCATGATTGGTGCGCCAGACATGGCATTTCCTCGAATAAATAACATGAGCTTTTGACTGCTCCCACCCTCTTTCTTGCTACTGCTGGCCTCTTCCGGAGTAGAAGCTGGAGCAGGCACCGGATGAACCGTTTATCCGCCTCTCGCAGGTAATTTAGCCCATGCAGGAGCGTCTGTTGATTTAACAATCTTCTCTCTACACTTAGCAGGTATTTCCTCAATCCTCGGGGCAATCAACTTTATTACAACTATTATTAACATGAAACCCCCTGCCATTTCCCAATATCAAACCCCACTATTTGTGTGAGCAGTCCTAATTACTGCTGTTCTACTTCTCCTTTCCCTCCCCGTTCTCGCTGCTGGCATCACCATGCTCCTAACAGACCGAAACCTAAATACCACCTTTTTTGACCCTGCAGGAGGAGGAGACCCCATCCTTTACCAACACCTCTTCTGATTTTTCGGACATCCGGAAGTCTATATCCTGATTCTTCCAGGGTTTGGTATAATTTCACACATCGTTGCTTACTACTCTGGTAAAAAAGAACCCTTCGGCTATATAGGCATGGTCTGAGCTATGATGGCAATTGGCCTTCTAGGATTTATTGTTTGAGCCCACCATATGTTCACAGTTGGAATGGATGTTGACACACGCGCCTACTTTACATCCGCCACTATAATTATTGCAATCCCCACAGGTGTAAAAGTCTTTAGCTGACTTGCAACCCTTCATGGAGGAGCAATCAAATGAGAAACCCCTCTTCTCTGAGCCCTCGGCTTCATTTTCCTTTTCACAGTAGGGGGTCTAACGGGCATTGTCCTAGCCAACTCCTCTCTAGATATCGTCCTACATGACACATGCTACGTAGTAGCACATTTCCATTACGTTCTCTCGATAGGAGCCGTATTTGCCATCGTTGCAGCCTTTGTACATTGATTCCCCCTATTTACAGGCTACACCCTCCACAGCACTTGAACAAAAATCCACTTCGGAATTATGTTTATTGGGGTAAACCTTACCTTTTTCCCTCAACACTTCCTAGGGCTAGCCGGAATGCCCCGACGGTATTCAGACTACCCAGATGCCTACACACTATGAAATACAGTTTCCTCTATTGGGTCACTAATTTCGCTGGTAGCAGTAATTATGTTCTTATTATTTATTTGAGAAGCATTTGCTGCCAAACGTGAAGTTCTGTCCGTAGAACTAACCACAACTAATGTGGAATGACTGCATGGCTGCCCTTCCCCTTACCATACATTCGAGGAACCTGCATTCGTCCAAATTCAATCAAACTAACGAGAAAGGGAGGAGTCGAACCCCCATGTATTGGTTTCAAGCCAATCACATAACCGCTCTGTCACTTTCTTCATAAGACACTAGTAAAATAAGTTATTACACTGCTTTGTCAAGGCAGAATAGTGGGTTAAACCCCCGCGTGTCTTGCAATTAATGGCACATCCCTCACAACTAGGCTTTCAAGATGCAGCTTCACCTGTTATAGAAGAACTTCTTCATTTCCACGACCACGCCTTAATAATCGTCTTCTTAATTAGTACACTAGTACTTTACATTATTGTGGCTATAGTTACCACTAAACTTACCAATAAATTTATTCTAGACTCCCAAGAAATCGAAATTATCTGAACTGTCCTTCCTGCAGTTATCTTAATCCTAATTGCCCTCCCATCCCTCCGCATCCTATACCTTATAGATGAAATCAATGACCCCCATCTCACAGTTAAAGCCATAGGCCACCAATGATACTGAAGCTACGAATACACAGACTACGAAGCCCTTGGCTTCGACTCCTATATGATCCCTACACAAGACTTAGCCCCTGGTCAATTTCGCCTTCTAGAGGCAGACCACCGCATGGTAGTTCCAGTGGAATCCCCCATTCGTATACTAGTCTCCGCCGAAGACGTACTACACTCTTGAGCAGTCCCCGCCCTAGGTGTAAAAATAGACGCAATTCCCGGCCGCCTAAATCAAACAGCCTTCATCTCCTCCCGTCCAGGTGTTTACTACGGACAATGTTCTGAAATCTGCGGAGCAAACCACAGCTTTATGCCCATCGTAGTTGAAGCAGTTCCTCTAGGATACTTTGAAAGCTGATCATCCCTAATACTTGAAGACGCCTCGCTAAGAAGCTAAACAGGGTATAGCGTTAGCCTTTTAAGCTAAAGACTGGTGACTCCCAACCACCCCTAGCGATCATTATGCCTCAGCTCAACCCCGCTCCCTGATTTACTATTCTTATCTTCTCCTGAATAATCTTTTTAATTGTTATCCCACCAAAAGTTATAGCCCACACTTTCCCATACGAACCCACACCTCAAAGCGCGGAAAAACCCAAAACAGACTCTTGAAACTGACCATGACACTAAGCTTTTTTGACCAGTTTATGAGCCCCGTTTTCCTGGGCATTCCCCTGATTGCCCTTGCATTAACCCTTCCGTGAATCCTTTTCCCCACCCCCACATCCCGATGGTTAAATAACCGCTTACTCGCCCTACAAAGCTGATTCATTAACCGGTTTACACAACAACTTCTTCTACCCCTGAGCTTAGGAGGCCATAAATGAGCCATCTTATTGACCTCTTTAATACTATTTCTTATTTCCCTTAACATGCTTGGCCTTCTCCCATATACCTTCACCCCCACAACACAGCTATCCCTCAACATAGGCCTTGCAGTACCACTCTGACTGGCAACAGTACTCATCGGCCTCCGAAACCAGCCAACTATTGCCCTCGGACACCTCCTTCCAGAAGGTACTCCCACCCCACTAATCCCCGTGCTAATTATTATCGAAACAATTAGCCTTTTCATCCGTCCTCTTGCGTTAGGCGTACGACTAACAGCTAACCTCACAGCCGGCCACCTCCTTATTCAACTAATTGCCACCGCTGCCTTCGTCCTCATACCTTTAATACCAACCGTTGCAATCCTCACAGGAACACTACTATTCCTTCTAACCCTGTTAGAAGTGGCCGTAGCCATAATTCAAGCCTACGTATTTGTGCTCCTCCTCAGCCTATACCTCCAAGAAAACGTCTAATGGCCCACCAAGCACACGCATACCACATAGTTGACCCCAGCCCCTGACCCCTAACAGGTGCAGTTGCAGCACTGCTAATAACATCAGGCCTTGCAATCTGATTTCACTTCCACTCCACCACCCTTATAGTCCTCGGAACAGCATTACTCCTTCTTACAATGTACCAATGATGACGAGATATCGTACGAGAAGGCACTTTCCAAGGACACCATACACCCCCTGTCCAAAAAGGCCTTCGATATGGCATAATCCTCTTTATCACATCAGAAGTCTTTTTCTTTCTAGGCTTTTTCTGAGCCTTCTACCACTCAAGCCTGGCACCTACTCCAGAACTAGGAGGTTGCTGACCCCCTGCTGGTATTACCACTCTAGACCCATTCGAAGTCCCCCTACTCAACACCGCCGTCCTACTTGCCTCAGGAGTTACAGTCACCTGAGCCCACCATAGTATTATAGAGGGTGAACGTAAACAAGCAATCCAATCCTTGACACTCACAATTCTCCTTGGCTTCTACTTCACTTTCCTCCAAGCCATAGAATATTATGAAGCCCCCTTTACAATCGCCGATGGAGTCTACGGCTCTACTTTCTTCGTAGCAACCGGATTCCACGGACTACATGTTATTATTGGCTCCACATTCCTAGCCATCTGCCTACTCCGACAAGTCCAATACCACTTTACATCTGAACATCACTTTGGATTCGAAGCAGCTGCATGATATTGACACTTCGTAGACGTCGTCTGATTATTCTTATACATCTCCATCTACTGATGAGGCTCATAATCTTTCTAGTAACAAAATAAGTATAAGTGACTTCCAATCACCCGGTCTTGGTTAAAATCCAAGGAAAGATAATGAACCTAGTTACAACAATCATTACCATTGCCATCGCACTCTCCACTATCCTGGCAATTGTCTCATTTTGACTGCCCCAAATAAGCCCAGACCACGAAAAGCTCTCTCCATACGAATGCGGCTTCGACCCCCTAGGTACAGCCCGACTGCCCTTCTCCCTCCGCTTTTTCCTCGTCGCCATTCTTTTCCTCCTCTTTGACCTAGAAATTGCCCTTCTTTTACCCCTTCCATGAGGAGACCAACTTTCATCCCCACTTACTACATTCCTCTGAGCCACAACAGTCTTAGCCCTCCTTACACTAGGCCTAATCTACGAATGACTCCAAGGCGGACTAGAGTGAGCCGAATAGGCAGTTAGTCTAAGAAAAACATTTGATTTCGGCTCAAAAACTTGTGGTTAAAGTCCATAATTGCCTGATGACCCCCGTCCACTTTGCCTTCTCATCAACCTTTCTACTAGGGTTGACAGGCCTCGCATTCCATCGAACCCATCTTCTCTCTGCCCTCTTATGCTTAGAAGGCATAATACTTTCTTTGTTTATTGCACTGTCCCTCTGAACCCTCCAACTTGATTCCACCAGCTTCTCAACCTCCCCTATACTCTTACTCGCATTTTCAGCTTGCGAAGCAAGTGCAGGACTTGCACTGCTAGTAGCCACCGCCCGAACCCACGGCTCCGACCGTCTCCAAAACCTAAACCTCCTACAATGCTAAAAGTTCTTATCCCAACCCTCATGCTCATTCCAACCACATGACTGGCACCCTCCAAATGACTGTGACCTACAACCCTGCTTCACAGCCTATTAGTGGCCCTTATTAGCCTTACCTGACTGAAAAACCTTTCAGAAACCGGCTGATCCTGCCTCAACCTTTATATAGCTACAGACCCACTTTCAACCCCCCTCCTAGTTCTTACTTGCTGACTACTACCTCTAATGATTCTTGCAAGCCAAAACCACACAGCCCTTGAACCCATTAATCGTCAACGAATATATATTACACTTCTAACATCCCTACAACTCTTCCTAATCATAGCCTTTAGTGCCACCGAAATCATCATGTTCTACATCATGTTTGAAACTACCCTCATTCCCACCCTAATCCTCATCACCCGCTGAGGAAACCAGACAGAACGCCTCAACGCCGGCACTTACTTTCTTTTCTACACACTAGCAGGGTCCCTGCCCCTCCTAGTCGCCCTCCTACTCCTTCAAAATACCACAGGAACCCTCTCCTTACTAACCATTCAATATTCCAACCCCCTCCTGCTTACAACCTATGCAGACAAATTATGATGAGCTGGCTGCCTATTAGCCTTTCTAGTAAAAATACCCCTATATGGTGCTCACCTTTGACTTCCCAAAGCCCATGTTGAGGCCCCAGTGGCAGGCTCTATAATTCTTGCCGCAGTCCTCCTGAAACTAGGAGGATACGGTATAATACGAATAATAACCATACTAGAGCCCCTTACCAAAGAGCTAAGCTACCCCTTTATCGTTTTTGCCCTCTGAGGCGTAATCATAACAGGCTCAATCTGCCTACGCCAAACAGACCTAAAATCCCTAATTGCATACTCCTCAGTAAGCCACATGGGTCTCGTAGTAGGAGGAATCCTTATTCAAACACCATGAGGATTCACTGGAGCCCTAATCCTTATAATCGCACATGGCCTAACCTCCTCCGCCCTATTCTGCCTAGCAAACACTAATTATGAACGAACACATAGCCGAACACTAGTCCTAGCCCGAGGCCTACAAATAGCCCTTCCCCTAATAACAACATGATGATTCATTGCCAGTCTCGCTAACCTGGCCCTACCCCCACTCCCCAACCTGATAGGAGAACTAATAATTATCTCCTCCCTATTCAACTGATCCTGATGAACTCTACTATTGACAGGTGCCGGCACCCTTATTACCGCAGGCTACTCTCTCTACATATTCCTAATAACCCAACGAGGACCGCTCCCAGCACATATTATTGCACTAGACCCCTCACACTCTCGAGAGCACCTCCTCATAGCCCTCCACCTCCTCCCCTTACTACTACTAATTCTCAAACCCGAATTAATTTGAGGCTGAACAGCTTGTAGATATAGTTTAACAAAAACATTAGATTGTGATTCTAAAAATAAGGGTTAAACTCCCCTTATCCACCGAGAGAGGCTCGCTAGCAACGAAAACTGCTAATTTTCGCCACCTTGGTTGAACCCCAGGGCTCACTCGAGCCGCTTCTAAAGGATAACAGCTCATCCGTTGGTCTTAGGAACCAAAAACTCTTGGTGCAAATCCAAGTAGCAGCTATGCAACCTACGTCACTTATAATAACCTCGAGCTTAATCATTATTTTCACGCTCCTCGCATACCCCGTTCTCACAACGCTCACGCCCCAACCCCAACCATCAAACTGAGCATTGATCCAAGTAAAGACCGCAGTAAAGCTAGCTTTCTTTGTTAGCCTCCTTCCACTATTCCTATTTCTAAACGAAGGAGCAGAAATAATTATTACCAGCTGAAACTGAATAAATACCCTAACCTTTGACGTGAATATCAGCTTCAAATTCGACCACTACTCTATTATTTTCACCCCTATTGCCCTCTACGTGACTTGATCCATTCTAGAATTTGCATCCTGATACATGCATGCAGACCCTTACATAAACCGCTTTTTTAAATACCTTCTTGTCTTCCTCATTGCTATAATTATTCTAGTAACAGCAAACAACATATTTCAACTTTTTATTGGATGGGAAGGCGTAGGCATTATATCCTTCCTCCTAATTGGCTGATGATATGGCCGAGCAGATGCAAATACCGCTGCTCTCCAGGCAGTAATTTATAACCGGGTTGGAGATATCGGCTTAATCTTTGCCATAGCATGAATAGCAATAAACCTTAACTCATGAGAAATACAACAGATGTTCGCAGCCGCAAAAAACTTTGACCTTACCTTCCCGCTTCTAGGGCTAATTATTGCTGCCACTGGTAAATCAGCCCAATTCGGCCTACACCCCTGACTTCCCTCTGCCATGGAGGGCCCTACACCGGTCTCTGCCCTACTGCATTCAAGCACCATGGTTGTTGCAGGCATTTTCCTCCTAGTTCGTATGAGCCCCCTTTTAGAAAATAACCAAACAGCCCTCACCATCTGCCTATGCCTTGGCGCCCTCACAACCCTCTTCACCGCCACTTGTGCCCTCACCCAAAACGACATCAAAAAAATTGTTGCATTCTCAACATCCAGCCAACTAGGCCTTATAATAGTAACCATCGGACTAAACCAACCTCAACTTGCCTTCCTACACATTTGCACCCATGCTTTCTTCAAAGCAATGCTTTTCCTCTGCTCTGGCTCTATTATTCACAGCCTAAATGATGAACAAGACATTCGAAAAATAGGAGGAATACATCACCTTACCCCCTTCACATCTTCCTGCCTCACTGTTGGCAGCTTAGCCCTTACAGGCACCCCCTTCCTAGCAGGCTTTTTCTCTAAAGATGCCATTATTGAAGCACTAAACACATCCCACCTAAACGCCTGAGCCCTTGCCCTAACCCTCTTAGCCACCTCCTTCACAGCCATCTACAGCCTACGAGTTGTTTTCTTTGTGTCAATAGGCCACCCCCGATTTAACCCTCTTTCCCCTATCAACGAAAACAACCCAGCCGTAATTAACCCAATCAAACGACTGGCCTGAGGCAGTATTGTCGCCGGCCTATTAATTACCTCTAACATCACCCTATCAAAAACACCCGTGATATCTATACCTCCTCTATTAAAACTAGCTGCCCTAACTGTAACAATTTTAGGACTTCTCATTGCCCTCGAACTAGCATCACTAACAAGCAAACAATACAAAGCCACCCCACAACTTGCCACCCATCACTTCTCCAATATACTAGGCTTTTTCCCAGCAATTATCCACCGCCTCACCCCTAAGCTAAACTTGACCTTAGGCCAAGCAATTGCCAGCCAAATGATTGACCAAACCTGACTAGAAAAAACAGGTCCCAAAGCAATTGCTTCTTCCAACCTGCCTCTAATTACAACAACAAGCAGTGCTCAACAAGGCATAATCAAAACATACCTCGCCTTATTCCTCATCACACTGGCCCTCTCAACTCTAATATTTATTTACTAAACTGCCCGAAGTGTTCCCCGACTTAGACCTCGCGTCAATTCTAATACCACAAATAAAGTCAACAAGAGCACCCATGCACCAATGACCAACATCCCCCCTCCAAACGAATATATCCACGCAACACCCCCCGCATCTCCTCGAAACACAGAAAACTCACCAAGCTCATCTACCGGTACCCAAGAAGCCTCATATCATCCACTTCAAAATAGACCAGATACCAAAACTACAACTACCACATATACAACCATGTACGTTGCAACTGACCGACTCCCTCAACTCTCAGGATAAGGCTCAGCAGCAAGAGCTGCCGAATAAGCAAATACAACCAGCATTCCCCCCAAATAAATCAAAAACAACACTAAAGATAGAAAAGGGCCCCCATGTCCCACTAATACTCCACACCCCATGCCCGCCACCACCACTAACCCTAAGGCAGCAAAGTAAGGAGACGGGTTAGAAGCAACTGCTACTAGCCCTAAAATTAGACCAAATAAAAATAAAAATATAACAAAAGTCATAATTTCTGCCAGGATTTTAACCAGGACCAATGGCTTGAAAAACCACCGTTGTTATTTAACTACAGAAACCTTAATGACAAGCCTACGAAAAACCCACCCCCTCCTAAAAATCGCAAACGACGCACTCGTCGATCTCCCTGCTCCCTCTAACATTTCGGTCTGATGAAACTTCGGCTCCCTCCTCGGTCTCTGCTTGATTACCCAAATCCTTACAGGATTATTTCTAGCCATGCACTACACCTCAGACATCGCAACAGCCTTCTCATCCGTAGCACACATTTGCCGAGATGTAAATTACGGATGACTTATCCGAAATATTCATGCCAACGGTGCCTCCTTCTTCTTCATCTGCATCTACATGCATATCGCCCGAGGCCTCTATTACGGCTCATACCTTTACAAAGAAACCTGAAATGTAGGTGTAATTCTCCTCCTTTTAGTAATGATGACTGCCTTCGTAGGATACGTCCTCCCCTGAGGACAAATATCTTTTTGAGGGGCAACCGTCATCACCAACCTTCTTTCCGCCGTCCCCTATGTAGGTAACACCCTCGTCCAGTGAATCTGAGGGGGCTTCTCAGTAGACAACGCCACCCTCACCCGATTCTTTGCCTTCCACTTCCTCTTCCCCTTTGTTATTGCAGCCGCAACCTTCATCCATCTAATTTTCCTCCACGAAACCGGATCTAACAACCCCCTGGGCCTCAACTCAGATGCCGACAAAATCTCATTCCACCCATACTTCTCTTACAAAGACCTACTAGGATTTACAGCACTTCTAATTGCCCTCGCCTCATTAGCACTGTTTTCCCCTAACCTCCTAGGAGACCCCGACAACTTCACACCTGCCAACCCCTTAGTAACTCCCCCACACATCAAGCCTGAATGATATTTCTTATTTGCATATGCCATTCTACGCTCAATTCCCAACAAACTCGGAGGGGTCCTAGCCCTACTATCCTCCATTTTGGTACTGATAGTTGTCCCAATCCTCCACACATCAAAACAACGAAGCCTAACGTTCCGACCTCTAACACAATTCCTATTCTGAACCCTCGTTGCAGATGTTATTATCCTTACCTGAATTGGAGGCATACCTGTTGAACAACCCTTCACTATCATTGGACAAGTTGCATCTTTCTTGTATTTTTTCCTCTTCTTAGTCCTTGCCCCCCTGGCAGGCTGAATAGAAAATAAAGTCCTTCAATGACACTGCACTAGTAGCTCAGTTTCAGAGCGCCGGTCTTGTAAACCGGACGCCGGAGGTTAAAATCCTCCCTACTGCTCAAAGAAAGGAGATTCTAACTCCTACCCCTGACTCCCAAAGCTAGGATTCTAAACTAAACTACTCTTTGGACATATATGTATTTACCCCATATATCTATATCGAACATATATATTAATAGCATTCAAGTACATTAATGTATTATCAACATTAACAAATATTAACCATTTCATACACCAGCATGATAACTCATGAGTATACATAAACCACGATATTAATAATAACCATCCGAATGAAACATTGAGGCAGGCGAAATTTATCGACCTATAACGAAATAACTCATTAGTCTAGATATACCAAGATTCATAGATACTGTTATTCTTAATAGATCTTAATGCAATAAGAACCGACCATCAGTTGATTTCTTAATGCATACTCTTATTGATAGTGAGGGACAAAAATTGTAAATTAGCACTTGATGAACTATTCCTGGCATTTGGTTCCTATTTCAGGTCCATTAATTGATATTACTCCTCATACTTTCATTGACGCTGGCATAAGTTAATGGTGGCGAACATTCGACTCATTAACCCACCATGCGCAACGCTCACTTCCAGAGGGTGGCTGGTTCTCTTTTTTTTTCCCTTTTCAATAGGCATTTCACAGTGCATAAAACAACAGGTTAATTAAGGTAGAACATATTTCTTGCATGCAAAGGAAATGTTATCCATGTAACAAAACTTTGACGGAAGAAATTGCATTAATGATATCAAGAGCATAAGTTATAAGTATCAACTCCTAAAATCTCTAAGATGCCCCTGGGGTTTTTGCGCGTAAAACCCCCCTACCCCCCTAAACTCCTGAGATCACTAACACTCCTGAAAACCCCCGAAAACAGGAAAACCTCTAGTAGTTTATTTTAAAGCCCAAAATGTATCTATTTACATTATTGTAAATATGCGCAA
 
If you scroll up and down the page quickly enough, you can see a scene from Bambi in the sequence data...
 
What I´d probably do with that huge a string is write a tiny QBasic program to turn it into 16000 individual lines of 1 character, then import that.
Or write a tiny QBasic program that does all the counting, which should be easy enough. QBasic: seriously useful.
 
If you need to do similar in the future you could just paste the string as is into cell A1 and use this formula in cell B1:

=MID($A$1,ROW(),1)

and copy it down however many rows you need. The advantage to this is you could save the spreadsheet once setup and then just paste new strings into A1 and it will still work.

If you don't want to start in B1 you can do it anywhere but you have to put in an offset for the row. If you start on row 3 for example the formula would be

=MID($A$1,ROW()-2,1)

If using the row is a problem(which it can be if you need to cut and paste) you could also create a position column refer to it instead of ROW()

The format of the MID function is MID(text,start character position, number of characters)
 
Last edited:
If you need to do similar in the future you could just paste the string as is into cell A1 and use this formula in cell B1:

=MID($A$1,ROW(),1)

and copy it down however many rows you need. The advantage to this is you could save the spreadsheet once setup and then just paste new strings into A1 and it will still work.

If you don't want to start in B1 you can do it anywhere but you have to put in an offset for the row. If you start on row 3 for example the formula would be

=MID($A$1,ROW()-2,1)

If using the row is a problem(which it can be if you need to cut and paste) you could also create a position column refer to it instead of ROW()

The format of the MID function is MID(text,start character position, number of characters)

Ooooohhh!

I like that - that is *elegant*

:D

Thanks! And thanks again to the other folks who responded - I really do appreciate it!
 
What I´d probably do with that huge a string is write a tiny QBasic program to turn it into 16000 individual lines of 1 character, then import that. Sorry, not very helpful, I know :)

(Although I'd be happy to do the transformation for you and send you back the file - should only take 5mins. Let me know.)

Or write a tiny QBasic program that does all the counting, which should be easy enough. QBasic: seriously useful.

I didn't think qbasic existed on operating systems since last century?

Too late I know but in Excel you could use VBA to do it. Then you would have a tool for doing it again. VBA rocks.
 
Ba

I did the same thing earlier in QBasic:

Code:
DIM Nucleotide AS STRING * 1
DIM Counter(1 TO 4) AS LONG
CONST Adenine = 1
CONST Guanine = 2
CONST Thymine = 3
CONST Cytosine = 4


OPEN "dna.txt" FOR BINARY AS #1
DO UNTIL EOF(1)
GET #1, , Nucleotide
SELECT CASE Nucleotide
CASE "A": Counter(Adenine) = Counter(Adenine) + 1
CASE "G": Counter(Guanine) = Counter(Guanine) + 1
CASE "T": Counter(Thymine) = Counter(Thymine) + 1
CASE "C": Counter(Cytosine) = Counter(Cytosine) + 1
END SELECT
LOCATE 1, 1
PRINT "Adenine :"; Counter(Adenine)
PRINT "Guanine :"; Counter(Guanine)
PRINT "Thymine :"; Counter(Thymine)
PRINT "Cytosine:"; Counter(Cytosine)
LOOP
CLOSE #1

It uses a separate file with all the C, A, G and Ts in it. Not being optimised for speed, it takes no less than half a second to get the result:

Adenine : 4720
Guanine : 2648
Thymine : 4227
Cytosine: 4916

If BTMO needs to do stuff like this more often and also needs the results in an Excel worksheet, it would indeed be a good idea to use some VBA code to take care of it all.
 

Back
Top Bottom