Microsoft Coding Competition - University of Missouri I

Wu Rui bio photo By Wu Rui Comment

This is a Coding Competition MS held at University of Missouri - Columbia on Sep 17th 2014. Similar events will be held in several Universities.

Team up to 3 people. Six problems are given and you have 2 hours to get as many scores as you can. There’s no partial score for each question. The best strategy I would recommend is starting from the easiest and moving on to harder problems.

Question One Cure the virus | 3 point(s)

There is an outbreak of VirusX in Redmond. The local authorities have discovered a vaccine that protects against infection by VirusX. However, they can only produce so many vaccines per week. Your job is to figure out the best order in which people should be vaccinated. Everyone will get vaccinated, but the people more likely to be sick should be vaccinated first. You are more likely to get sick if the virus` DNA closely matches somewhere in your DNA.

However, there is a catch: the virus can mutate. Over time, the virus can “cut” letters from its DNA. So it is possible that the AAAAG virus mutates into the AAAG, AAAA, …, A viruses. Removing letters is the only possible mutation: the virus cannot change any letter.

For example, if your DNA is GGGGGGGAAAAGGGGGG and the virus DNA is AAAAG, you have 1.0 probability of getting infected because all of the letters of the virus are found in the same order in your DNA. If your DNA was TTTTTTTAAAATTTTTT, then the virus only partially matches your DNA. For partial matches, we define the probability to get sick by the numbers of continuous letters of the virus’ DNA that matches your DNA divided by the total number of letters in the un-mutated virus’ DNA. With the example above, AAAA is the longest possible mutation that matches in your DNA, so your probability of getting sick is len(AAAA) / len(AAAAG): 0.8.

You are provided with the DNA of every inhabitant of Redmond and the virus’ DNA. For every inhabitant, you need to output the order in which he/she should get vaccinated. Everyone that has the same probability to get sick should get vaccinated in the same batch. For example, the order of vaccination for inhabitants with the following probability to get sick: [0.0, 0.5, 0.5, 0.0, 1.0] is: [3, 2, 2, 3, 1].

Tip Here is a tip for how to get from the “Canonical Sample Input” to “Canonical Sample Output” (described below). Given the TAAT virus and its highest infection score possible mutation for persons 0 to 4, here are the longest matching mutations and the attached probability to get sick for every person:

Person #0: T, 0.250

Person #1: TAA, 0.750

Person #2: TT, 0.500

Person #3: TAAT, 1.000

Person #4: TA, 0.500

Therefore, person 3 has a 1.0 probability of getting sick, so he or she gets vaccinated first. Person 1 is the next highest, so he or she gets vaccinated second. We continue to vaccinate in batches of probability of getting sick, with person 0 getting vaccinated last because he or she has the least probability to get sick. Note that what is described above is not the output you should generate, but a tip on how you can get to the desired output inside your code.

Input description/format

The first line of the file contains three numbers: the number of citizens in Redmond, the length of the citizen’s DNA and the length of the virus’ DNA. This is followed by one sequence of DNA per line, representing in order the citizens’ DNA. The last DNA sequence of the file is the virus’ DNA. You can assume that the virus’ DNA sequence is always shorter than the citizen’s DNA sequences. Output description/format The below example becomes clear if we give the “DNA match score” as well as the longest sequence of DNA matched.

Person #0: G, 0.333

Person #1: GCC, 1.000

Person #2: CC, 0.667

Person #3: GC, 0.667

Person #4: GC, 0.667

Person 1 has a 1.0 probability of getting sick, so he gets vaccinated first. Persons 2, 3, and 4 are the next highest probable to get sick, so they get vaccinated second. Person 0 has the least chance to get sick; he gets vaccinated last.

Example input

5 10 3

AGGAAAAGAG

CGAGGCCAAC

GACAAAACCG

GCGACGCAGA

AACAGCGCAG

GCC

Example output

Person #0: 3.

Person #1: 1.

Person #2: 2.

Person #3: 2.

Person #4: 2.

My Solution in Java

In a competition environment, it’s really easy for people like me to scan the problem description really quick and start coding.  I trade this problem as a typical longest common substring at first glimpse but actually it’s  not. The virus can mutate, letters can be cut out from its sequence while people’s DNA need to be continuos which makes this problem a mix of longest common substring and longest common subsequence. Then, if you are familiar with DP( Dynamic Program) solution for LCS, you can easily modify it into the correct answer of this problem.

comments powered by Disqus