TIE2030 Programming Methodology with Python
Take-Home Assignment
Release Date: 12 November, 2021 (~2pm)
Deadline: 15 November, 2021, 11pm
Late Submission Deadline (25% Penalty): 17 November, 2021, 11pm
No submissions will be accepted after 17 November, 2021, 11pm.
Details on what you need to upload can be found on Page 9.
Problem Statement: In this assignment, you will search for a given list of
motifs (short, fixed patterns) in a given list of DNA sequences. A DNA
sequence is made up of four nucleotides: A (adenine), G (guanine),
C (cytosine), and T (thymine). Motif finding is an important problem in
bioinformatics. It helps identify common features between species, improves
our understanding of human diseases, and helps drug manufacturers target the
development of particular drugs.
Input and Output Data:
The following are given to you in this assignment.
• List of DNA sequences: To be read from the sequences.txt file. Each line
of the file contains one DNA sequence.
• List of Motifs: To be read from the motifs.txt file. Each line of the file
contains one motif.
Write a Python program to perform the following analysis. Your outputs must
be written with clear messages to the DNA_analysis_results.txt file.
What do you need to do in this take-home assignment?
You need to write a program that searches for ALL the given motifs in each
DNA sequence, generates the output, and stores the counts in a dictionary.
You must also compare each of the given sequences to a target sequence and
report the statistics. You may use any built-in library function, if needed.
ANSWER ALL THE FOLLOWING:
Read through all the questions and the skeleton code on Page 7 before you start
coding. Use the skeleton code. All of the functions specified in Questions (2),
(3), (4), (5), and (7) must be called from your main() function [Note: No
parameters are passed into the main() function].
(1) In your main() function, read the motifs from the file motifs.txt and
store them in a list. Create a dictionary Motif_Count_Dictionary whose keys
are the motifs you have read from motifs.txt and whose values are
initialized to zero. Read the DNA sequences from the file sequences.txt.
Write each DNA sequence and its length with a clear, meaningful message
to your output file DNA_analysis_results.txt. See the sample output on
Page 8.
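The reading and initialization steps in Question (1) could be sketched roughly as follows. This is only an illustration: the helper names read_nonempty_lines, build_motif_count_dictionary, and describe_sequence are hypothetical (they are not part of the skeleton code), and the message wording is an assumption, not the required format.

```python
def read_nonempty_lines(path):
    """Return the stripped, non-empty lines of a text file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_motif_count_dictionary(motifs):
    """Map every motif to an initial count of zero."""
    return {motif: 0 for motif in motifs}


def describe_sequence(index, sequence):
    """Format one output line with a sequence and its length.

    The wording is illustrative only; the sample output defines the real format.
    """
    return f"Sequence {index}: {sequence} (length = {len(sequence)})"
```

Inside main(), these helpers would be called with "motifs.txt" and "sequences.txt", and each line from describe_sequence would be written to the output file.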
(2) Write a Python function Nucleo_Counter(…) and pass it each DNA sequence
and any other parameters it needs. Call this function from the main()
function. The function must count the number of occurrences (frequencies)
of each nucleotide A, G, C, T in the DNA sequence you pass in. Write your
counted values with a clear, meaningful message to your output file
DNA_analysis_results.txt. See the sample output on Page 8.
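One possible shape for the counting logic in Question (2) is sketched below. The single-parameter signature and the returned dictionary are assumptions made for illustration; the assignment leaves the parameter list open and expects the results to be written to the output file, which this sketch leaves to a separate formatting helper (also hypothetical).

```python
def Nucleo_Counter(sequence):
    """Count occurrences of A, G, C and T in one DNA sequence.

    Assumption: the counts are returned so the caller can write them out.
    """
    return {nucleotide: sequence.count(nucleotide) for nucleotide in "AGCT"}


def format_nucleotide_counts(counts):
    """Build one illustrative message line from the counts (hypothetical helper)."""
    parts = [f"{nucleotide} = {count}" for nucleotide, count in counts.items()]
    return "Nucleotide counts: " + ", ".join(parts)
```

For the sequence "AAGCT", Nucleo_Counter returns 2 for A and 1 for each of G, C, and T.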
(3) Write a Python function Motif_Counter(…) and pass it each DNA sequence,
your Motif_Count_Dictionary, and any other parameters it needs. Call this
function from the main() function. The function must count the number of
occurrences (frequencies) of each motif that you have read from the file
motifs.txt and add the counts to the corresponding entries in your
Motif_Count_Dictionary. For example, the number of occurrences of motif TC
must be added to the entry with key TC in your Motif_Count_Dictionary.
Write your counted values with a clear, meaningful message to your output
file DNA_analysis_results.txt. See the sample output.
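A minimal sketch of the accumulation described in Question (3) follows. The assignment does not say whether overlapping occurrences should be counted, so this sketch assumes they should (for example, AA occurs 3 times in AAAA) and exposes a flag for the non-overlapping alternative. The helper count_occurrences, the two-parameter signature, and the returned per-sequence dictionary are all assumptions for illustration.

```python
def count_occurrences(sequence, motif, overlapping=True):
    """Count how often motif occurs in sequence (hypothetical helper).

    Assumption: overlapping matches count by default; str.count would
    give the non-overlapping total instead.
    """
    if not motif:
        return 0
    if not overlapping:
        return sequence.count(motif)
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Move one position forward so overlapping matches are found too.
        start = sequence.find(motif, start + 1)
    return count


def Motif_Counter(sequence, motif_count_dictionary):
    """Count each motif in one sequence and add it to the running totals."""
    per_sequence_counts = {}
    for motif in motif_count_dictionary:
        found = count_occurrences(sequence, motif)
        per_sequence_counts[motif] = found
        motif_count_dictionary[motif] += found  # accumulate across sequences
    return per_sequence_counts
```

Returning the per-sequence counts lets main() write them to the output file while the dictionary keeps the totals over all sequences.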
(4) Write a Python function Freq_Counter(…) to determine which motif occurs
most frequently (maximum frequency) and which motif occurs least
frequently (minimum frequency) in the given DNA sequences. Pass your
Motif_Count_Dictionary and any other parameters the function needs. Call
this function from the main() function. Write your results (the
corresponding motifs and their frequencies) with a clear, meaningful
message to your output file DNA_analysis_results.txt. See the sample
output.
Important Note: If more than one motif occurs most frequently or least
frequently, write all of them to your output file.
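The tie handling required in Question (4) could look roughly like this. The signature and the return shape (two pairs of motif list and frequency) are assumptions chosen for illustration, not something the assignment prescribes.

```python
def Freq_Counter(motif_count_dictionary):
    """Return ((most_frequent_motifs, max_count), (least_frequent_motifs, min_count)).

    Every motif that ties for the maximum or minimum is included.
    """
    if not motif_count_dictionary:
        return ([], None), ([], None)
    highest = max(motif_count_dictionary.values())
    lowest = min(motif_count_dictionary.values())
    most_frequent = [m for m, c in motif_count_dictionary.items() if c == highest]
    least_frequent = [m for m, c in motif_count_dictionary.items() if c == lowest]
    return (most_frequent, highest), (least_frequent, lowest)
```

With the counts {"TC": 5, "AA": 2, "GG": 5, "CA": 2}, both TC and GG are reported as most frequent and both AA and CA as least frequent.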
(5) Define a target sequence Target_Seq as follows (refer to the skeleton
code on Page 7):
Target_Seq = 'ATGGGGAATGCGCAATGCAACGTAATTTAGAGGAGCCCCAGTTTGAAAGT'
Write a Python function Target_Search(…) to compare each of the given DNA
sequences against the target sequence Target_Seq. Pass it each DNA
sequence and any other parameters it needs. Call this function from your
main() function. The function must perform the following:
Count the number of elements that match exactly at the same positions in
the DNA sequence you passed in and in Target_Seq. This gives the
"similarity" between that DNA sequence and the target sequence
Target_Seq. Return this value from your Target_Search(…) function to
your main() function.
For example, given a target sequence:
ATGTAAAGCCTATAGTGGGGC
and a DNA sequence, say:
ATGTTTTGCCTATAGTATGGCATAGTAGTA
the similarity score between the two example sequences is 16. Only the
positions present in both sequences are compared, so the extra elements at
the end of the longer sequence do not contribute.
After finding all the similarities, in your main() function, find the
sequences that are most similar and the sequences that are least similar
to Target_Seq. Write your results with clear, meaningful messages to
your output file DNA_analysis_results.txt. Refer to the sample output.
Important Note: If more than one sequence is most/least similar to the
target sequence, write all of them to your output file.
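The position-by-position comparison in Question (5) can be sketched as below, using the worked example from the text. The two-parameter signature and the helper most_and_least_similar are assumptions for illustration; the helper stores scores in a dictionary keyed by sequence, so it presumes the sequences are distinct.

```python
def Target_Search(sequence, target):
    """Count positions where sequence and target hold the same element.

    zip stops at the shorter of the two, so trailing elements of the
    longer sequence are ignored.
    """
    return sum(1 for left, right in zip(sequence, target) if left == right)


def most_and_least_similar(similarity_by_sequence):
    """Return (most_similar, least_similar) lists, keeping all ties (hypothetical helper)."""
    highest = max(similarity_by_sequence.values())
    lowest = min(similarity_by_sequence.values())
    most = [s for s, score in similarity_by_sequence.items() if score == highest]
    least = [s for s, score in similarity_by_sequence.items() if score == lowest]
    return most, least


example_target = "ATGTAAAGCCTATAGTGGGGC"
example_sequence = "ATGTTTTGCCTATAGTATGGCATAGTAGTA"
print(Target_Search(example_sequence, example_target))  # 16, as in the worked example
```

The 16 matches are positions 1 to 4, 8 to 16, and 19 to 21 of the example pair, counting from 1.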
(6) In your main() function, measure the time taken to run your analysis
as required from Question 1 to Question 5 (the time to run from the start
of your program to the end of your code for Question 5). Write the time
you measured with a clear, meaningful message to your output file
DNA_analysis_results.txt. See the sample output.
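One common way to take the measurement in Question (6) is time.perf_counter from the standard library, sketched below. The function run_analysis is a hypothetical stand-in for the work of Questions 1 to 5, and the message wording is an assumption.

```python
import time


def run_analysis():
    """Hypothetical placeholder for the work done in Questions 1 to 5."""
    return sum(range(100_000))


start_time = time.perf_counter()  # taken at the very start of the analysis
run_analysis()
elapsed_seconds = time.perf_counter() - start_time  # taken after Question 5

message = f"Time taken for the analysis: {elapsed_seconds:.6f} seconds"
print(message)
```

In the real program both timestamps would be taken inside main(), since no variables other than Target_Seq may be declared outside the functions.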
(7) Write a Python function Plot_Chart(…) to plot a bar chart that shows
the total number of occurrences of each motif in all of the DNA
sequences given in sequences.txt (the counts that you have accumulated
for each motif in your Motif_Count_Dictionary). Clearly present your
chart with all the required information, title, and axis labels, as shown in
the sample output. Pass your Motif_Count_Dictionary and any other
parameters the function needs. Call this function from the main()
function.
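A bar chart like the one in Question (7) could be drawn with matplotlib along these lines. The choice of matplotlib, the title and axis label texts, and the returned figure and axes are all assumptions; the sample output defines what the chart must actually show.

```python
import matplotlib

# Assumption: a non-interactive backend, so the sketch also runs without a
# display. Remove this line to view the chart with plt.show().
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def Plot_Chart(motif_count_dictionary, title="Total occurrences of each motif"):
    """Draw one bar per motif, with a title and labelled axes."""
    motifs = list(motif_count_dictionary.keys())
    counts = list(motif_count_dictionary.values())
    figure, axes = plt.subplots()
    axes.bar(motifs, counts)
    axes.set_title(title)
    axes.set_xlabel("Motif")
    axes.set_ylabel("Number of occurrences")
    return figure, axes
```

Calling figure.savefig with a file name on the returned figure stores the chart as an image, which may be convenient for the report.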
IMPORTANT FEATURES:
• Displaying your results with meaningful messages and clarity, writing
meaningful comments (in addition to the comments given in the
skeleton), and using meaningful variable names also carry marks.
Refer to the rubrics.
• In your output file, you need to print the length, counts of nucleotides,
counts of motifs, and similarity to the target sequence for each DNA
sequence before proceeding to the next DNA sequence. Refer to the
sample output.
• Your code should give the correct results for different DNA sequences
and motifs (which also consist of A, T, G, C nucleotides) without any
change to the code. That is, if we change some motifs and sequences in
motifs.txt and sequences.txt and run your code, we expect the correct
results for the new input data in your output file
DNA_analysis_results.txt.
• For file writing, you can either write to the output file while processing or
store your output messages in a list of strings and write to the output file
at the end of the program.
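The second file-writing option above (collect messages first, write once at the end) might be sketched as follows. The helper name write_report and the example messages are hypothetical.

```python
def write_report(path, messages):
    """Write all collected messages to the output file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(messages) + "\n")


# Messages are appended in processing order, so everything belonging to
# one DNA sequence stays together before the next sequence starts.
output_messages = []
output_messages.append("Sequence 1: ATGC (length = 4)")
output_messages.append("Nucleotide counts: A = 1, G = 1, C = 1, T = 1")
```

At the end of main(), a single call such as write_report("DNA_analysis_results.txt", output_messages) would produce the file.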
SKELETON CODE:
Use the skeleton code below.
Note: You must NOT declare any variables outside the functions other than
Target_Seq. You are allowed to write additional functions if needed.
SAMPLE OUTPUT:
Important note: The results shown in the sample output below were obtained
using DIFFERENT data from the files given to you. Therefore, they DO NOT
match the results you will obtain using the data in sequences.txt and
motifs.txt. You should compute and check your own results using the data
in the two given files.
RUBRICS:
Part | Description | Marks
1 | For reading data from files; initializing the Motif_Count_Dictionary; printing the DNA sequences and their lengths to the output file. |
2 | For the logic and execution of the Nucleo_Counter(…) function. |
3 | For the logic and execution of the Motif_Counter(…) function. |
4 | For the logic and execution of the Freq_Counter(…) function. |
5 | For the logic and execution of the Target_Search(…) function; for finding the DNA sequences that are most and least similar to the target sequence. |
6 | For measuring the processing time. |
7 | For the logic and execution of the Plot_Chart(…) function; displaying the chart clearly with all the required information. |
Coding quality, output display | For writing clear code and comments; using meaningful variable names; printing outputs with clear messages; following the skeleton code and other requirements in the questions and template; use of functions and parameter passing. |
Total | |
WHAT DO YOU NEED TO UPLOAD?
Upload a .zip file in the StudentSubmission_TakeHomeAssng_TIE2030 folder
with name:
containing the following files:
(1) Your working code CODE_
(2) Your report (convert it to PDF once it is complete) REPORT_
(3) Your output file DNA_analysis_results.txt.
Your output file DNA_analysis_results.txt.
Refer to the template. Please follow the file naming strictly.
