#P1115. Statistical Trouble
Statistical Trouble
Description
Your team was hired by the international corporation ACM (Analytical Calculation Maxims). Every year ACM creates and conducts various surveys. Surveys themselves are simple forms with a list of questions and a list of possible answers for every question. Surveys are distributed around the globe, where field agents question the target group of people. All the answers are gathered in the ICPC (International Computation and Processing Center), where teams of well-paid analysts mine raw data in search for relevant correlations. The raw data for each individual survey consists of lots of lines of answers. Each line corresponds to every questioned person and for every question lists answers that the person has made on that particular survey.
The first step of analysis that your team was hired to automate is to create cross tables that correlate answers on interesting pairs of questions. In its most simple way, given a pair of questions, cross table has a row for every possible answer on the first question, and has a column for every possible answer on the second question. Each cell of the cross table contains a number of lines in the raw data that has both answers for the corresponding questions at the same time.
However, your task is complicated by the fact that you are to compute and output not only simple cross table values, but also total values for every row and column in the cross table (that is the sum of values in the corresponding row and column) that are placed in an additional last column and last row, as well as a percentage distributions for every row and column. Percentage distribution for a row is an additional number in every cell in that row that shows percent ratio of the value in that cell to the total value for that row, unless the total value is zero (in that case percentage distribution for this row is not defined). The same applies to the percentage distributions of columns. Thus, the cross table in your output will have at most three values in every cell (the value itself, row-wise percent, and column-wise percent). Please note, that percentage distributions also apply to totals. For example, in the total column for every row the row-wise percent will be always 100%, unless the total value for the row is zero (in that case row-wise percents are not defined), and column -wise percent shows percents ratio of the total value for this row to the total number of lines in the raw data (which is the value that can be found in the last column of the last row).
Percents are rounded to integers on output. Percent that has a non-zero fractional part is rounded to either the smallest integer number greater than the resulting percent, or the largest integer number smaller than the resulting percent, in such a way, that the sums of all corresponding row-wise percents by row (without row totals) or column -wise percents by column (without column totals) are equal to 100% unless they are undefined. There are various rounding algorithms that produce results satisfying the above constraints. You are free to use any rounding algorithm as long as the above constraints are satisfied.
Input
The input consists of 3 sections: survey description, survey results, and cross table descriptions.
The first line of the input contains the name of the survey, which is at most 100 characters long. Subsequent lines describe all the questions in the survey. On the first line of every question there is a 3-character question code (capital letters and digits only) followed by a space, and followed by the question name, which is at most 80 characters long. Each subsequent line for a question describes one possible answer on the question and starts with a space, followed by a single-character code for the answer (capital letter, digit, or character '.', '*', or '@'), followed by a space and followed by an answer description, which is at most 40 characters long. The list of questions is terminated by the line with a single character '#'. All answer codes are unique within the question, and all question codes are unique within the input file. There are at least 2 and at most 10 possible answers per question and at least 2 and at most 100 questions.
Next lines in the input file describe survey results. Every line contains a character per question (in the order they appear in the input file) that gives the answer code for the corresponding question. The characters follow one another without any delimiters. This section is terminated by the line with a single character '#'. There is at least one line with answers in the section and at most 10000 answers in total (the number of lines times the number of questions).
Next lines in the input describe cross tables that are to be created. Each cross table description occupies one line. That line contains the code for the first question, followed by a space, followed by the different code for the second question, followed by a space, and followed by the cross table name, which is at most 100 characters long. This section is terminated by the line with a single character '#'. There are at most 100 cross table descriptions in the input file.
The input has no trailing spaces on any line. All names do not start or end with a space, but may contain spaces.
Output
Write to the output a cross table for every cross table description in the input file in the order they appear in the input file. On the first line of the cross table write the survey name, followed by a space, followed by a '-' (dash) character, followed by a space, followed by the cross table name. Then write the description of the first question, and the description of the second question exactly as they appear in the input file and in the same format. Then write an empty line, followed by the table itself.
The table contains exactly 1+3*(N1+1) lines and exactly 6*(N2+2) characters on every line, where N1 is the number of possible answers for the first question, and N2 is the number of possible answers for the second question. The table has one line for column headings, and N1+1 rows (3 lines per row). The first N1 of these rows correspond to the answers on the first question in the order they appear in the input file, and the last row is for column totals. The table also has N2+2 columns, where each column is 6 characters wide. The first column is for row headings; the subsequent N2 columns correspond to the answers on the second question in the order they appear in the input file, and the last column is for row totals. All information in the cells (including headings) is aligned to the right and is padded on the left with spaces to become 6 characters wide.
The heading for the first column is empty. The headings for the subsequent N2 columns are composed from the second question code, followed by a ':' (colon) character, and followed by the corresponding answer code. The heading for the last column is the string "TOTAL" (without quotes). The headings for the first N1 3-line rows of the cross table are composed from the first question code, followed by a ':' (colon) character, and followed by the corresponding answer code. The heading for the last row is the string "TOTAL" (without quotes). Row headings are situated on the first line of the corresponding row. The subsequent 2 lines in the heading column of every row must be blank.
All non-heading cells in the table contain computed values and percents. On the first line of every cell the corresponding cross table integer value is situated. The second line contains properly rounded to integers row-wise percent, with a mandatory trailing '%' (percent) character, or a single '-' (dash) character if the corresponding row-wise percent is not defined. The third line contains column -wise percent in the same format. All cross tables in the output file must be separated by a single empty line.
New Year Phone Survey for ACM ICPC
Q01 Hello!
H Hello!
Y Yes!
* Uhm...
. (silence)
@ (other)
Q02 How are you?
H Hello!
Y Yes!
F Fine!
Q Who are you?
@ (other)
BYE Happy New Year!
Y You too.
* (censored)
@ (other)
. (hang up)
#
.@.
HH@
.@.
YFY
HQ*
H@.
YYY
.H@
HFY
HH@
#
Q01 Q02 Health vs greeting style
Q02 BYE Politeness matrix
#
New Year Phone Survey for ACM ICPC - Health vs greeting style
Q01 Hello!
H Hello!
Y Yes!
* Uhm...
. (silence)
@ (other)
Q02 How are you?
H Hello!
Y Yes!
F Fine!
Q Who are you?
@ (other)
Q02:H Q02:Y Q02:F Q02:Q Q02:@ TOTAL
Q01:H 2 0 1 1 1 5
40% 0% 20% 20% 20% 100%
66% 0% 50% 100% 33% 50%
Q01:Y 0 1 1 0 0 2
0% 50% 50% 0% 0% 100%
0% 100% 50% 0% 0% 20%
Q01:* 0 0 0 0 0 0
- - - - - -
0% 0% 0% 0% 0% 0%
Q01:. 1 0 0 0 2 3
33% 0% 0% 0% 67% 100%
34% 0% 0% 0% 67% 30%
Q01:@ 0 0 0 0 0 0
- - - - - -
0% 0% 0% 0% 0% 0%
TOTAL 3 1 2 1 3 10
30% 10% 20% 10% 30% 100%
100% 100% 100% 100% 100% 100%
New Year Phone Survey for ACM ICPC - Politeness matrix
Q02 How are you?
H Hello!
Y Yes!
F Fine!
Q Who are you?
@ (other)
BYE Happy New Year!
Y You too.
* (censored)
@ (other)
. (hang up)
BYE:Y BYE:* BYE:@ BYE:. TOTAL
Q02:H 0 0 3 0 3
0% 0% 100% 0% 100%
0% 0% 100% 0% 30%
Q02:Y 1 0 0 0 1
100% 0% 0% 0% 100%
33% 0% 0% 0% 10%
Q02:F 2 0 0 0 2
100% 0% 0% 0% 100%
67% 0% 0% 0% 20%
Q02:Q 0 1 0 0 1
0% 100% 0% 0% 100%
0% 100% 0% 0% 10%
Q02:@ 0 0 0 3 3
0% 0% 0% 100% 100%
0% 0% 0% 100% 30%
TOTAL 3 1 3 3 10
30% 10% 30% 30% 100%
100% 100% 100% 100% 100%
Source
Northeastern Europe 2001