
In a recent coding challenge one question was to get some statistics on the length of rope left after a certain number of cuts. Starting with rope length N, the rope is cut into 3 random but integer length pieces; if this is impossible no more cuts are made. The largest piece resulting is then subject to 2 more cuts. Repeat this T times. The last remaining longest piece is recorded. The statistical questions were on the conditional probability of a certain length if at least at a certain other length.
This post is about the code I wrote for this challenge, and what I would improve if I were to rewrite it.
I implemented a recursive algorithm to generate lots of data and put it into a list. Then I imported the list as a pandas data frame for the statistical analysis. Here is the core function (with debugging print statements removed):
def cut(N, T): bit = N T_counter = T if T_counter >= 0: x = random.randrange(1, N-1) y = random.randrange(x, N) peices = [ x, y - x, N -y] bit = max(peices) T_counter -= 1 if bit > 3: return cut(bit, T_counter) else: S_list.append(bit) else: S_list.append(bit)
There are a couple parts that I would change if I were to rewrite without time constraints. If you are a very careful reader, you would notice that the function adds to a list thats not defined within the function… this is sloppy. I should make the function return a list, then call the function as so:
list_to_analyze = cut(N, T)
I had the code save to file a .csv made from the list of end rope lengths, this was only necessary to backup my data if my ancient laptop were to crash. I could have just turned the list into a pandas data frame and saved a few lines.
I didn’t have time to learn how to use bayespy, but that would have made the statistical portion of the code much more readable.
Finally, making the N and T’s user defined, rather than hardcoded would be a nice touch, say with sys argv or even user defined input.
Well, thats it for now, and if you want to see the whole script, it follows.
~Rafael
import matplotlib.pyplot as plt
import pandas as pd
import math
import csv
import math
import random
S_list = []
def cut(N, T):
bit = N
T_counter = T
if T_counter >= 0:
x = random.randrange(1, N-1)
#print 'x = {} '.format(x)
y = random.randrange(x, N)
#print 'y = {} '.format(y)
peices = [ x, y - x, N -y]
bit = max(peices)
#print 'bit = {} '.format(bit)
T_counter -= 1
if bit > 3:
return cut(bit, T_counter)
else:
S_list.append(bit)
#print 'N too short end. S = {} added to S_list \n'.format(bit)
else:
S_list.append(bit)
#print 'T end. S = {} added to S_list \n'.format(bit)
iter = 100000000
while iter >=0:
cut(1024, 10)
iter -= 1
#print 'S_list = {}'.format(S_list)
csvfile = "N1024_T10.csv"
with open(csvfile, "w") as output:
writer = csv.writer(output, lineterminator='\n')
for val in S_list:
writer.writerow([val])
S_list = []
iter = 100000000
while iter >=0:
cut(64, 5)
iter -= 1
#print 'S_list = {}'.format(S_list)
csvfile = "N64_T5.csv"
with open(csvfile, "w") as output:
writer = csv.writer(output, lineterminator='\n')
for val in S_list:
writer.writerow([val])
# name data colums and read data
name_64 = ['S_64']
name_1024 = ['S_1024']
df_64 = pd.read_csv('N64_T5.csv', names = name_64)
df_1024 = pd.read_csv('N1024_T10.csv', names = name_1024)
# simple extraction of stats
print df_64.describe()
print '\n'
print df_1024.describe()
print '\n'
# count instances for prob
dens_64_tmp = df_64['S_64'].value_counts(normalize=True)
dens_1024_tmp = df_1024['S_1024'].value_counts(normalize=True)
# filter data for conditional probs
dens_64 = dens_64_tmp.to_frame().reset_index()
dens_1024 = dens_1024_tmp.to_frame().reset_index()
#
dens_64_fil_8 = dens_64['index'] > 8
dens_64_fil_4 = dens_64['index'] > 4
dens_1024_fil_12 = dens_1024['index'] > 12
dens_1024_fil_6 = dens_1024['index'] > 6
dens_64_over_8 = dens_64[ dens_64_fil_8 ]
dens_64_over_4 = dens_64[ dens_64_fil_4 ]
dens_1024_over_12 = dens_1024[ dens_1024_fil_12 ]
dens_1024_over_6 = dens_1024[ dens_1024_fil_6 ]
# calc probability of everything
dens_64_sum_over_8 = dens_64_over_8['S_64'].sum()
dens_64_sum_over_4 = dens_64_over_4['S_64'].sum()
prob_over_8_if_over_4 = dens_64_sum_over_8 / dens_64_sum_over_4
dens_1024_sum_over_12 = dens_1024_over_12['S_1024'].sum()
dens_1024_sum_over_6 = dens_1024_over_6[ 'S_1024'].sum()
prob_over_12_if_over_6 = dens_1024_sum_over_12 / dens_1024_sum_over_6
# print conditional stats
print 'dens_64_sum_over_8 = \n{} \n'.format(dens_64_sum_over_8)
print 'dens_64_sum_over_4 = \n{} \n'.format(dens_64_sum_over_4)
print 'prob_over_8_if_over_4 = {} \n'.format(prob_over_8_if_over_4)
print 'dens_1024_sum_over_12 \n{}\n'.format(dens_1024_sum_over_12)
print 'dens_1024_sum_over_6 \n{} \n'.format(dens_1024_sum_over_6)
print 'prob_over_12_if_over_6 = {} \n'.format(prob_over_12_if_over_6)
Leave a Reply