Due 11/7/2024 before midnight via Learning Suite. 25 possible points.
You’re part of a start-up developing lower-limb prostheses and are trying to decide what sizes to produce. You use a public dataset provided by the US Army, which contains various body measurements of US military personnel (we’ll use this dataset, but you should be skeptical that this population is representative of the US population in general). The data is available here (may require you to create an account to download). The two CSV files are what we need (one for males and one for females).
a) The data we are after is labeled “kneeheightmidpatella”, measured in mm, which is the distance from the bottom of the foot to the middle of the knee. Combine the values from the male and female samples, and plot a histogram. Choose an appropriate number of bins so that the distribution is clear.
You could use np.loadtxt like we’ve done before, but there are a lot of columns, so it will be much easier to refer to the columns by their title rather than by index number (which is what you’d have to do with loadtxt). The Python library pandas is a popular way to work with data like this, but for our purposes we can avoid learning another library and get by with numpy. The following lines allow you to extract this data from the female.csv file. The first line calls a function specifically for reading CSV files (and requires specifying the encoding the file was saved in). In the next line we index the resulting array by name rather than by number, where the name corresponds to the labels in the first row of the CSV file.
data_f = np.recfromcsv('female.csv', encoding='ISO-8859-1')
knee_f = data_f['kneeheightmidpatella']
If your version of numpy doesn’t have recfromcsv (recent versions have removed it), you can replace that line with genfromtxt, which is a general-purpose file reader.
data_f = np.genfromtxt('female.csv', delimiter=',', names=True, encoding='ISO-8859-1')
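To see how name-based indexing works before touching the real files, here is a minimal sketch on made-up data. The in-memory CSV text, the variable names female_csv, male_csv, data_m, and knee_all, and all of the numbers are assumptions for illustration only; they are not taken from the Army dataset.

```python
import io
import numpy as np

# Hypothetical stand-ins for female.csv and male.csv (values are made up).
female_csv = io.StringIO(
    "subjectid,kneeheightmidpatella,stature\n"
    "1,455,1620\n"
    "2,470,1655\n"
    "3,448,1601\n"
)
male_csv = io.StringIO(
    "subjectid,kneeheightmidpatella,stature\n"
    "4,505,1760\n"
    "5,490,1722\n"
)

# names=True takes the field names from the first row of each file.
data_f = np.genfromtxt(female_csv, delimiter=',', names=True)
data_m = np.genfromtxt(male_csv, delimiter=',', names=True)

# Index by column title, then join the two samples into one array.
knee_all = np.concatenate(
    (data_f['kneeheightmidpatella'], data_m['kneeheightmidpatella'])
)

# np.histogram returns the counts per bin and the bin edges.
counts, edges = np.histogram(knee_all, bins=4)
print(data_f.dtype.names)
print(counts, edges)
```

With the real files you would pass the file names and the encoding as shown above, and a plotting call such as matplotlib's plt.hist(knee_all, bins=...) would draw the histogram directly.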
b) You can only make a limited number of prosthesis sizes in your early startup phase, so let’s focus on people with measurements between 425 and 550 mm. Filter the data to keep only the measurements in this range. Your task is then to determine four different size ranges, such that each size fits approximately 25% of the remaining participants (hint: percentiles). What are the minimum and maximum measurements for each of your four sizes?
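As a rough illustration of the hint, the sketch below filters an array with a boolean mask and then splits it at its quartiles. The array knee is synthetic (random numbers, not the Army data), and the names in_range, cuts, and size_ranges are hypothetical.

```python
import numpy as np

# Synthetic measurements in mm (an assumption, not the real dataset).
rng = np.random.default_rng(0)
knee = rng.normal(loc=480, scale=30, size=1000)

# A boolean mask keeps only the values inside the range of interest.
in_range = knee[(knee >= 425) & (knee <= 550)]

# The 0th, 25th, 50th, 75th, and 100th percentiles give five cut points,
# which bound four groups of roughly equal size.
cuts = np.percentile(in_range, [0, 25, 50, 75, 100])
size_ranges = list(zip(cuts[:-1], cuts[1:]))

for low, high in size_ranges:
    print(f"{low:.1f} mm to {high:.1f} mm")
```

Each pair in size_ranges is the (minimum, maximum) of one group, and adjacent groups share a boundary value.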
c) Let’s also determine whether there is a strong correlation between “kneeheightmidpatella” and “stature” (the person’s height, also in mm). Create a scatterplot of the data with stature on the x-axis and, on the same figure, plot a least-squares fit. Also report the corresponding correlation coefficient.
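One possible way to get a straight-line fit and a correlation coefficient with numpy alone is sketched below. The data is synthetic (the 0.28 slope and the noise level are made-up assumptions), and the names fit_x and fit_y are hypothetical; other fitting routines would work equally well.

```python
import numpy as np

# Synthetic stature and knee-height values in mm (not the real dataset).
rng = np.random.default_rng(1)
stature = rng.uniform(1500, 1900, size=200)
knee = 0.28 * stature + rng.normal(0, 10, size=200)

# A degree-1 polynomial fit is a least-squares straight line.
slope, intercept = np.polyfit(stature, knee, 1)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
r = np.corrcoef(stature, knee)[0, 1]

# Two end points are enough to draw the fitted line over the scatterplot.
fit_x = np.array([stature.min(), stature.max()])
fit_y = slope * fit_x + intercept

print(f"slope = {slope:.3f}, intercept = {intercept:.1f}, r = {r:.3f}")
```

To put both on one figure, you could call matplotlib's plt.scatter(stature, knee) followed by plt.plot(fit_x, fit_y).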
The remaining problems are from “Principles of Statistics for Engineers and Scientists” by William Navidi.
Let V be the event that a computer contains a virus, and let W be the event that a computer contains a worm. Suppose P(V) = 0.15, P(W) = 0.05, and P(V or W) = 0.17.
a. Find the probability that the computer contains both a virus and a worm.
b. Find the probability that the computer contains neither a virus nor a worm.
c. Find the probability that the computer contains a virus but not a worm.
Of all failures of a certain type of computer hard drive, it is determined that in 20% of them only the sector containing the file allocation table is damaged, in 70% of them only nonessential sectors are damaged, and in 10% of the cases both the allocation sector and one or more nonessential sectors are damaged. A failed drive is selected at random and examined.
a. What is the probability that the allocation sector is damaged?
b. If the drive is found to have a damaged allocation sector, what is the probability that some nonessential sectors are damaged as well?
c. If the drive is found to have a damaged nonessential sector, what is the probability that the allocation sector is not damaged?
A quality-control program at a plastic bottle production line involves inspecting finished bottles for flaws such as microscopic holes. The proportion of bottles that actually have such a flaw is only 0.0002. If a bottle has a flaw, the probability is 0.995 that it will fail the inspection. If a bottle does not have a flaw, the probability is 0.99 that it will pass the inspection.
a. If a bottle fails inspection, what is the probability that it has a flaw?
b. If a bottle passes inspection, what is the probability that it does not have a flaw?
c. Explain why a small probability in part (a) is not a problem, so long as the probability in part (b) is large.
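For checking this kind of calculation numerically, a small sketch of Bayes' rule follows. The function name posterior and the numbers passed to it are hypothetical and deliberately differ from the ones in the problem.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes' rule."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence


# Made-up example: 1% of items are defective, a defective item fails the
# test 90% of the time, and a good item fails the test 5% of the time.
p_defect_given_fail = posterior(0.01, 0.90, 0.05)
print(p_defect_given_fail)
```

The same function handles the "passes inspection" direction if you pass in the prior and conditional probabilities for the complementary events.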