Data analysis 1, linear relations, proportionalities

Up a level : IB Physics
Previous page : Data Analysis in lab rapports and explorations
Next page : Data analysis 1, linear relations,not a proportionality

Data Analysis 1, linear relations, proportionalities

In the Excel file Analysis 1-V3 under the leaf “The Data, part 1, I” you will find some data, that are supposed to be the data of some experiment where you are supposed to find the resistance vs. the   length of some wire. The length is measured to the nearest mm, and the resistivity to the nearest 0.1 Ω. The error in length inherent in the measurement  is thus ±0.5 mm, and in resistance, ±0.05 Ω. The error in the dependent variable is usually larger due to various random, or possibly systematic, errors. The error in the independent variable (say the length you select) is usually quite small.

The question could be to find how the resistance varies with the length. Since we expect the resistance to approach 0 as the length  approach 0 we will expect  a straight line through the origin, i.e. a proportionality.

Our goal is to find an equation of the form

R=(m±uncertainty in m)L

where m is a constant.

I suggest you try to work yourself trough the steps on the data in the first leaf, and use the second  leaf named “The analysis part 1”  as a key.

This is our data

Step 1: Find the average and the standard deviation of the resistance in each row.

In the English Excel you do that using the functions AVERAGE and STDEV.S Also, set the number of displayed figures in the calculations to something suitable.  In this case, say 2. You can do this by selecting the cells, right click, select Format Cells,  Number, and then you can set the number of decimals to say 2. (In the Swedish Excel you use  the functions MEDEL and STDAV.S, in other languages they will have other names.)

Important: If we have calculated the standard deviation, then the error we will actually (usually) use is the standard deviation divided by the square root of the number of measurements at a given input value, in this case 5 measurements. This because it is reasonable that the more measurements we do, the better we know the actual value, even if we have errors in particular measurements. For why we use the square root of the number of measurements, follow the link https://en.wikipedia.org/wiki/Standard_error. This error estimate is usually called the standard error of the mean, SEM.

You will quite often find that this new error estimate is in fact in error, especially if you have as few data points as 5.  It is rather good to have a discussion about this in your work if this happens. The chance for average to deviate far away from the expected average is simply rather high if we just have a few values.

We now have:


I have added zeros on top of the L, Mean, SD and SD/SQRT 5 columns, this because we are (most likely) dealing with a proportionality, and hence the graph will go through the origin, with an uncertainty of 0. We will use this later on.

I have also copied  the length data and the average resistance data to two new columns. This to simplify the next step, plotting the data. I would though not just copy and paste, but instead use “=”. Given the Excel file we have I would make the I column my new L, then click I7, write “=” then select C7, Enter. This would make the cell K7 have the same value as C7, even if I later on change the value of C7- this to keep my data consistent.

I can then select the cell, I7, right click on the lower right corner, drag down, release the key, and select “Copy  cells”, or left click, drag down and release.

Next I do the same thing in the J column but with the mean R values. I would thus have “=AVERAGE(D7:H7)” (or “=MEDEL(D7:H7)” is the Swedish version) in the cell J7.

The standard deviations we get should  be somewhat consistent with the supposed accuracy of the instrument. If not, then some other error might have crept in.  In this case the standard deviation do vary quite much more than the estimate we had (0.05 Ω) and indicates that we have some more sources of error than the pure measurement errors from our instrument.

Step 2: Plot the data.

Select the two columns for L and R, including labels, then make a scatter plot. To do this, on the Insert tab, in the Charts group, choose Scatter, and select Scatter with only Markers. (In Swedish Excel: Infoga, then under Diagram select Punkt.). You should now get something like this

The relation looks rather linear. To test this we right click on a point, and select to add a linear trend line. Select to show the equation and also the R2 value.

An R2 value of 0.993 shows us that it is indeed close to linear, and with a slope of 1.4 Ω/m. The relatively small “y“-intercept value of 0.022 show us that we indeed might have a proportionality.

Important: The above graph is useful in your own analysis, but not if you intend to use it in your exploration. There are several problems with it. We don’t have any axis labels, no grid lines, and the equation is express using x and y. We will look at how to make a nice graph later on.

Hmm. But this is the value without possible errors. We can start making a new graph and adding some error bars.

So, again, select the two columns for L and R, including labels and make a scatter plot as above.  Next click the Chart Elements button next to the chart, and then check the Error Bars box. You might also get here under Layout, Error bars. Next you select Custom, and  Pick More Options to set your own error bar amounts, and then under Vertical Error Bar or Horizontal Error Bar, choose the options you want. In the end you want to be able to select custom error bars. When I do it I get something like this:

I use a Swedish Excel 2007, so it will most likely look different for you. Change the values by writing ”=” and then select the SD/SQRT 5 column. In this case we will use the same value for the positive and negative values. This may gives you something like this:

If you get horizontal error bars, just select one of them and press Delete to remove them. Then adding a trend line we get:


Next we need to add lines with the largest and the smallest possible slopes. In this case we know the line should go through (0,0), so we will use that as our starting point. Next we add endpoints. This we do by adding values in two additional columns. I have also added a new row for the length 0. One also needs to add a zero in the beginning of the SD column, to make it have the same number of rows.

Important: Give the two new columns labels, or headlines (like Min and Max) else plotting does not work as intended (I have no clue why). Se to that all columns have labels, then select the columns including the labels, and then do a scatter plot. Excel will then use the first column as the “x”-values, and plot the data from the other columns vs. that. If you don’t have labels, it will use “1,2,3…” as “x”-values, and plot all the columns against those values.

I have added values in the last row that are small enough and large enough to clearly be outside the values we will use in the end – this to make it easier to select the points.

If we now select the four columns, and make a scatter plot, we get:


We can then add back error bars. Here we now need to use all values (including the 0) in the SD/SQRT 5 column. We then add the lines for the min and max slopes. We do this by selecting, say the upper green point, and then add trend line. We do the same with the red point. Don’t forget to include the equations.

Next we change the top max and min values to fit the error bars. I started with simply using the values at the endpoints. 1.44–0.12=1.32 and 1.44+0.12=1.56.

This does not fit through the error bars, and it looks like it, on average, is above points. We will have to find some kind of best fit, that still have about the same error.

This is what I got:

The topmost line misses the middle point’s error bar a bit, but it will, on average, fit the rest of the error bars better. This for the values:

If we have two values that are the end of an error bar, then the average is the midpoint, and the distance between the values divided by two will give us the ± value.We will use this method over and over again.

(The actual values you use is not that critical as long as the lines fit the data somewhat. The reason for this is that we will round the uncertainties to one significant figure. If you get an  uncertainty of  ±0.062 or ±0.086 does not really matter. You will round it to ±0.1 anyhow. )

We thus have a slope between 1.3 and 1.45. This would give an average of 1.375. Half the difference would give us the errors, ±0.075. We round that up and to one significant figure, i.e. to ±0.1. We then round the average to the same decimal position. This would add an error of 0.045, so to cover this we change our error to ±0.2, to finally give us the slope 1.4±0.2.  The equation of resistance vs. length would thus be:

R=(1.4±0.2)L

If this was a part of an IA in physics, and you would have one the above analysis (without all my words) then you would be a long way in, in fulfilling the criteria:

“Appropriate and sufficient data processing is carried out with the accuracy required to enable a conclusion to the research question to be drawn that is fully consistent with the experimental data.

The report shows evidence of full and appropriate consideration of the impact of measurement uncertainty on the analysis.”

Up a level : IB Physics
Previous page : Data Analysis in lab rapports and explorations
Next page : Data analysis 1, linear relations,not a proportionalityLast modified: May 2, 2018 @ 09:36