

ORIGINAL ARTICLE 

Year: 2021, Volume: 9, Issue: 2, Page: 90-91

Statistical corner: Building own functions in R
Mikko Pyysalo
City of Tampere, Oral Health Services; Oral and Maxillofacial Unit, Tampere University Hospital; Hemorrhagic Brain Pathology Research Group, University of Tampere, Finland
Date of Submission: 27-Dec-2021
Date of Decision: 04-Jan-2022
Date of Acceptance: 10-Jan-2022
Date of Web Publication: 5-Apr-2022
Correspondence Address: Dr. Mikko Pyysalo, City of Tampere, Oral Health Services; Oral and Maxillofacial Unit, Tampere University Hospital; Hemorrhagic Brain Pathology Research Group, University of Tampere, Finland
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/jcvs.jcvs_22_21
Introduction: R is a programming language that can be used to solve statistical problems efficiently. Objectives: To demonstrate how to build a function in R for statistical analysis and store it for future use. Materials and Methods: A real-world example from aneurysm research, comparing aneurysm locations between groups, is used to demonstrate how one might build a function in R. Results: Accurate results could be obtained, and the function can be stored for later use. Conclusions: Because the statistical tools required in research tend to recur, building functions in R and storing them for future use is highly recommended.
Keywords: Microvascular decompression, root entry zone, trigeminal neuralgia
How to cite this article: Pyysalo M. Statistical corner: Building own functions in R. J Cerebrovasc Sci 2021;9:90-1
Since R is a programming language, it is possible to build your own functions to solve specific problems. If solving a statistical problem requires more than a few lines of code, it is efficient to create a function and store it for future use. The alternative is to write the same code again and again, with a certain risk of typing mistakes. The basic structure of a function is:
my_function <- function(arguments) {
  # function body
}
As an easy example, we will build a function which adds 1 to a given value. We will give it a descriptive name: add_one(). The function requires only one argument, a, adds one to it, stores the new value in the variable b and prints the result to the screen.
add_one <- function(a) {
  b <- a + 1
  print(b)
}
Now type add_one(2) and press enter. R should print the value 3.
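Printing shows the result on the screen, but a function is often easier to reuse when its result can be stored in a variable. The following is a minimal sketch of that idea; the name add_one_value is hypothetical and does not appear in the original example.

```r
# Hypothetical variant of add_one(): returns the result instead of printing it
add_one_value <- function(a) {
  b <- a + 1
  return(b)  # the returned value can be stored or passed to other functions
}

x <- add_one_value(2)        # x now holds 3
y <- add_one_value(c(1, 5))  # arithmetic in R is element-wise, so vectors work too
print(x)
print(y)
```

The stored values x and y can then be used in later calculations without calling the function again.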
For example, in aneurysm research, one might occasionally need to test whether the distribution of aneurysm locations differs between groups. Both Fisher's exact test and the chi-squared test can be used for this purpose. They are statistical significance tests used in the analysis of contingency tables, and both require a numeric matrix as input. We will build a function called 'location_p_values'. To understand the structure of the function, one must know the basic structure of looping in R. We will use a so-called for loop. The basic form of a for loop is:
for (i in 1:10) {
print(i)
}
The loop assigns i the values from 1 to 10 in turn and prints each of them to the screen. The next loop prints 'cat', 'dog' and 'mouse' to the screen.
for (i in c("cat", "dog", "mouse")) {
  print(i)
}
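A loop can also run over positions and store one result per position in a vector that starts out empty. The following is a small sketch of that pattern; the variable names animals and name_lengths are illustrative assumptions only.

```r
animals <- c("cat", "dog", "mouse")
name_lengths <- numeric(0)        # empty numeric vector to be filled by the loop

for (i in seq_along(animals)) {   # i takes the values 1, 2 and 3
  # store the number of characters of the i-th name at position i
  name_lengths[i] <- nchar(animals[i])
}

print(name_lengths)
```

Assigning to position i of an empty vector lets R extend the vector as the loop proceeds, which is convenient for short loops like this one.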
The function 'location_p_values' is written as follows:
location_p_values <- function(names, case, control) {
  require(MASS)  # load MASS package
  fisher_p_values <- numeric(0)  # create empty vector
  chi_p_values <- numeric(0)  # create empty vector
  for (i in seq_along(case)) {
    # counts at location i and at all other locations, per group
    cases <- c(case[i], sum(case) - case[i])
    controls <- c(control[i], sum(control) - control[i])
    location_data <- cbind(cases, controls)
    location_matrix <- as.matrix(location_data)
    fish <- fisher.test(location_matrix)
    chi <- chisq.test(location_matrix)
    fisher_p_values[i] <- fish$p.value
    chi_p_values[i] <- chi$p.value
  }
  location_data_frame <- cbind.data.frame(names, case, control, fisher_p_values, chi_p_values)
  names(location_data_frame) <- c("location", "cases", "controls", "fisher p-values", "chi-squared p-values")
  print(location_data_frame)
}
This function requires three arguments: 'names', 'case' and 'control'. They must be of the same length for the function to work properly. The first three lines load the MASS package and create two empty vectors in which the P values will be stored. Inside the for loop, the first line takes the i-th value of 'case' and records how many aneurysms in that group are at that location and how many are elsewhere. The result is stored in the 'cases' variable. The second line does exactly the same for the 'control' variable. The two vectors are then combined column-wise (cbind command) into a 2 x 2 table and converted to a numeric matrix for statistical testing. Both Fisher's test and the chi-squared test are then performed, and the P values are stored in the fisher_p_values and chi_p_values vectors. The loop repeats these steps once for every element of 'case'. When the loop has finished, the results are collected into a data frame and printed. The best way to understand how the function works is to test it. You can type:
> locations <- c("ica", "mca", "acoa", "vba", "aca")
> ruptured <- c(1, 5, 0, 1, 0)
> unruptured <- c(1, 23, 4, 0, 1)
> location_p_values(names = locations, case = ruptured, control = unruptured)
Then, press enter. R should give you an output [Table 1] showing that the proportion of aneurysms at any given location does not differ significantly between the groups. When the counts are small, Fisher's test is more reliable than the chi-squared test.
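To see what a single pass of the loop does, the table for one location can be built by hand. The sketch below uses the example data and the second location ('mca'); the row labels and the use of suppressWarnings() are additions for illustration and are not part of the original function.

```r
# Example data from the text
ruptured   <- c(1, 5, 0, 1, 0)
unruptured <- c(1, 23, 4, 0, 1)

i <- 2  # second location, "mca"
cases    <- c(ruptured[i],   sum(ruptured)   - ruptured[i])    # 5 at mca, 2 elsewhere
controls <- c(unruptured[i], sum(unruptured) - unruptured[i])  # 23 at mca, 6 elsewhere
location_matrix <- cbind(cases, controls)
rownames(location_matrix) <- c("mca", "other")  # labels added for readability only
print(location_matrix)

fish <- fisher.test(location_matrix)
# With expected counts this small, chisq.test() warns that the approximation
# may be incorrect; the warning is silenced here only to keep the output short
chi <- suppressWarnings(chisq.test(location_matrix))
print(c(fisher = fish$p.value, chi_squared = chi$p.value))
```

The warning from chisq.test() is itself a useful reminder of why Fisher's test is preferred for small counts, so in real use it is probably better left visible.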
Loading Required Package: MASS   
In this short tutorial, the basics of looping and writing a function were introduced. As researchers concentrate on a specific field of research, their study designs and statistical problems tend to remain similar throughout their careers. That is why it is highly recommended to learn to write functions and to store them for future use.
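One possible way to store a function for later use is to write it to a script file and load that file with source() in a later session. The following is a minimal sketch under that assumption; the file name my_functions.R and the temporary folder are hypothetical, and in practice a permanent folder would be chosen.

```r
# Hypothetical file location; tempdir() is used only so the sketch runs anywhere
function_file <- file.path(tempdir(), "my_functions.R")

add_one <- function(a) {
  b <- a + 1
  print(b)
}

# dump() writes the R source code of the named objects to a text file
dump("add_one", file = function_file)

rm(add_one)            # simulate starting a fresh R session
source(function_file)  # load the stored function again
result <- add_one(2)   # prints 3
```

Several functions can be kept in the same file, so one call to source() at the top of an analysis script makes all of them available.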
[Table 1]
