ORIGINAL ARTICLE
Year : 2021  |  Volume : 9  |  Issue : 2  |  Page : 90-91

Statistical corner: Building own functions in R


City of Tampere, Oral Health Services; Oral and Maxillofacial Unit, Tampere University Hospital; Hemorrhagic Brain Pathology Research Group, University of Tampere, Finland

Date of Submission: 27-Dec-2021
Date of Decision: 04-Jan-2022
Date of Acceptance: 10-Jan-2022
Date of Web Publication: 5-Apr-2022

Correspondence Address:
Dr. Mikko Pyysalo
City of Tampere, Oral Health Services; Oral and Maxillofacial Unit, Tampere University Hospital; Hemorrhagic Brain Pathology Research Group, University of Tampere
Finland

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/jcvs.jcvs_22_21

  Abstract 


Introduction: R is a programming language that can be used to efficiently solve statistical problems.
Objectives: To demonstrate how to build functions in R for statistical analysis and how to store them for future use.
Materials and Methods: A real-world example from aneurysm research, in which aneurysm locations are compared between groups, is used to demonstrate how one might build a function in R.
Results: Accurate results could be obtained and the function can be stored for later use.
Conclusions: As the statistical tools required in research are recurrent, building functions in R and storing them for future use is highly recommended.

Keywords: Microvascular decompression, root entry zone, trigeminal neuralgia


How to cite this article:
Pyysalo M. Statistical corner: Building own functions in R. J Cerebrovasc Sci 2021;9:90-1

How to cite this URL:
Pyysalo M. Statistical corner: Building own functions in R. J Cerebrovasc Sci [serial online] 2021 [cited 2022 May 22];9:90-1. Available from: http://www.jcvs.com/text.asp?2021/9/2/90/342559



Since R is a programming language, it is possible to build your own functions to solve specific problems. If solving a statistical problem requires more than a few lines of code, it is efficient to create a function and store it for future use. The alternative is to write the same code again and again, with a certain risk of typing mistakes. The basic structure of a function is:

my_function <- function(arguments) {
  # function body
}

As an easy example, we will build a function which adds 1 to a given value. We will give it a descriptive name: add_one(). The function requires only one argument, a, adds one to it, stores the new value in the variable b and prints the result to the screen.

add_one <- function(a) {
  b <- a + 1
  print(b)
}

Now type add_one(2) and press enter. R should print the value 3.
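As a small extension of the same idea, the sketch below shows a function with a default argument that returns its result instead of printing it, so the value can be stored and reused. The name add_n and the argument n are hypothetical and are used here only for illustration.

```r
# Hypothetical variant of add_one(): 'n' is an illustrative extra argument
# with a default value of 1.
add_n <- function(a, n = 1) {
  b <- a + n
  return(b)  # hand the value back to the caller instead of printing it
}

add_n(2)             # uses the default n = 1, giving 3
add_n(2, n = 5)      # overrides the default, giving 7
result <- add_n(10)  # the returned value can be stored in a variable
```

Returning the value rather than only printing it makes it easier to use the function inside other calculations.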

For example, in the field of aneurysm research, one might occasionally need to test whether the locations of the aneurysms differ between groups. To solve this problem, both Fisher's exact test and the Chi-squared test are used. These are statistical significance tests used in the analysis of contingency tables, and both require a numeric matrix as input. We will build a function called 'location_p_values'. To understand the structure of the function, one must know the basics of looping in R. We will use a so-called for-loop. The basic form of a for-loop is:

for (i in 1:10) {
  print(i)
}

The loop assigns i each value from 1 to 10 in turn and prints it to the screen. The next loop prints 'cat', 'dog' and 'mouse' to the screen.

for (i in c("cat", "dog", "mouse")) {
  print(i)
}
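A loop can also store a result on each pass instead of printing it, which is the pattern the function in this tutorial relies on. The following is a minimal sketch; the variable names values and squares are illustrative assumptions.

```r
# Illustrative example: store one result per pass of the loop.
values <- c(2, 4, 6)
squares <- numeric(length(values))  # vector that will hold the results

for (i in seq_along(values)) {
  squares[i] <- values[i]^2  # fill position i with the square of element i
}

print(squares)  # 4 16 36
```

Here seq_along(values) produces the positions 1, 2 and 3, so each result is written to the matching position of the output vector.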

The function 'location_p_values' is written as follows:

location_p_values <- function(names, case, control) {
  require(MASS)  # load the MASS package
  fisher_p_values <- numeric(0)  # empty vector for Fisher's test P values
  chi_p_values <- numeric(0)  # empty vector for Chi-squared test P values
  for (i in seq_along(case)) {
    # counts at location i versus all other locations
    cases <- c(case[i], sum(case) - case[i])
    controls <- c(control[i], sum(control) - control[i])
    location_data <- cbind(cases, controls)
    location_matrix <- as.matrix(location_data)  # 2 x 2 contingency table
    fish <- fisher.test(location_matrix)
    chi <- chisq.test(location_matrix)
    fisher_p_values[i] <- fish$p.value
    chi_p_values[i] <- chi$p.value
  }
  location_data_frame <- cbind.data.frame(names, case, control, fisher_p_values, chi_p_values)
  names(location_data_frame) <- c("location", "cases", "controls", "fisher p-values", "chi-squared p-values")
  print(location_data_frame)
}

This function requires three arguments: 'names', 'case' and 'control'. They must all be of the same length for the function to work properly. The first three lines load the MASS package and create two empty vectors in which the P values will be stored. Note that require() only loads a package that is already installed, and that fisher.test() and chisq.test() themselves belong to the stats package, which R loads by default. Inside the for-loop, the first line takes the i-th value of 'case' and works out how many aneurysms are at that location and how many are not. The result is stored in the variable 'cases'. The second line does exactly the same for 'control'. The two vectors are then combined column-wise with cbind(), which yields a numeric matrix (as.matrix() makes the type explicit) suitable for statistical testing. Both Fisher's test and the Chi-squared test are then performed, and the P values are stored in fisher_p_values and chi_p_values. The loop repeats these steps once for each element of 'case'. When the loop has finished, the results are collected into a data frame and printed. The best way to understand how the function works is to test it. You can type:

> locations <- c("ica", "mca", "acoa", "vba", "aca")
> ruptured <- c(1, 5, 0, 1, 0)
> unruptured <- c(1, 23, 4, 0, 1)
> location_p_values(names = locations, case = ruptured, control = unruptured)

Then, press enter. R should show you an output [Table 1] indicating that the prevalence/incidence of aneurysms at any given location does not differ significantly between the groups. With counts this small, R may also warn that the Chi-squared approximation may be incorrect. When the data are small, Fisher's test is more reliable than the Chi-squared test.
Table 1: Output example
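To see what a single pass of the loop does, the sketch below builds the 2 x 2 table for one location by hand, using the example data from this tutorial and the second location ("mca"). The row names are added only for readability and are an assumption of this sketch, not part of the original function.

```r
# Example data from the tutorial
ruptured   <- c(1, 5, 0, 1, 0)
unruptured <- c(1, 23, 4, 0, 1)

i <- 2  # position of "mca" in the locations vector

# Counts at this location versus all other locations
cases    <- c(ruptured[i], sum(ruptured) - ruptured[i])        # 5 and 2
controls <- c(unruptured[i], sum(unruptured) - unruptured[i])  # 23 and 6

location_matrix <- cbind(cases, controls)
rownames(location_matrix) <- c("mca", "other")  # labels for readability only
print(location_matrix)

# Fisher's exact test on this single 2 x 2 table
fish <- fisher.test(location_matrix)
fish$p.value
```

The function repeats exactly this construction once per location and collects the P values.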




  Loading Required Package: MASS


In this short tutorial, the basics of looping and writing a function were introduced. As researchers concentrate on a specific field of research, their study designs and statistical problems tend to remain similar throughout a career. That is why it is highly recommended to learn to write functions and to store them for future use.
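One possible way to store a function for later use is to write it to a script file and load it again with source(). The sketch below assumes a hypothetical file name, my_functions.R, and uses a temporary directory so that it can be run anywhere; in practice one would choose a permanent folder.

```r
# The function to be stored (same as in the tutorial)
add_one <- function(a) {
  b <- a + 1
  print(b)
}

# Hypothetical file name; tempdir() is used only to keep the example self-contained
functions_file <- file.path(tempdir(), "my_functions.R")

# Write the function definition to the file as R code
dump("add_one", file = functions_file)

# Later, for example in a new R session, load the stored function again
rm(add_one)             # remove it to imitate a fresh session
source(functions_file)  # reads the file and recreates add_one()
add_one(2)              # prints 3
```

Several functions can be kept in the same file, so that a single source() call at the start of an analysis makes all of them available.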





 
 