Posts

Showing posts with the label regex

RegEx in R for Data Science

Regular expressions ("regex") are a pattern language used for manipulating text strings. More specifically, they are typically used for finding specific patterns of characters and replacing them with others.

Finding Regex Matches in String Vectors

The grep function takes your regex as the first argument and the input vector as the second argument. If you pass value=FALSE or omit the value parameter, grep returns a vector with the indexes of the elements in the input vector that the regular expression matches, at least partially. If you pass value=TRUE, grep returns a vector with copies of the matching elements themselves.

> grep("a+", c("abc", "def", "cba a", "aa"), perl=TRUE, value=FALSE)
[1] 1 3 4
> grep("a+", c("abc", "def", "cba a", "aa"), perl=TRUE, value=TRUE)
[1] "abc"   "cba a" "aa"
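Since the introduction mentions replacing matched patterns as well as finding them, here is a minimal sketch of the related base R functions grepl, sub, and gsub, applied to the same input vector. The variable names (x, has_a, first_replaced, all_replaced) and the replacement string "X" are chosen only for illustration.

```r
# Same input vector as in the grep examples
x <- c("abc", "def", "cba a", "aa")

# grepl returns one logical value per element instead of indexes
has_a <- grepl("a+", x, perl = TRUE)
print(has_a)

# sub replaces only the first match in each element
first_replaced <- sub("a+", "X", x, perl = TRUE)
print(first_replaced)

# gsub replaces every match in each element
all_replaced <- gsub("a+", "X", x, perl = TRUE)
print(all_replaced)
```

Because a+ is greedy, the run "aa" is consumed as a single match, so it is replaced by one "X" rather than two.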

Data Cleaning in R for Data Science

Data cleaning in R for data science typically involves:

- Removing duplicate values
- Removing null values
- Changing column names to readable, understandable, formatted names
- Removing commas from numeric values (e.g., 1,000,657 to 1000657)
- Converting data types into their appropriate types for analysis

The Experiment

The data for this experiment comes from the UCI Machine Learning Repository, where a group of 30 volunteers (aged 19–48 years) performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) while wearing a Samsung Galaxy S smartphone. The data collected from the embedded accelerometers was divided into training and testing data.

Step 1: Retrieving Data from URL

The first step is to obtain the data. To avoid the headache of manually downloading thousands of files, they are often downloaded using small code snippets. Since this was a zipped folder .

Data Reference : http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartp
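The cleaning steps listed at the start of this post can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame raw, its column names, and its values are hypothetical and are not taken from the UCI dataset, and "null values" is interpreted here as NA entries.

```r
# Toy data frame (hypothetical values, not from the UCI dataset)
raw <- data.frame(
  Subject.ID = c(1, 1, 2, 3),
  step.count = c("1,000,657", "1,000,657", "2,345", NA),
  stringsAsFactors = FALSE
)

# Remove duplicate rows
clean <- unique(raw)

# Remove rows that contain missing (NA) values
clean <- clean[complete.cases(clean), ]

# Change column names to readable, consistently formatted names
names(clean) <- c("subject_id", "step_count")

# Remove commas from numeric values, then convert to a numeric type
clean$step_count <- as.numeric(gsub(",", "", clean$step_count, fixed = TRUE))

print(clean)
```

The order matters in one place: the commas have to be stripped before calling as.numeric, since as.numeric("1,000,657") would return NA with a warning.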