
Posts

Showing posts with the label datascience

Blockchain: The New Technology of Trust

Think of a blockchain as a distributed database that maintains a shared list of records. These records are called blocks, and each block carries transaction data timestamped down to the second, plus a cryptographic link to the block before it, so it effectively carries the history of every block that came before. In effect, the blocks are chained together, hence "blockchain".

Blockchain is the data structure that allows Bitcoin (BTC) and other up-and-coming cryptocurrencies such as Ether (ETH) to thrive through a combination of decentralized encryption, anonymity, immutability, and global scale. It’s the not-so-secret weapon behind the rise of cryptocurrency, and to explain how blockchain came to be, we have to begin briefly with the legacy of Bitcoin.

Welcome to our Blockchain future

In the future, viewers will forgo paying subscriptions to platforms and can instead give directly to the content providers they love. Creators will therefore receive a larger share of the pie. By allowing the blockchain to use their computer ...
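The chaining idea at the start of this excerpt can be sketched in a few lines of base R. This is only a toy illustration under loud assumptions: the helper names (`hash_string`, `new_block`, `is_valid`) and the sample transactions are hypothetical, and MD5 via `tools::md5sum()` on a temporary file stands in for the much stronger hash functions real blockchains use.

```r
# Toy hash helper (hypothetical): write the string to a temp file and
# hash it with tools::md5sum(). MD5 is a stand-in, not a secure choice.
hash_string <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeLines(x, f)
  unname(tools::md5sum(f))
}

# A block stores its data, a timestamp, the previous block's hash,
# and its own hash computed over all three.
new_block <- function(data, prev_hash) {
  timestamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  hash <- hash_string(paste(prev_hash, timestamp, data, sep = "|"))
  list(data = data, timestamp = timestamp, prev_hash = prev_hash, hash = hash)
}

# Build a three-block chain from made-up transactions.
chain <- list(new_block("genesis", prev_hash = "0"))
for (tx in c("Alice pays Bob 5", "Bob pays Carol 2")) {
  prev <- chain[[length(chain)]]
  chain[[length(chain) + 1]] <- new_block(tx, prev$hash)
}

# A chain is valid if every stored hash matches a recomputed one and
# every block points at the hash of the block before it.
is_valid <- function(chain) {
  for (i in seq_along(chain)) {
    b <- chain[[i]]
    recomputed <- hash_string(paste(b$prev_hash, b$timestamp, b$data, sep = "|"))
    if (!identical(recomputed, b$hash)) return(FALSE)
    if (i > 1 && !identical(b$prev_hash, chain[[i - 1]]$hash)) return(FALSE)
  }
  TRUE
}

is_valid(chain)      # TRUE

# Editing an earlier block breaks the chain, which is the
# immutability property mentioned above.
tampered <- chain
tampered[[2]]$data <- "Alice pays Bob 500"
is_valid(tampered)   # FALSE
```

Because each hash covers the previous block's hash, changing any earlier record invalidates every check that follows it.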

“Gapminder” Exploratory Data Analysis using R for Data Science

The main focus is to investigate the gapminder dataset and interact with it. To illustrate basic EDA with the dplyr and ggplot2 packages, I use the “gapminder” dataset. This data is a data.frame of life expectancy, population, and GDP per capita by country and year. The dplyr package is used for data transformation and manipulation, and the ggplot2 package for visual analysis.

Load Packages

    #install.packages("gapminder")
    library(gapminder)
    library(dplyr)
    library(ggplot2)

The variables (as they appear in the filtered gapminder data) are explained as follows:

- country: factor with 142 levels
- continent: factor with 5 levels
- year: ranges from 1952 to 2007 in increments of 5 years
- lifeExp: life expectancy at birth, in years
- pop: population
- gdpPercap: GDP per capita

    head(gapminder_unfiltered, 5) # unfiltered data
    tail(gapminder_unfiltered, 5)

Display the names of the variables:

    names(gapminder_unfiltered)

Data Cleaning: check for missing values. As we can see, this data has no missing values.

    str(gapminder_unfiltered)
    sum...
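As a rough sketch of where this kind of EDA can go, the snippet below counts missing values per column, summarises life expectancy by continent with dplyr, and builds a scatter plot with ggplot2. The choice of the year 2007, the summary column names, and the plot itself are illustrative assumptions and not taken from the original post.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Missing values per column (base R); all zeros means nothing is missing.
na_counts <- colSums(is.na(gapminder_unfiltered))
na_counts

# Mean life expectancy per continent for one year (2007 is an arbitrary pick).
life_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    mean_lifeExp = mean(lifeExp),
    n_countries  = n_distinct(country)
  ) %>%
  arrange(desc(mean_lifeExp))
life_2007

# GDP per capita vs. life expectancy; a log x-axis spreads out low-income countries.
p <- ggplot(gapminder_unfiltered,
            aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")
# print(p) draws the plot in an interactive session
```

Storing the plot in `p` keeps the example usable in scripts as well as at the console.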

RegEx in R for Data Science

The ‘regex’ family of languages and commands is used for manipulating text strings. More specifically, regular expressions are typically used for finding specific patterns of characters and replacing them with others.

Finding Regex Matches in String Vectors

The grep function takes your regex as the first argument and the input vector as the second argument. If you pass value=FALSE or omit the value parameter, grep returns a vector with the indexes of the elements in the input vector that could be (partially) matched by the regular expression. If you pass value=TRUE, grep returns a vector with copies of the actual elements in the input vector that could be (partially) matched.

    > grep("a+", c("abc", "def", "cba a", "aa"), perl=TRUE, value=FALSE)
    [1] 1 3 4
    > grep("a+", c("abc", "def", "cba a", "aa"), perl=TRUE, value=TRUE)
    [1] "abc" "cba a"...
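Since the excerpt mentions both finding and replacing patterns, here is a small base R sketch that reuses the same input vector. `grepl`, `sub`, and `gsub` are standard base R functions; the replacement string "X" and the variable names are arbitrary choices for illustration.

```r
x <- c("abc", "def", "cba a", "aa")

# Indexes of matching elements (value = FALSE is the default).
idx <- grep("a+", x, perl = TRUE)
idx                                   # 1 3 4

# The matching elements themselves.
hits <- grep("a+", x, perl = TRUE, value = TRUE)

# One logical per element, handy for filtering data frames.
flags <- grepl("a+", x, perl = TRUE)
flags                                 # TRUE FALSE TRUE TRUE

# sub() replaces only the first match in each element,
# gsub() replaces every match.
first_only <- sub("a+", "X", x, perl = TRUE)
first_only                            # "Xbc" "def" "cbX a" "X"
all_matches <- gsub("a+", "X", x, perl = TRUE)
all_matches                           # "Xbc" "def" "cbX X" "X"
```

Note that "aa" becomes a single "X" in both cases, because `a+` matches the whole run of a's as one match.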

Data Cleaning in R for Data Science

Data cleaning in R for data science typically involves:

- Removing duplicate values
- Removing null values
- Changing column names to readable, understandable, formatted names
- Removing commas from numeric values, e.g. 1,000,657 to 1000657
- Converting data types into their appropriate types for analysis

The Experiment

The experiment used here comes from the UCI Machine Learning Repository, where a group of 30 volunteers (age bracket of 19–48 years) performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) while wearing a Samsung Galaxy S smartphone. The data collected from the embedded accelerometers was divided into training and test data.

Step 1: Retrieving Data from URL

The first step is to obtain the data. Often, to avoid the headache of manually downloading thousands of files, they are downloaded using small code snippets. In this case, the data comes as a zipped folder.

Data Reference: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recog...
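The five cleaning steps listed at the top of this excerpt can be sketched in base R on a tiny made-up data frame. The data, the column names, and the variable names below are hypothetical and have nothing to do with the UCI activity dataset; they only give each step something to act on.

```r
# Hypothetical raw data: awkward column names, a duplicate row,
# a missing value, and numbers stored as text with commas.
raw <- data.frame(
  "Subject ID"  = c("1", "2", "2", "3", NA),
  "total.count" = c("1,000,657", "2,500", "2,500", "12", "7"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Remove duplicate rows.
clean <- unique(raw)

# 2. Remove rows containing missing (NA) values.
clean <- clean[complete.cases(clean), ]

# 3. Rename columns to readable, consistently formatted names.
names(clean) <- c("subject_id", "total_count")

# 4. Strip commas from numeric text, e.g. "1,000,657" -> "1000657".
clean$total_count <- gsub(",", "", clean$total_count, fixed = TRUE)

# 5. Convert columns to appropriate types for analysis.
clean$total_count <- as.numeric(clean$total_count)
clean$subject_id  <- as.integer(clean$subject_id)

str(clean)
```

The order matters a little: commas have to be stripped before `as.numeric()`, otherwise the conversion yields NA with a warning.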