
Tuesday, 11 April 2017

Big Data - Kapil Sharma (101)

Big Data is commonly described by five characteristics (the five V's):

1. Volume

2. Velocity 

3. Variety 

4. Veracity 

5. Value  

More than 90% of the data in existence today has been created in just the last few years.

To process such huge volumes of largely unstructured data, big data frameworks come into the picture.

Processing, computing on, and analysing this data is what extracts value from it and leads to unique value addition.




Wednesday, 25 November 2015

Basic Data Science Introduction (C-01) - Kapil Sharma

Vectors: 
The vector is a very important data structure in R programming. Vectors are the building blocks of matrices and data frames, and they can hold numeric, character, and logical values. The function c() is used to create vectors in R.

x <- c(2, 22, "xyz", -4)   # mixing numbers and a string coerces every element to character
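
A quick follow-up sketch, using the vector x created above, to show the coercion with base R functions:

class(x)          # "character": the numbers were coerced to strings
n <- c(2, 22, -4)
class(n)          # "numeric"
as.numeric(x)     # converts back where possible; "xyz" becomes NA with a warning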


Factors:

Factors look like vectors, but they represent categorical data: each distinct value becomes a level.
y <- c(1,2,3,4,5,6,7)
yf <- factor(y)   # each distinct value of y becomes a level
yf
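
To see what the factor actually stores, a short sketch using standard base R functions:

levels(yf)       # the distinct categories as characters: "1" "2" ... "7"
nlevels(yf)      # 7
table(yf)        # how many observations fall into each level
as.integer(yf)   # the underlying integer codes used internally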

Lists:

Lists are like vectors, but their elements can be of different types and lengths. They are created with list().
a <- list(dog = "pitbull", age = 100, color = "golden", weight = TRUE)
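
A brief sketch of how elements of the list a are accessed:

a$dog         # access by name: "pitbull"
a[["age"]]    # double brackets return the element itself: 100
a[1]          # single brackets return a sub-list holding the first element
str(a)        # compact overview of each element and its type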

Matrices:

Matrices are two-dimensional structures made of rows and columns (nrow, ncol), in which every element has the same type.
Rows and columns can be combined with rbind() and cbind(), as shown after the example below.

# Create matrix with 4 elements:

cells   <- c(3,5,16,29)        # the four matrix elements
colname <- c("Jun", "Feb")     # column labels
rowname <- c("Nut", "Orange")  # row labels
y <- matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rowname, colname))
y

        Jun Feb
Nut       3   5
Orange   16  29
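
Following up on the rbind()/cbind() mention above, a small sketch that extends the matrix y (the new row and column values are made up for illustration):

y2 <- rbind(y, Apple = c(7, 12))     # add a new row
y3 <- cbind(y2, Mar = c(4, 9, 11))   # add a new column
y3

        Jun Feb Mar
Nut       3   5   4
Orange   16  29   9
Apple     7  12  11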

Data frames:

A data frame looks like a matrix, but its columns can hold different types, such as numeric and character.
Location <- c("Mandi", "Manali")
Distance <- c(200, 307)
df <- data.frame(Location, Distance)
df

  Location Distance
1    Mandi      200
2   Manali      307
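
A short sketch of common base R ways to inspect the data frame df built above:

df$Distance               # the Distance column as a numeric vector
df[df$Distance > 250, ]   # rows where Distance is greater than 250
str(df)                   # structure: column names, types, and sample values
nrow(df); ncol(df)        # number of rows and columns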


Wednesday, 5 November 2014

What is Hadoop - Kapil Sharma

What is Hadoop?

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing.

Currently, three core components are included with the basic download from the Apache Software Foundation:


HDFS - the Java-based distributed file system that can store all kinds of data without prior organization.

MapReduce – a software programming model for processing large sets of data in parallel (a small conceptual sketch follows this list).

YARN – a resource management framework for scheduling and handling resource requests from distributed applications.
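
To make the MapReduce idea concrete, here is a tiny conceptual sketch in plain R. This is only the concept on a single machine; real Hadoop jobs run distributed across the cluster and are typically written in Java or run through Hadoop Streaming.

# Conceptual word count, MapReduce style, in plain R
lines <- c("big data is big", "hadoop stores big data")

# Map: turn every word into a (word, 1) pair
mapped <- lapply(strsplit(lines, " "),
                 function(words) setNames(rep(1, length(words)), words))

# Shuffle + Reduce: group the pairs by word and sum the counts
pairs   <- unlist(mapped)
reduced <- tapply(pairs, names(pairs), sum)
reduced
#    big   data hadoop     is stores
#      3      2      1      1      1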