Showing posts with label Bigdata. Show all posts

Tuesday, 11 April 2017

Big Data - Kapil Sharma (101)

Big Data is commonly characterized by five Vs:

1. Volume

2. Velocity 

3. Variety 

4. Veracity 

5. Value  

More than 90% of today's data was created in the last few years.

Big data frameworks come into the picture to process these huge, largely unstructured data sets.

Processing, computing on, and analysing this data is what turns it into unique value.




Wednesday, 25 November 2015

Basic Data Science Introduction (C-01) - Kapil Sharma

Vectors: 
The vector is a fundamental data structure in R programming; matrices and data frames are built from vectors. A vector can hold numeric, character, or logical values, and the function c() is used to create one. Note that a vector is atomic: all its elements must have the same type, so mixing types coerces them to a common type.

x <- c(2, 22, "xyz", -4)   # mixed types are coerced to character: "2" "22" "xyz" "-4"


Factors:

Factors look like vectors, but they represent categorical data: each distinct value is stored as a level.
y <- c(1,2,3,4,5,6,7)
yf <- factor(y)
yf   # prints the values along with their levels: 1 2 3 4 5 6 7

Lists:

Lists are like vectors, but their elements can be of different types (and even different lengths). Use list() to create one; c() would coerce everything to a single type.
a <- list(dog = "pitbull", age = 100, color = "golden", weight = TRUE)

Matrices:

Matrices are vectors with two dimensions, arranged in rows and columns (nrow, ncol).
Matrices can be combined row-wise with rbind() or column-wise with cbind().

# Create matrix with 4 elements:

cells <- c(3,5,16,29)
colname <- c("Jun", "Feb")
rowname <- c("Nut", "Orange")
y <- matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rowname, colname))

       Jun Feb
Nut      3   5
Orange  16  29

Data frames:

A data frame is like a matrix, but its columns can hold different types of elements (numeric, character, and so on).
Location <- c("Mandi", "Manali")
Distance <- c(200, 307)
df <- data.frame(Location, Distance)
df

  Location Distance
1    Mandi      200
2   Manali      307


Wednesday, 5 November 2014

What is Hadoop - Kapil Sharma

What is Hadoop?

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and fast processing.

Currently, three core components are included in the basic download from the Apache Software Foundation:


HDFS - the Java-based distributed file system that can store all kinds of data without prior organization.

MapReduce – a software programming model for processing large sets of data in parallel.

YARN – a resource management framework for scheduling and handling resource requests from distributed applications.
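The MapReduce model described above can be illustrated with a small word-count sketch. This is a hedged, single-process simulation of the map → shuffle → reduce phases in Python, not the actual Hadoop Java API; the function names here are illustrative only:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data needs big storage", "hadoop processes big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In real Hadoop, the map and reduce functions run in parallel across the cluster's nodes, and the framework handles the shuffle, scheduling, and fault tolerance.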

What is Big Data - Kapil Sharma

BigData: 

Extremely large data sets, both structured and unstructured.

The Vs of big data: volume, velocity, variety, and veracity, with value often added as a fifth.

Benefits

When analysed, big data sets reveal patterns, trends, and associations, especially relating to human behaviour and interactions.

Approximately 2.5 exabytes (2.5×10^18 bytes) of data are created every day.

Try this query in a Google search: "?intitle:index.of?mp4 Oracle"