Apache Pig Language Introduction

PIG Introduction

  • Pig is an Apache open source project.
  • Pig runs on MapReduce and HDFS. The scripts we write to perform data analysis are written in a language called Pig Latin.
  • Pig can read data from the local file system or HDFS, and it can write its output to either one.
  • Pig Latin is a data flow language. Pig is said to "eat anything", meaning it can work on structured, semi-structured, and unstructured data.
  • Pig was developed by researchers at Yahoo.
  • Pig can ingest data from files, streams, or other sources using User Defined Functions (UDFs).
  • Once it has the data, it can perform selection, iteration, and other transformations on it.
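
The read, transform, and write flow described above can be sketched in a few lines of Pig Latin. This is only an illustrative sketch: the input path 'input/records.txt', the output path 'output/adult_names', and the field names are assumptions, not part of any real data set.

```pig
-- Hypothetical tab-delimited input with two fields per line: name and age.
records = load 'input/records.txt' using PigStorage('\t') as (name:chararray, age:int);

-- Selection: keep only the rows that satisfy a condition.
adults = filter records by age >= 18;

-- Iteration: visit every remaining row and project the field we want.
names = foreach adults generate name;

-- Write the result to a (hypothetical) output directory on HDFS or the local file system.
store names into 'output/adult_names';
```

Each statement names a relation that the next statement consumes, which is what makes Pig Latin a data flow language.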

How to run PIG

Grunt is an interactive shell in which users write Pig Latin scripts. Pig can be started in one of two modes.

local – Local mode executes in a single JVM and works on the local file system. It is used for development, experimenting, and prototyping. To enter the shell in this mode, type:

pig -x local

mapreduce – MapReduce mode is also known as Hadoop mode. In this mode Pig translates Pig Latin into MapReduce jobs and executes them on the cluster. It can run against a pseudo-distributed or fully distributed Hadoop installation.

Type pig -x mapreduce, or just pig, to enter the shell in MapReduce mode.
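
The same -x flag also applies when running a saved script instead of the interactive shell. The sketch below assumes a hypothetical script file named showlines.pig and a hypothetical input file input.txt. The run commands are shown as comments at the top of the script.

```pig
-- showlines.pig (hypothetical script name)
-- Run in local mode:      pig -x local showlines.pig
-- Run in MapReduce mode:  pig -x mapreduce showlines.pig

-- TextLoader reads each line of the input as a single chararray field.
lines = load 'input.txt' using TextLoader() as (line:chararray);

-- Print the relation to the console.
dump lines;
```

In local mode the path input.txt is resolved on the local file system, while in MapReduce mode it is resolved on HDFS.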

Data Types of PIG

1. Scalar data types: int, long, float, double, chararray, bytearray
2. Complex data types: maps, tuples, and bags

Map: A map in Pig is a mapping from a chararray key to a data element, where that element can be any Pig type, including a complex type.
For example, ['name'#'bob', 'age'#55] is a map constant with two keys.

Tuple: A tuple is a fixed-length, ordered collection of Pig data elements. Tuples are divided into fields, with each field containing one data element.
For example, ('bob', 55) describes a tuple constant with two fields.

Bag: A bag is an unordered collection of tuples. Because it has no order, it is not possible to reference tuples in a bag by position.
For example, {('bob', 55), ('sally', 52), ('john', 25)} constructs a bag with three tuples, each with two fields.
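
As a rough illustration of how the three complex types are used in a script, the sketch below declares one field of each kind and then reads from it. The input file 'players' and all field names are made up for this example.

```pig
-- 'players' is a hypothetical input file; the field names are illustrative only.
players = load 'players' as (
    name:chararray,
    info:map[],
    location:tuple(city:chararray, zip:int),
    scores:bag{t:tuple(score:int)}
);

summary = foreach players generate
    name,
    info#'age' as age,            -- map: look up a value by its chararray key
    location.city as city,        -- tuple: pick out a field by name
    COUNT(scores) as num_scores;  -- bag: no positional access, so aggregate over it

dump summary;
```

The map is read with the # operator, the tuple with dot notation, and the bag through a function such as COUNT that processes all of its tuples together.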

How to declare Pig schemas

There are two ways:
1. With field names and types: dividends = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);
2. With field names only: dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
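
When a schema gives only field names, Pig treats each field as a bytearray until it is cast. The sketch below, which reuses the NYSE_dividends file from the statements above, shows one way to add a type afterwards. The relation name typed is an assumption introduced for illustration.

```pig
-- Names only: every field starts out as a bytearray.
dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);

-- Cast explicitly where a specific type is needed.
typed = foreach dividends generate symbol, (float)dividend as dividend;

-- describe prints the schema of a relation, which makes the difference visible.
describe dividends;
describe typed;
```

Declaring the types in the load statement, as in the first way, avoids the need for such casts later in the script.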