Aggregation Method


The Aggregator stage is a processing stage in DataStage used for grouping and summary operations. By default, the Aggregator stage executes in parallel mode in parallel jobs.
Note : In a parallel environment, the way we partition data before grouping and summarizing will affect the results. If you partition data using the round-robin method, records with the same key values will be distributed across different partitions, and that will give incorrect results.
The Aggregator stage has two different aggregation methods.
1) Hash : Use hash mode for a relatively small number of groups; as a guideline, fewer than about 1000 groups per megabyte of memory.
2) Sort : Sort mode requires the input data set to have been partitioned and sorted, with all of the grouping keys specified as hashing and sorting keys. Unlike the hash aggregator, the sort aggregator requires presorted data, but it only keeps the calculations for the current group in memory.
Aggregation Data Type:
By default, the output column of an Aggregator stage calculation is of double data type. If you need decimal output, add the Decimal Output property to the calculation.

If you are using only a single key column for grouping, there is no compelling need to sort or hash-partition the incoming data.
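To see why the partitioning note above matters, here is a small illustrative Python sketch (not DataStage code; the data and the simple key-based partitioner are made up for the example) that simulates per-partition group counts under round-robin versus key-based partitioning:

from collections import Counter

records = [(10, "siva"), (10, "ram"), (10, "sam"), (20, "tom"),
           (30, "emy"), (20, "tiny"), (40, "remo")]

def counts_per_partition(partitions):
    # Each partition aggregates independently, like parallel nodes do.
    return [Counter(key for key, _ in part) for part in partitions]

# Round-robin: rows with the same key land on different partitions,
# so each node reports partial, incorrect group counts.
round_robin = [records[0::2], records[1::2]]
print(counts_per_partition(round_robin))
# [Counter({10: 2, 30: 1, 40: 1}), Counter({20: 2, 10: 1})]

# Key-based (hash-style) partitioning: all rows with the same key go
# to the same partition, so every group is counted exactly once.
by_key = [[r for r in records if (r[0] // 10) % 2 == 0],
          [r for r in records if (r[0] // 10) % 2 == 1]]
print(counts_per_partition(by_key))
# [Counter({20: 2, 40: 1}), Counter({10: 3, 30: 1})]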
Aggregator and Filter stage with an example
Suppose we have data as given below:

table_a
dno,name
10,siva
10,ram
10,sam
20,tom
30,emy
20,tiny
40,remo

& we have to get the same different times records into the one target.
Then, one records not repeated with respected (regarded) to dno need to go to another target.

Take the job design as:

Seq.File-------Aggregator-------Filter-------two Seq.File targets

Read and load the data in the Sequential File stage.

In the Aggregator stage,
choose Group = dno

Aggregation Type = Count Rows

Count Output Column = dno_count (user-defined)

In the Output tab, drag and drop the required columns, then click OK.

In the Filter stage:

----- At the first Where clause: dno_count > 1
----- Output link = 0
----- At the second Where clause: dno_count <= 1
----- Output link = 1

Drag and drop the outputs to the two targets, give the target file names, then compile and run the job. You will get the required data in the targets.
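The routing logic of this Aggregator-plus-Filter job can be sketched in plain Python (illustrative only; the variable names are made up):

from collections import Counter

rows = [(10, "siva"), (10, "ram"), (10, "sam"), (20, "tom"),
        (30, "emy"), (20, "tiny"), (40, "remo")]

# Aggregator stage: Count Rows, grouped by dno.
agg_out = Counter(dno for dno, _ in rows)

# Filter stage: the two Where clauses route rows to output links 0 and 1.
link0 = [(dno, cnt) for dno, cnt in agg_out.items() if cnt > 1]   # repeated dno
link1 = [(dno, cnt) for dno, cnt in agg_out.items() if cnt <= 1]  # unique dno

print(link0)  # [(10, 3), (20, 2)]
print(link1)  # [(30, 1), (40, 1)]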

Aggregator stage to find the number of people group-wise

We can use the Aggregator stage to find the number of people in each department.

For example, if we have the data as below

e_id,e_name,dept_no
1,sam,10
2,tom,20
3,pinky,10
4,lin,20
5,jim,10
6,emy,30
7,pom,10
8,jem,20
9,vin,30
10,den,20



Take the job design as below:

Seq.File-------Agg.Stage--------Seq.File



Read and load the data in the source file.

Go to the Aggregator stage and select Group = dept_no

and Aggregation Type = Count Rows

Count Output Column = Count (this is user-defined)

Click OK (give the file name at the target as you wish).

Compile and run the job.
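For reference, the same head-count logic in plain Python (an illustrative sketch, not DataStage code):

from collections import Counter

employees = [(1, "sam", 10), (2, "tom", 20), (3, "pinky", 10),
             (4, "lin", 20), (5, "jim", 10), (6, "emy", 30),
             (7, "pom", 10), (8, "jem", 20), (9, "vin", 30),
             (10, "den", 20)]

# Group = dept_no, Aggregation Type = Count Rows
count = Counter(dept_no for _, _, dept_no in employees)
print(count)  # Counter({10: 4, 20: 4, 30: 2})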
Aggregator stage with a real-time scenario example
The Aggregator stage works on groups.
It is used for calculations and counting.
It supports 1 input and 1 output.

Example for Aggregator stage

Input Table to Read

e_id,e_name,e_job,e_sal,deptno

100,sam,clerck,2000,10
200,tom,salesman,1200,20
300,lin,driver,1600,20
400,tim,manager,2500,10
500,zim,pa,2200,10
600,eli,clerck,2300,20



Now, our requirement is to find the maximum salary for each department number.
As per the sample data, we have only two departments here.

Take a Sequential File stage for reading the input and an Aggregator stage for the calculations,

and take a Sequential File stage to load into the target.

That is, we can take the design like this:

Seq.File--------Aggregator-----------Seq.File



Read the data in the Sequential File stage.

In the Aggregator stage, under Properties, select Group = deptno.

Choose e_sal as the column for calculation,

since we are calculating the maximum salary over each department group.


Choose the output file name in the second Sequential File stage.

Here compile & run.

It will work fine.
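The equivalent logic in plain Python (an illustrative sketch, not DataStage code):

rows = [(100, "sam", "clerck", 2000, 10),
        (200, "tom", "salesman", 1200, 20),
        (300, "lin", "driver", 1600, 20),
        (400, "tim", "manager", 2500, 10),
        (500, "zim", "pa", 2200, 10),
        (600, "eli", "clerck", 2300, 20)]

# Group = deptno, calculation = maximum of e_sal per group.
max_sal = {}
for e_id, e_name, e_job, e_sal, deptno in rows:
    if deptno not in max_sal or e_sal > max_sal[deptno]:
        max_sal[deptno] = e_sal

print(max_sal)  # {10: 2500, 20: 2300}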



Acquiring Datastage Tutorial For Complete Knowledge On The Tool


With the tremendous advancement in technology, Datastage has become a common tool for designing, developing and running applications in a data warehouse. In fact, it can also be considered an ETL tool that is used to extract, transform and load data into a data warehouse. The data is then used to generate reports and help in decision-making processes. It also helps in the processes of data cleansing, profiling and extraction. Hence, the demand for Datastage, or IBM Information Server, is huge. Acquiring knowledge of the tool will definitely help you to a great extent.
Well, you will be happy to know that we are one of the leading companies offering Datastage Tutorial. We offer complete guidance to the candidates interested in learning different processes associated with the same. We start right from the basic concepts of Datastage so that candidates acquire knowledge on the usage of the tool. Some of the common concepts that we teach include compiler, data modelling, data warehousing and the different versions of Datastage. In addition to that, we also teach about the errors, the features, the partitioning concepts and project concepts. With us, you will learn about the roles and responsibilities of a developer working with this particular tool and the special characters surrounding the tool.
Our Datastage Tutorial also includes examples so that it is easy for developers to understand the concept and the exact processes of its applications. At different stages of our course, we will offer Datastage examples of different phases. Right from the filter stage to the funnel stage, generators stage, merge stage and lookup stage, we will provide complete examples making it easier for our learners to get hands on experience. Datastage can extract data from any source and load it into any target. It also runs on any platform and data can be easily distributed across the nodes with different partition techniques.
We also provide our candidates with hosts of interview questions on this concept so that they are thoroughly prepared and confident during any interview. Our experienced and qualified professionals will take special care in guiding you and making sure that you are thoroughly clear on the subject. The Datastage interview questions that we prepare are some of the most updated and common questions asked by different companies while recruiting Datastage candidates. Therefore, you will benefit from the questions that you come across in our tutorial and can boost your self-confidence in using the tool.

These and many more things are covered in our Datastage tutorial. You are certainly missing out on something important and crucial by not attending our course. Our tutorial is authentic, and many developers have benefited a lot from the courses that we offer. Unless you try out our course, you will not get an idea of what we have in store for you. Therefore, it is high time to subscribe to our website and get constant updates on our tutorials. You can also get started with us to advance your knowledge of Datastage and progress in your role as a software developer.

All about Datastage Director

Datastage Director is a client component that validates, runs, schedules and monitors jobs that are run by the Datastage Server. Datastage Director is accessed through a separate icon, like DS Designer; it can also be accessed from within DS Designer or DS Manager. By selecting jobs, they can be reset and run from DS Director.

Select the Reset or Run button. The current status of a DS job can be monitored from the status view, or you can see the job logs in the Director. Compiled jobs can be scheduled. These options are activated only when a job is selected.

In Datastage Director,

Run/Validate a compiled job
Stop a running job
Reset an aborted job
Schedule jobs/batches
Create a new batch
View status / Monitor jobs
View / Purge job log entries
Clean up resources
Message handling


Subscribe on our website, Datastage Tutorial, to get all updates on Datastage in your email.
Don't forget to follow our Datastage Facebook channel and Datastage Twitter channel.

Sequencer Palette Stages Can be grouped into four Categories

Job Sequence Stages:

1) Run :
a) Job Activity: Run a job (a sequence job can be called as well)
b) Execute Command: Run a system command (an OS command or a script)
c) Notification Activity: Send an email (an SMTP server is needed)

2) Flow Control
a) Sequencer: Make an any/all decision.
b) Wait For File: Go when a file exists / doesn't exist.
c) Start Loop/End Loop: Construct loops.
d) Nested Condition: Implement complex control structures.

3)  Error Handling:
a) Exception Handler.
b) Terminator: Send a stop signal to the calling sequence

4) User Variables and Routine Activity
a) User variables can be created for use downstream.
b) Routines can be called using the Routine Activity.
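DataStage sequences are built graphically, not in code, but the control flow that the Run, Flow Control and Error Handling activities above express can be sketched in Python for intuition. Everything here is illustrative: the trigger-file path and the project/job names are hypothetical, and dsjob is invoked only as an example of running a job from a command line.

import os
import subprocess
import time

def wait_for_file(path, timeout_secs=300):
    # Wait For File activity: proceed when the file exists (or time out).
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(5)
    return False

def run_job(command):
    # Job Activity / Execute Command: run a command, raise on failure.
    subprocess.run(command, shell=True, check=True)

try:
    if wait_for_file("/tmp/trigger.dat"):          # hypothetical trigger file
        run_job("dsjob -run myproject myjob")      # hypothetical project/job
    else:
        raise TimeoutError("trigger file never arrived")
except (subprocess.CalledProcessError, TimeoutError):
    # Exception Handler / Terminator: stop the sequence on any failure.
    print("sequence aborted")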


Datastage Tutorial is the best place to learn Datastage online. You can also buy Datastage material here to get all the material in one place.
Datastage Online Training is the one and only best place to learn all about Datastage stages with Datastage examples.

Role of merge stage in Datastage


The Merge stage is a processing stage. It can have any number of input links, a single output link, and the same number of reject links as there are update input links (as per the DS documentation).
The Merge stage combines a master dataset with one or more update datasets based on the key columns. The output record contains all the columns from the master record plus any additional columns from each update record that are needed.
A master record and an update record are merged only if both have the same key column values. The data sets input to the Merge stage must be key-partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node. It also minimizes memory requirements because fewer rows need to be in memory at any one time.
As part of preprocessing your data for the Merge stage, you should also remove duplicate records from the master data set. If you have more than one update data set, you must remove duplicate records from the update data sets as well.
Unlike Join stages and Lookup stages, the Merge stage allows you to specify several reject links. You can route update link rows that fail to match a master row down a reject link that is specific to that link. You must have the same number of reject links as you have update links. The Link Ordering tab on the Stage page lets you specify which update links send rejected rows to which reject links. You can also specify whether to drop unmatched master rows or output them on the output data link.
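A minimal Python sketch of these Merge semantics, under simplifying assumptions (one update link, already deduplicated inputs keyed by CUSTOMER_ID; the function and field names are made up for illustration):

master  = {1: {"CUSTOMER_NAME": "UMA"},
           2: {"CUSTOMER_NAME": "POOJITHA"}}
update1 = {1: {"CITY": "CYPRESS", "ZIP_CODE": "90630", "SEX": "M"},
           2: {"CITY": "CYPRESS", "ZIP_CODE": "90630", "SEX": "F"}}

def merge(master, update, keep_unmatched_masters=True):
    output = []
    for key, mrow in master.items():
        urow = update.get(key)
        if urow is not None:
            # Matched: master columns plus the extra update columns.
            output.append({"CUSTOMER_ID": key, **mrow, **urow})
        elif keep_unmatched_masters:
            # Unmatched Masters Mode = Keep (Drop would skip the row).
            output.append({"CUSTOMER_ID": key, **mrow})
    # Update rows with no matching master go down the reject link.
    rejects = [{"CUSTOMER_ID": key, **urow}
               for key, urow in update.items() if key not in master]
    return output, rejects

out, rej = merge(master, update1)
print(out)  # two fully merged customer rows, as in the tables below
print(rej)  # [] -- every update row found a master here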
Example :
Master dataset:

CUSTOMER_ID,CUSTOMER_NAME
1,UMA
2,POOJITHA

Update dataset1



CUSTOMER_ID,CITY,ZIP_CODE,SEX
1,CYPRESS,90630,M
2,CYPRESS,90630,F

Output:




CUSTOMER_ID,CUSTOMER_NAME,CITY,ZIP_CODE,SEX
1,UMA,CYPRESS,90630,M
2,POOJITHA,CYPRESS,90630,F
Merge stage configuration steps :
Unmatched Masters Mode: Keep means that unmatched rows (those with no updates) from the master link are output; Drop means that unmatched rows are dropped.
Warn On Reject Updates: True to generate a warning when bad records from any update links are rejected.
Warn On Unmatched Masters: True to generate a warning when there are unmatched rows from the master link.
Partitioning: Hash on both the master input and the update inputs, as shown below.
Compile and run the job:
Scenario 1
Remove a record from update dataset 1 and check the output.
Check for the DataStage warnings in the job log, as we have set Warn On Unmatched Masters = True:
stg_merge,0: Master record (0) has no updates.
stg_merge,1: Update record (1) of data set 1 is dropped; no masters are left.
Scenario 2
Drop the unmatched master record and capture the reject records from update dataset 1.
Scenario 3
Insert a duplicate record with the same customer id into the master dataset and check the results.
Look at the output: it is clear that the Merge stage automatically dropped the duplicate record from the master dataset.
Scenario 4
Add a new update dataset 2 that contains the following data.

Update Dataset2

CUSTOMER_ID,CITIZENSHIP
1,INDIAN
2,AMERICAN
We still have a duplicate row in the master dataset. If you compile the job with the above configuration, you will get a compilation error like the one below. If you look at the figure above, you can see two rows in the output, because we have a matching row for customer_id = 2 in update dataset 2.
Scenario 5
Add a duplicate row for customer_id = 1 in the updates dataset. Now we have a duplicate record in both the master dataset and updates1. Run the job and check the results and warnings in the job log.
There is no change in the results, and the Merge stage automatically dropped the duplicate row.
Scenario 6
Modify the duplicate row for customer_id = 1 in the updates1 dataset with zip code 90630 instead of 90620.
Run the job and check the output results.
We can run the same job many times and find that the Merge stage takes the first record arriving as input from updates 1 and drops the subsequent records with the same customer id.
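That first-record-wins behaviour on duplicate update rows can be mimicked in Python like this (an illustrative sketch; the data is made up to match the scenario):

# Two update rows for customer_id 1 arrive with different zip codes.
updates_with_dupes = [(1, "90630"), (1, "90620"), (2, "90630")]

first_wins = {}
for customer_id, zip_code in updates_with_dupes:
    # setdefault keeps the first value seen and ignores later duplicates,
    # like the Merge stage keeping only the first update row per key.
    first_wins.setdefault(customer_id, zip_code)

print(first_wins)  # {1: '90630', 2: '90630'}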

This post covered most of the Merge stage scenarios.

How lookup stage works in Datastage ?



Like the Join and Merge stages, the Lookup stage has multiple input links: one is the primary link and the others are reference links, against which the lookup operation takes place. But it does not have the Merge stage's condition that the number of reject links must equal the number of update data links, and it also does not require the data on any of the input links to be sorted. The Lookup stage provides four conditions on which the handling of the output data depends.
We will see these conditions in Step 4.
Now, let's try to implement the Lookup stage with the help of the tables given below.
Table 1
ID,First Name,Last Name,Location,Network ID,EmailID
1,Jach,Simmons,Chicago,JS524,Letsc@gmail.com
2,Shumas,Jane,LA,Sj145,Jaene@ymail.com
3,Jonty,Waughn,Sydney,JW927,JontyW@sdbh.com
4,Suhana,Safar,Maxico,SS99,Sas@gmail.com
Table 2
ID,Dept,Dept Head
1,Electronics,Paul
3,CS,Jack
4,TS,Summur
5,IT,Sean
Table 3
ID,Training Center
1,CKG
2,AMD
3,WC
Step 1 : Design a job structure like the one below.
Consider the Employee table (Table 1) as the primary link. For each record on the primary link, the Lookup stage performs a lookup operation on the reference links according to the key column.

Consider the employee's department information (Table 2) and the employee's training center (Table 3) as the data on the two reference links.

Step 2 : Now we move to the Lookup stage (named lkp_emp_det in the design). Double-click on the Lookup stage, and the following window will pop up. The left pane is for all the inputs and the right pane is for the output. The first link detail table is for the primary link; the second and third are for the reference links.

The order of these reference links can be changed using this icon on the title bar, as shown.
Step 3 : In the left pane, map the key column (here 'ID') by simply dragging it onto the corresponding key column in the reference links. Map all the remaining required columns to the right pane, as shown.

Step 4 : One of the most important steps is to set the lookup conditions, which we can do using the second option on the title bar. Simply click on it, and the following window will pop up.
There is a list of reference links in the 'Link Name' column. In the 'Condition' column, we can give a condition for each reference link. What happens to a record when this condition is not met is decided by the 'Condition Not Met' column, and what happens when the lookup fails is decided by the 'Lookup Failure' column.
Continue : Data will be sent to the output link.
Reject : Data will be sent to the reject link.
Drop : Data will neither go to the output link nor to the reject link.
Fail : The job will abort.
In this case, let's first try it with no condition in the 'Condition' column and with 'Continue' and 'Reject' in the other columns.
Step 5 : Compile and run the job.
Let's see what the output is:
Output : Stream link
It shows two records. As we have set the 'Lookup Failure' condition to 'Reject', the records from the primary link that do not match the reference link data are collected on the reject link 'rjct_primary_rec', as shown below.
Step 6 : Let's try configuring the 'Condition' column in the 'Lookup Stage Conditions' sheet.
Now put the condition ID=3 and 'Reject' under 'Condition Not Met', as shown below.
Except for ID=3, all records will get rejected and stored on the reject link. Here the data for ID=2 gets rejected, and we get the output for the stream link as indicated.
Output for the 'Reject Link'
Note : The reject link shows rejected records from the primary input link only.
Practice with the 'Drop' and 'Fail' conditions.
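The core lookup behaviour, with 'Continue' on a match and 'Reject' on lookup failure, can be sketched in plain Python (illustrative only; the dictionary stands in for a reference link):

# Primary link: employees (Table 1, trimmed to the key and one column).
employees = [{"ID": 1, "First Name": "Jach"},
             {"ID": 2, "First Name": "Shumas"},
             {"ID": 3, "First Name": "Jonty"},
             {"ID": 4, "First Name": "Suhana"}]

# Reference link: departments keyed by ID (Table 2).
dept_ref = {1: "Electronics", 3: "CS", 4: "TS", 5: "IT"}

stream, rejected = [], []
for row in employees:
    dept = dept_ref.get(row["ID"])
    if dept is None:
        rejected.append(row)                  # Lookup Failure = Reject
    else:
        stream.append({**row, "Dept": dept})  # match found: Continue

print(stream)    # IDs 1, 3 and 4 found a department
print(rejected)  # ID 2 has no department row, so it is rejected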

Conclusion


DataStage has always performed joins efficiently when there are exact key fields that match, using the Lookup, Join or Merge stage. Range lookups are more difficult, as they are a less efficient way to join, whether you are doing it in an ETL job or on a database. You can do a range lookup in DataStage 7 using a Lookup stage and a Filter stage, you can do it using a sparse lookup, and you can do it by loading both tables into a database staging area and joining them in SQL. This exercise shows how to do it with a single Lookup stage, giving a much simpler design.