
What steps should be taken to improve DataStage job performance?

To improve the performance of DataStage jobs:
1) First establish baselines.
2) Do not rely on a single flow for performance testing.
3) Work in increments.
4) Evaluate data skew.
5) Isolate and solve the problems one by one.
6) Distribute the file systems to remove bottlenecks, if any.
7) Do not include the RDBMS at the start of the testing phase.
8) Last but not least, understand and assess the available tuning knobs.

The source is a flat file with 200 records, which have to be split equally across 4 outputs, 50 records in each.

The total number of records in the source may vary from day to day; whatever the count, the records are to be split equally across the 4 outputs.

Keep four files as output. In the Transformer, simply use the constraint modulus(col_name,4)=0 for the first file, modulus(col_name,4)=1 for the second, modulus(col_name,4)=2 for the third, and modulus(col_name,4)=3 for the fourth.
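The modulus constraints above can be sketched in Python. This is an illustrative model of the routing logic, not DataStage code; the column name `id` and the assumption that the key column is a sequential number (which is what makes the split come out exactly equal) are mine.

```python
# Sketch of the modulus-based split: route each record to one of four
# outputs based on (key % 4), mirroring the Transformer constraints
# modulus(col_name,4)=0 ... modulus(col_name,4)=3.

def split_by_modulus(records, key="id", n_outputs=4):
    """Place each record in output (record[key] % n_outputs)."""
    outputs = [[] for _ in range(n_outputs)]
    for rec in records:
        outputs[rec[key] % n_outputs].append(rec)
    return outputs

# 200 records with a sequential key split into 4 outputs of 50 each.
rows = [{"id": i} for i in range(1, 201)]
outs = split_by_modulus(rows)
print([len(o) for o in outs])  # → [50, 50, 50, 50]
```

Note that the split is only exactly equal when the key values are evenly distributed modulo 4 (e.g. a sequential surrogate key); with arbitrary keys the four outputs will be approximately, not exactly, equal.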


Finding a job as a freshman can be difficult, as there is no prior experience to draw on when it comes to job hunting.

Firstly, students must always research the company, not only once they have received an opportunity to work there but even before they apply. While some startups provide good work experience and offer reasonable pay, some may be a hoax, set up to dupe young interns into working hard for the company and then never paying them the promised amount, or even acknowledging that they worked there. The most important reason to research a company and its vision, however, is to know exactly why you want to join it. When the interviewer asks why you have chosen this particular company and why its vision is something you'd like to be a part of, you need a substantial answer. It also makes a negative impression on the interviewer when they have to explain to you what their company is all about and what exactly its goals are.
Once you have researched the company, the next step is to document the work you have done throughout your college years and any professional experience you have had. Including special industrial projects in this document also enhances the application, as it shows that you, as a candidate, have a fair idea of how industries work and how to apply your technical or subject knowledge in more practical ways. Participating in open-source projects makes the application stand out and makes you look like a strong candidate.
Always keep the resume concise and to the point. Exaggerating points on your resume is rarely a good idea. The interviewer has to skim through a considerable number of resumes in a day, and having to read paragraphs from a lengthy resume will only cast you in a negative light. Stick to work you have actually done, described briefly, with at most a one-line description of each item. Be crisp about achievements and accolades, never embellish them, and always be truthful about your work experience and awards; it could be very embarrassing if your interviewer catches a discrepancy in your resume.
Maintaining a mature online presence is essential. These days, in the wake of modern technology, companies often do a background check on their candidates to get a clearer picture of who they are. LinkedIn, Twitter, and Facebook are some of the social platforms where you may want to maintain a more mature and professional image. Facebook is, of course, a more casual space that lets you connect with friends and family; however, Twitter and LinkedIn are often reviewed by companies, and it is best to maintain a professional demeanour on such public platforms.

Tips & Tricks for debugging a DataStage Job

The information here covers DataStage debugging techniques. They can be applied to any job that is not producing the expected output data, or to a job that is aborting or generating warnings.
Use the Data Set Management utility, available in the Tools menu of the DataStage Designer or the DataStage Manager, to examine the schema, look at row counts, and delete a parallel Data Set. You can also view the data itself.
Check the DataStage job log for warnings or abort messages. These may indicate an underlying logic problem or an unexpected data type conversion. Check all the messages; PX jobs almost always generate many warnings in addition to the actual problem area.
Run the job with message handling (both job level and project level) disabled to find out whether any warnings are being unnecessarily demoted to information messages or dropped from the logs.
Enable APT_DUMP_SCORE to see how the different stages are combined. Some errors/logs report the error as occurring in APT_CombinedOperatorController stages; the stages that form part of the APT_CombinedOperatorController can be identified from the dump score written after enabling this environment variable.
This environment variable causes DataStage to add a log entry that shows how stages are combined into operators and which virtual datasets are used. It also shows how the operators are partitioned and how many partitions are created.
One can also enable the APT_RECORD_COUNTS environment variable, and enable OSH_PRINT_SCHEMAS to ensure that the runtime schema of a job matches the design-time schema that was expected.
Sometimes the underlying data contains special characters (such as null characters) in the database or in files, and this can also cause trouble during execution. If the data is in a table or dataset, export it to a sequential file (using a DataStage job), then use the command "cat -tev" or "od -xc" to find the special characters.
One can also use "wc -lc filename" to display the number of lines and characters in the specified ASCII text file; sometimes this is also useful.
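The same checks the shell commands above perform (finding non-printable bytes, counting lines and characters) can be done with a short Python sketch. The filename `export.txt` and the demo content are illustrative only.

```python
# Scan a sequential-file export for non-printable bytes (e.g. embedded
# null characters) and report line and character counts, similar in
# spirit to "od -xc" plus "wc -lc".

def inspect_file(path):
    with open(path, "rb") as f:
        data = f.read()
    # Anything outside printable ASCII, other than tab/newline/CR,
    # is flagged as a suspect byte with its offset.
    suspects = [(i, b) for i, b in enumerate(data)
                if b not in (9, 10, 13) and not 32 <= b <= 126]
    lines = data.count(b"\n")
    return lines, len(data), suspects

# Write a tiny demo file containing an embedded null byte.
with open("export.txt", "wb") as f:
    f.write(b"good row\nbad\x00row\n")

lines, chars, suspects = inspect_file("export.txt")
print(lines, chars, suspects)  # → 2 17 [(12, 0)]
```

The offset of each suspect byte tells you where in the exported file the problem character sits, which can then be traced back to the source row.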
Modular approach: If the job is very bulky, with many stages, and you are unable to locate the error, one option is a modular approach, in which the job is executed step by step. For example, if a job has 10 stages, create a copy of the job, keep only, say, the first 3 stages, and run it. Check the result; if it is fine, add a few more stages (maybe one or two) and run again. Repeat until the error is located.
Partitioned approach with data: This approach is very useful if the job runs fine for some sets of data and fails for others, or fails for a large number of rows. Here, one runs the job on a selected number of rows and/or partitions using the DataStage @INROWNUM (and, in PX, @PARTITIONNUM) system variables. For example, suppose a job works fine with 10K rows but fails with 1M rows. One can use @INROWNUM to run the job for, say, the first 0.25 million rows; if those are fine, then for rows 0.26 million to 0.5 million, and so on.
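The row-windowing idea behind an @INROWNUM constraint can be modeled in Python. This is only a sketch of the narrowing technique, not DataStage code; the function name and the sample data are mine.

```python
# Model of an @INROWNUM-style constraint: replay only a window of input
# rows, the way a Transformer constraint such as
# (@INROWNUM > 250000 and @INROWNUM <= 500000) would.

def row_window(rows, start, end):
    """Yield rows whose 1-based row number lies in (start, end]."""
    for rownum, row in enumerate(rows, start=1):
        if start < rownum <= end:
            yield row

# Run the failing logic on 0.25M-row slices until the bad slice is found.
sample = range(1, 1_000_001)
first_quarter = list(row_window(sample, 0, 250_000))
print(len(first_quarter))  # → 250000
```

Sliding the (start, end] window across the input in this way isolates the slice of rows that triggers the failure, after which the offending rows can be inspected directly.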
Please note that for a parallel job one also has to consider the number of partitions in the job.
Another option in such a case is to run the job on only one node (for example, by setting APT_EXECUTION_MODE to sequential, or by using a configuration file with one node).
Execution mode: Sometimes, if the partitions are confusing, one can run the job in sequential mode. There are two ways to achieve this:
Use the environment variable APT_EXECUTION_MODE and set it to sequential mode.
Use a configuration file with only one node.
A parallel job fails and the error does not tell which row it failed for: In this case, if the job is simple, try building the equivalent server job and running it. Server jobs can report errors along with the rows that are in error. This is very useful when DB errors such as primary/unique key violations, or any other DB error, are reported by a PX job.
Sometimes, when dealing with a database, if rows are not getting loaded as expected, adding reject links to the DB stages can help locate the rows with issues.
In a big job, adding some intermediate datasets or Peek stages to inspect the data values at certain points can help. For example, suppose there are 10 stages and the output then goes to a dataset, with different operations performed at different stages. After every 2 or 3 stages, add Peek stages or send the data to datasets using Copy stages, then check the values at these intermediate points and see whether they shed light on the issue.
