I am new to Ab Initio, so I apologize for asking such basic questions, but I would appreciate your help. Thanks in advance.
Tags: Ab Initio.

What is the difference between the flows of the 3 types of parallelism?

There are 3 types of parallelism:
1. Component parallelism: different program components run simultaneously on different data sets.
2. Pipeline parallelism: different program components run simultaneously on the same data set.
3. Data parallelism: data records are distributed into multiple locations using partition components.
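To make the data-parallelism item more concrete, here is a minimal Python sketch of two common ways a partition component can distribute records: round robin and by key. This is only an emulation of the idea, not Ab Initio code; the function names, the `cust` field, and the use of `crc32` as the hash are all assumptions made for illustration.

```python
from zlib import crc32


def partition_round_robin(records, n):
    """Deal records out to n partitions in turn (even spread, keys ignored)."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def partition_by_key(records, n, key):
    """Send each record to the partition chosen by hashing its key field,
    so records with the same key value always land together."""
    parts = [[] for _ in range(n)]
    for rec in records:
        idx = crc32(str(rec[key]).encode()) % n  # hash choice is arbitrary here
        parts[idx].append(rec)
    return parts


# Hypothetical sample data: 9 records spread over 3 customer keys.
records = [{"id": i, "cust": f"C{i % 3}"} for i in range(9)]
rr = partition_round_robin(records, 3)
by_key = partition_by_key(records, 3, "cust")
```

Round robin balances partition sizes but scatters equal keys; partitioning by key keeps equal keys together, which matters for downstream key-based components, at the cost of possible skew.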
How can I calculate the total memory requirement of a graph?

You can roughly estimate the memory requirement as follows: add the size of the lookup files used in the phase (if multiple components use the same lookup, count it only once), then multiply by the degree of parallelism. Add up all components in a phase; that is how much memory is used in that phase.

Is there any built-in function available for computing a cumulative summary?

Scan is really the simplest way to achieve this. Let's say you want to get intermediate results by date. Store your results in a temp vector, which needs to be initialized to the size of your largest group. As long as this is all done in the "rollup" transformation and not the "finalize" transformation, it will run the "initialize" portion before it moves to the next ID.
I have done it this way, but Scan is easier. The Ab Initio documentation does not explain this technique in detail, but it can be done. That will serve the purpose.

Or: other than Scan, we can use Rollup to compute the cumulative summary, or use the built-in component in Ab Initio.

What will happen if no key is used with dedup sort? What will the output be?

If no key is used in the sort component while using dedup sort, the output depends on the keep parameter.

Can we process 1 GB of data (1 million records) using a lookup?

I think it is not advisable to use a 1 GB lookup file; it will definitely affect the parallel processing of other applications and hurt performance. I would prefer to use an MFS lookup file rather than a serial lookup file in this case.
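Going back to the dedup question above, the effect of the keep parameter when no key is given can be sketched in Python. This is an emulation under two assumptions: that an empty key makes all records one single group, and that keep takes the values first, last, and unique-only. The function name and sample rows are hypothetical.

```python
from itertools import groupby


def dedup_sorted(records, key=None, keep="first"):
    """Emulate dedup on already-sorted input.

    With key=None every record gets the same (empty) key, so the whole
    input is treated as one group.
    """
    key_fn = key if key is not None else (lambda rec: ())
    out = []
    for _, group in groupby(records, key=key_fn):
        group = list(group)
        if keep == "first":
            out.append(group[0])
        elif keep == "last":
            out.append(group[-1])
        elif keep == "unique-only":
            # Only groups with exactly one record survive.
            if len(group) == 1:
                out.append(group[0])
        else:
            raise ValueError(f"unknown keep value: {keep}")
    return out


rows = ["a", "b", "c"]  # hypothetical input records
first_only = dedup_sorted(rows, keep="first")
last_only = dedup_sorted(rows, keep="last")
unique_only = dedup_sorted(rows, keep="unique-only")
```

Under these assumptions, an empty key yields a single record for first or last, and nothing for unique-only unless the input holds exactly one record.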
I have 10 graphs in my sandbox, and I checked them in to the EME. I then checked out a graph again and modified it, but found that the modifications were wrong.

How do I create subgraphs in Ab Initio?

What is a sandbox?

A sandbox is a directory structure in which each directory level is assigned a variable name. It is used to manage check-in and checkout of repository-based objects such as graphs. Within the EME, an identical structure exists for the same project. The sandbox structure exists under the OS (e.g. Unix), for instance for the project called fin, which is usually also the name of the top-level directory. In the EME, a similar structure exists for the project fin. When you check out or check in a whole project, or an object belonging to a project, the information is exchanged between these two structures.
For instance, if you checkout a dml called fin. Once you've created that, as shown above, fin.

I have a job that does the following: it FTPs files from a remote server, reformats the data in those files and updates the database, and then deletes the temporary files. How do we trap errors generated by Ab Initio when an FTP fails?

Ab Initio has very good restartability and recovery features built into it. In your situation you can do the tasks you mentioned in one graph with phase breaks. Same thing if it fails in Phase 2. Coming back to error trapping: each component has reject, error, and log ports. The reject port captures rejected records, the error port captures the corresponding error, and the log port captures the execution statistics of the component.

What is an Ad hoc multifile? How is it used?

Here is a description of Ad hoc multifiles: Ad hoc multifiles treat several serial files having the same record format as a single graph component.
Frequently, the input of a graph consists of a set of serial files, all of which have to be processed as a unit. An Ad hoc multifile is a multifile created 'on the fly' out of a set of serial files, without needing to define a multifile system to contain it. This enables you to represent the needed set of serial files with a single input file component in the graph. Moreover, the set of files used by the component can be determined at runtime. This lets the user customize which set of files the graph uses as input without having to change the graph itself, even after it goes into production.
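The idea of deciding the file set at run time and treating each serial file as one partition of a single logical input can be sketched in Python. This is not Ab Initio code; the function names, the `sales_*.dat` file names, and the use of a glob pattern to choose the files are assumptions for illustration only.

```python
import glob
import os
import tempfile


def resolve_partitions(pattern):
    """Return the files matching the pattern, sorted; each acts as one partition."""
    return sorted(glob.glob(pattern))


def read_partitions(paths):
    """Yield (partition_index, record) pairs, reading all files as one input."""
    for idx, path in enumerate(paths):
        with open(path) as fh:
            for line in fh:
                yield idx, line.rstrip("\n")


# Demo with two hypothetical serial files sharing one record format.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("sales_a.dat", ["r1", "r2"]), ("sales_b.dat", ["r3"])]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("\n".join(rows) + "\n")
    parts = resolve_partitions(os.path.join(tmp, "sales_*.dat"))
    records = list(read_partitions(parts))
```

Because the file list is resolved when the code runs, adding a third matching file changes the input without touching the reading logic, which mirrors the point about not changing the graph.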
Ad hoc multifiles can be used as output, intermediate, and lookup files as well as input files. The simplest way to define an Ad hoc multifile is to list the files explicitly as follows:
1. Insert an input file component in your graph.
2. Open the properties dialog and select the Description tab.
3. Select Partitions in the Data Location of the Description tab.
4. Click Edit to open the Define multifile Partitions dialog box.
5. Click New and enter the first file name.
6. Click New again and enter the second file name, and so on.
7. Click OK.

If you have added n files, the input file now acts something like a file in an n-way multifile system whose data partitions are the n files you listed. It is possible for components to run in the layout of the input file component.

There are other ways than listing the input files explicitly in an Ad hoc multifile:

Listing files using wildcards: if the input file names have a common pattern, you can use a wildcard for all the files. All the files found at runtime that match the wildcard pattern will be taken for the Ad hoc multifile.

Listing files in a variable: you can create a runtime parameter for the graph and, inside the parameter, list all the files separated by spaces.

Listing files using a command: this method gives maximum flexibility in choosing the input files, since you can also use complex commands that involve the owner of the file or its date and time stamp.

What is the difference between Replicate and Broadcast?
Broadcast and Replicate are similar components, but generally Replicate is used to increase component parallelism, emitting multiple straight flows to separate pipelines. Broadcast is used to increase data parallelism by feeding records to fan-out or all-to-all flows.

Or: Replicate is an older component than Broadcast. You can use Broadcast as a join component, whereas you cannot use Replicate as a join. The Broadcast component writes data to all partitions of Input file1, creating an implicit fan-out flow.

Or: the short answer is that Replicate copies a flow while Broadcast multiplies it. Broadcast is a partitioner, whereas Replicate is a simple flow-copy mechanism.
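The "copies versus multiplies" distinction can be sketched in Python by modelling a flow as a list of partitions. This is an illustration of the description above, not Ab Initio behavior verified against the product; the function names and the figure of 100 input records are assumptions.

```python
def replicate(flow, n_copies):
    """Return n_copies separate flows, each an identical copy of the input
    flow with the same partitioning (a plain flow copy)."""
    return [[list(part) for part in flow] for _ in range(n_copies)]


def broadcast(flow, n_partitions):
    """Return ONE flow with n_partitions partitions, each of which receives
    every record of the input flow (the flow is multiplied)."""
    all_records = [rec for part in flow for rec in part]
    return [list(all_records) for _ in range(n_partitions)]


def record_count(flow):
    """Total number of records across all partitions of a flow."""
    return sum(len(part) for part in flow)


serial_input = [list(range(100))]        # hypothetical: 1 partition, 100 records
to_mfs = broadcast(serial_input, 4)      # one 4-way flow
copies = replicate(serial_input, 2)      # two independent serial flows
```

In this model the single broadcast flow carries 400 records into the 4-way output, while each replicated flow still carries the original 100.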
You won't see any difference between the two until you start using data parallelism; then it will go south rather quickly. Here's an experiment: use a simple serial input file, followed by a Broadcast, then a 4-way multifile output file component. If you run the graph, the output file will contain a copy of each input record for each flow partition encountered. If you had used a Replicate, it would have written only the records it read.

Hi, I just went through 8 Ab Initio interviews, and some of the tough questions were as follows:

What function would you use to convert a string into a decimal?
How many types of parallelism are there in Ab Initio, and what is the definition of the three?
What is the difference between a db config and a cfg file?
Have you ever encountered an error called "depth not equal"? This apparently occurs when you create graphs extensively.
How do you truncate a table?
How do you improve the performance of a graph?
What's the difference between partitioning by key and round robin?
Have you worked with packages?
How do you add default rules in a transformer?
What is a ramp limit?
Have you used the Rollup component?
How many components are in your most complicated graph?
Do you know what a local lookup is?

Latest Features in Ab Initio - 2.

Now if we enable this feature by changing the script generation method to Dynamic in Run Settings, we will be able to run a graph without deploying it through the GDE.
From now on we execute the mp file only; there is no need to have the ksh. On the production server, once we run the mp file using the air sandbox run command, it generates a reduced script on the fly, which contains the commands to set up the host environment. You can check the mp file of a graph with dynamic script generation enabled. It is an editable text file.

Now the question: does it improve performance? Yes, in most cases it brings a significant performance boost over the traditional approach of execution.

How it works

Advantages:

1. As a result, the number of processes is reduced when a graph executes. Every process has overheads: creation of the new process, scheduling, memory consumption, etc. These overheads vary from OS to OS. Another major benefit of component folding is the reduction of interpretation time for the DML between processes, because the graph ends up with multitool folded processes communicating with other multitool or unitool processes. Apart from that, an increase in the number of processes results in more interprocess communication. Data movement between two or more processes consumes not only time but memory too. So it is worth enabling component folding in a CFG.

Disadvantages of Component Folding:

1. Pipeline parallelism: as component folding folds different components into a single process, it hurts the pipeline parallelism of Ab Initio. But now these two components are folded together, so there is no chance of parallel execution.

So if we combine 4 different components into a single process by component folding, the OS will allow only 4 GB of address space for all 4 together, instead of 4 x 4 = 16 GB in total.
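The address-space arithmetic above can be written out as a small Python sketch. The 4 GB per-process limit is the figure quoted in the text; the function names and the per-component memory needs of 1.5 GB are hypothetical values chosen for illustration.

```python
PROCESS_LIMIT_GB = 4  # per-process address space quoted in the text


def total_address_space_gb(n_components, folded):
    """Folded components share one process; unfolded ones get one process each."""
    n_processes = 1 if folded else n_components
    return n_processes * PROCESS_LIMIT_GB


def fits(needs_gb, folded, limit_gb=PROCESS_LIMIT_GB):
    """Check whether the components' memory needs fit in the available space."""
    if folded:
        # All components compete for a single process's address space.
        return sum(needs_gb) <= limit_gb
    # Each component only has to fit in its own process.
    return all(need <= limit_gb for need in needs_gb)


needs = [1.5, 1.5, 1.5, 1.5]  # hypothetical per-component needs in GB
folded_space = total_address_space_gb(4, folded=True)
unfolded_space = total_address_space_gb(4, folded=False)
```

With these hypothetical needs, the four components fit comfortably when unfolded but exceed the shared limit when folded, which is the situation the text warns about.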
So we should avoid folding components whose memory use is very high, such as in-memory Rollup, Join, and Reformat with lookup. Some components, like Sort and in-memory Join, cause internal buffering of data. Combining them in a single process will result in writing to disk (higher IO).

Excluding any component from component folding: sometimes you will want to prevent components from being folded, to allow pipeline parallelism or to access more address space. Then you need to exclude some components from being folded. Everything has its cost, so it is always worth benchmarking before making a decision. Prevent and allow component folding for the components of your graph, and tune it for the highest performance.

The folded components are displayed as a multitool process in the CPU tracking information. The CPU time for a folded component is shown twice: once for the component itself and once as a multitool component.
PDL provides high flexibility in terms of interpretation. This approach is much faster than traditional shell scripting. It is the way forward to a much more flexible and robust design technique. With it we can abolish the old shell scripting, as script-end and script-start have been beaten to death over the last few years. You can use PDL interpretation for the condition of a component. I would recommend looking at the metaprogramming section for starters.
Then play with the parameters editor. Suppose a graph has a conditional component that runs based on the existence of a file called emp. Ensure your host run settings are checked for dynamic script generation, and read the 2.

Unix:

I have a file containing a ab abc abcd abcde..
I am in a subdirectory. I want to list all the files in the previous directory without going to that directory. A: I said ls -lrt with the complete path.
How many types? Have you heard about type 3 and type 4?
Have you developed SCD2? If so, how?
I have two files, yesterday's file and today's file. Today's file contains a number of records, some of which are from yesterday's file; yesterday's file contains only its own records. How do you filter the 50 records without using a join?
I have records in a multifile. If I use Replicate and Broadcast, is the result the same or different? If different, how many records do the output files contain in each case?
How do you strip out headers and trailers if there are no indicators?
Have you used psets or PDL?
Have you heard of conditional components?
I have given a condition in a component in phase 0, which is checked in a component in phase 1. Is it possible to run the component in phase 1?
How do you count the no.? Which component do you use for that? I said Rollup. They asked: what do you do in Rollup?
Tell me a little bit about yourself and your project.
How many graphs have you developed so far?
What components have you used in your graphs?
I have a table which contains a numeric column. I answered: Reformat. What will you do in Reformat? Ans: when you want to send a single record to a single output port, use output index.
I want to distribute my records in a ratio. A: Partition by Percentage. Q: Is there an alternative? I don't want to partition the data; I just want to filter.