Performance of Column Propagation in DataStage

I wanted to find out whether column propagation in DataStage carries a performance penalty, and I was surprised by the result.

I created a job with two streams. Each stream has a row generator, two transformers, and a sequential file stage that writes to /dev/null. Using a row generator as the source and /dev/null as the target keeps the influence of I/O to a minimum.

The first transformer is the same in both streams: it creates 10 fields, each 15 characters long. The difference is in the second transformer. In the top stream I copy every field to the output explicitly. In the bottom stream I use column propagation to copy the fields.
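To make the difference between the two second transformers concrete, here is a rough analogy in Python. It is only a sketch of the idea: the column names and the functions `make_row`, `explicit_copy`, and `propagate` are made up for illustration, and the code says nothing about how DataStage implements either variant or how fast each one runs.

```python
# Conceptual analogy only. DataStage runs transformers in its own engine,
# so this does not reproduce its internals or its performance behaviour.

# Hypothetical column names: 10 fields, as in the test job.
FIELD_NAMES = [f"field_{i:02d}" for i in range(1, 11)]


def make_row(seed: int) -> dict:
    """Build one row with 10 string fields of 15 characters each."""
    return {name: f"{seed:015d}" for name in FIELD_NAMES}


def explicit_copy(row: dict) -> dict:
    """Top stream: every output column is mapped by name from the input."""
    return {name: row[name] for name in FIELD_NAMES}


def propagate(row: dict, derived: dict) -> dict:
    """Bottom stream: only the derived columns are named.

    All other columns pass through without being listed.
    """
    out = dict(row)      # unnamed columns are carried along as they are
    out.update(derived)  # named columns are overwritten with new values
    return out


if __name__ == "__main__":
    row = make_row(42)
    print(explicit_copy(row) == propagate(row, {}))
    print(propagate(row, {"field_01": "changed"})["field_01"])
```

In both variants the output has the same columns. The difference is whether each column has to be named in the transformer or not.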

The job generates 100,000,000 rows. The stream with column propagation processes more than 1,000,000 rows per second, while the stream without column propagation processes more than 100,000 rows per second. In this test, the stream with column propagation is roughly 10 times faster.
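To put those rates into run time, here is a small back-of-the-envelope calculation in Python. It assumes that each stream processes all 100,000,000 rows and uses the reported rates as exact values, although they are lower bounds ("1,000,000+" and "100,000+"). The resulting times are therefore rough upper bounds, not measurements.

```python
# Rough arithmetic on the numbers reported above.
# Assumption: each stream handles the full 100,000,000 rows.
TOTAL_ROWS = 100_000_000
RATE_WITH_PROPAGATION = 1_000_000    # rows per second (reported as 1,000,000+)
RATE_WITHOUT_PROPAGATION = 100_000   # rows per second (reported as 100,000+)


def elapsed_seconds(total_rows: int, rows_per_second: int) -> float:
    """Time needed to process total_rows at a constant rate."""
    return total_rows / rows_per_second


with_propagation = elapsed_seconds(TOTAL_ROWS, RATE_WITH_PROPAGATION)
without_propagation = elapsed_seconds(TOTAL_ROWS, RATE_WITHOUT_PROPAGATION)
speedup = without_propagation / with_propagation

print(f"with propagation:    about {with_propagation:.0f} s")
print(f"without propagation: about {without_propagation:.0f} s")
print(f"speedup:             about {speedup:.0f}x")
```

Under these assumptions the stream with column propagation needs about 100 seconds and the stream without it about 1,000 seconds.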

So if you are sure that you want all columns in the output and only need to transform some of them, use column propagation to copy the remaining columns.