Starting rough, refining later on

Published: Aug 9, 2020 by Jenno Vink

Today I got the assignment to ‘enrich’ data (in Dutch: “dataverrijking”). This means reading, combining and transforming data into something useful that the people in data preparation can design on a piece of paper.

The desired functionality included reading several large CSV files and, based on parameters, reading the corresponding CSV (which can contain millions of records) and putting the data back into another database. I started rough and created an initial, naïve implementation to be refined later on. With this method, it would take 68 hours to complete the job. Way too long… That’s why I kept optimizing my code. My optimizations include prefetching data from the CSV files and bulk inserting into the database, while monitoring the memory to make sure there were no memory leaks. In the end, I reduced the execution time to less than 7 minutes.

The final txt file (to be imported by SQL Server) was a little more than 1.2 GB, which took 1 minute and 23 seconds to import into the database.
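The two optimizations mentioned above, prefetching CSV data and bulk inserting, can be sketched roughly as follows. This is not the original .NET code: it is a minimal Python illustration, with `sqlite3` standing in for the real database, and every name in it (`load_lookup`, `batched`, `enrich_and_insert`, the `enriched` table and its columns) is hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_lookup(csv_file, key_column):
    """Prefetch a whole CSV into a dict so each later lookup is an in-memory hit."""
    return {row[key_column]: row for row in csv.DictReader(csv_file)}


def batched(rows, size):
    """Yield lists of at most `size` rows so memory use stays bounded."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def enrich_and_insert(records_file, lookup, conn, batch_size=10_000):
    """Join each record with the prefetched lookup and insert the result in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enriched (id TEXT, amount TEXT, city TEXT)"
    )
    # Generator: records are streamed, never held in memory all at once.
    enriched = (
        (row["id"], row["amount"], lookup.get(row["customer"], {}).get("city"))
        for row in csv.DictReader(records_file)
    )
    inserted = 0
    for batch in batched(enriched, batch_size):
        with conn:  # one transaction per batch instead of one per row
            conn.executemany("INSERT INTO enriched VALUES (?, ?, ?)", batch)
        inserted += len(batch)
    return inserted


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the large CSV files (made-up sample data).
    customers = io.StringIO("customer,city\nc1,Utrecht\nc2,Zwolle\n")
    records = io.StringIO("id,amount,customer\n1,10,c1\n2,20,c2\n")
    connection = sqlite3.connect(":memory:")
    lookup_table = load_lookup(customers, "customer")
    print(enrich_and_insert(records, lookup_table, connection, batch_size=1))
```

The idea is that the lookup file is read once up front instead of once per record, and rows reach the database in large batches rather than one insert at a time. Streaming the records through a generator keeps memory use roughly constant regardless of file size.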

What this says about my coding style

I like using an agile approach. Instead of needing to know every detail upfront, it is sometimes better to just get started and see how things work out along the way. Later, once the whole picture (including the details) is visible, optimizations can be implemented.

BigData .NET Optimization

Latest Posts

Stepping up my dotnet game!

Full course on design patterns

Convert your MIDI keyboard/pad into a resharper code navigator!
Starting a new chapter at Lijnco

A little more than 2 years ago, I started working at vitas. During my time at vitas, I got the chance to work at 3 different companies and, in the meantime, earn 5 certificates. The last company (Lijnco) has now offered me a permanent position! I’m really excited and enjoy working for Lijnco. The thing that appeals to me the most is the diversity in the projects: every project is different and every project is a new opportunity to do it a little better and a little more efficiently. Besides that, the more projects we do, the bigger the chance that we can reuse code. That greatly increases efficiency.