Sign in

Recently I had a project to load the data from excel sheet into database. The excel sheets are monthly summery report and they have many merged cells, so I wrote a utility function in python using openpyxl library to unmerge the cell and populate the top cell value.

Here is the code, the first helper function create_merged_cell_lookup is taking the one worksheet and return a dict of cell location and its cell value. Then this helper function will be passed into unmerge_cell_copy_top_value and we use the unmerge_cells from openpyxl then populate the value from create_merged_cell_lookup .


In 2017, after graduating from GWU with degree in statistics, I got a job working in a digital company as a Jr. Data Scientist. After working for about 3 years, I learnt quite a bit through many projects and tons of mistakes which I wish I have known about these…


Background

Apache Spark is one of the most widely used frameworks for distributing data in parallel across a cluster, and it often works together with Hadoop in Big Data analytics. A standard Spark cluster architecture as below:

The application will be submitted to a Spark driver, from there it connects to…

Daisy Lin

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store