VC Healy's Blog

Confidence in my work - Pandas

Having had an evening and today to work with yesterday's code, I am confident that it is returning the correct information. This has allowed me to remove the original web-scraped dataframe from the final spreadsheet output and present just the facts obtained from it.

To build confidence, I had to be sure that all of the rows were accounted for. The simple tuple returned by df.shape was enough: pulling the row component out with [0] discards the column count, which I wasn't interested in. With a little tweaking to get this value into a series, I would then be able to add it to the end of a dataframe.
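As a rough sketch of that step, the snippet below pulls the row count out of df.shape and wraps it in a one-element series. The sample data and the "Total rows" label are made up for illustration; only the 'ID' column name comes from my real code.

```python
import pandas as pd

# Hypothetical stand-in for the scraped dataframe.
df = pd.DataFrame({"ID": ["a", "b", "a", "c", "a"], "value": [1, 2, 3, 4, 5]})

# df.shape is a (rows, columns) tuple; [0] keeps only the row count.
total_rows = df.shape[0]

# Wrap the number in a labelled Series so it can be added to other output later.
# The "Total rows" label is an assumption, not a name from the original code.
total_series = pd.Series({"Total rows": total_rows})

print(total_series)
```

Giving the value a label means it still makes sense once it sits at the bottom of a column of counts.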

Now it was time to get a series of the unique values from the dataframe and a count of how many times each value appeared in that series.

df_counts = df['ID'].value_counts()

A little bit of arithmetic confirmed that the sum of the unique value appearances equaled the total number of rows, so I could discard the original dataframe from the output.
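The arithmetic might look something like the sketch below, again on made-up sample data. One thing worth knowing is that value_counts() leaves out missing values by default, so a shortfall in the sum could simply mean some rows have no ID. The missing_ids check is an extra I am adding here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the scraped dataframe.
df = pd.DataFrame({"ID": ["a", "b", "a", "c", "a"]})

df_counts = df["ID"].value_counts()

# Sum of the per-ID counts versus the row count from df.shape.
counted_rows = int(df_counts.sum())
total_rows = df.shape[0]
rows_match = counted_rows == total_rows

# value_counts() skips NaN by default, so count missing IDs separately.
missing_ids = int(df["ID"].isna().sum())

print(counted_rows, total_rows, rows_match, missing_ids)
```

If rows_match comes back False, counted_rows plus missing_ids should still equal total_rows.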

Great: I had just the interesting information and knew it was accurate. With the total rows from df.shape appended, the output to the spreadsheet, one sheet per process, was now simple to read and use.
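A minimal sketch of the appending step, assuming pd.concat is used to put the total underneath the counts. The sample data and the "Total rows" label are illustrative and not taken from my real code.

```python
import pandas as pd

# Hypothetical stand-in for the scraped dataframe.
df = pd.DataFrame({"ID": ["a", "b", "a", "c", "a"]})

df_counts = df["ID"].value_counts()

# One-element Series holding the row count, labelled for the report.
total = pd.Series({"Total rows": df.shape[0]})

# Stack the counts and the total into a single Series for output.
report = pd.concat([df_counts, total])

print(report)
```

From there, each process's report could be written to its own sheet with pandas.ExcelWriter and to_excel(sheet_name=...), which needs an Excel engine such as openpyxl installed.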
