I thought I would bring some more data to the discussion.
I ran a series of tests on this issue.
By using the Python resource package I got the memory usage of my process, and by writing the CSV into a StringIO buffer I could easily measure its size in bytes.
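Here is a minimal sketch of the two measurements. It assumes a pandas DataFrame and uses ru_maxrss, the peak resident set size reported by resource.getrusage, as the memory figure. The exact call used in the runs above is not stated, and the function names peak_rss_mb, csv_size_mb and measure are hypothetical.

```python
import io
import resource
import sys
from typing import Callable, Tuple

import pandas as pd


def peak_rss_mb() -> float:
    """Peak resident set size (high-water mark) of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


def csv_size_mb(df: pd.DataFrame) -> float:
    """Size of df written as CSV into an in-memory StringIO buffer, in MB."""
    buf = io.StringIO()
    df.to_csv(buf)
    # Encode so we count bytes, not characters (they differ for non-ASCII text).
    return len(buf.getvalue().encode("utf-8")) / 1024 ** 2


def measure(build: Callable[[], pd.DataFrame]) -> Tuple[float, float]:
    """Return (growth of peak RSS while building the frame, CSV size), both in MB.

    ru_maxrss never goes down, so the growth only reflects this frame's footprint
    if the process has not already reached a higher peak, e.g. when each size
    is measured in a fresh process.
    """
    before = peak_rss_mb()
    df = build()
    after = peak_rss_mb()
    # The CSV is serialized after the second reading, so it does not inflate it.
    return after - before, csv_size_mb(df)
```

The resource module is Unix-only, so this sketch will not run on Windows.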
I ran two experiments, each creating 20 dataframes of increasing size, from 10,000 to 1,000,000 lines, all with 10 columns.
In the first experiment I used only floats in my dataset.
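The frames for the float experiment could be generated roughly like this. It is a sketch under assumptions: the 20 sizes are evenly spaced (the original spacing isn't stated), the values are uniform random floats, and the names ROW_COUNTS and float_frame are hypothetical.

```python
import numpy as np
import pandas as pd

N_FRAMES = 20
N_COLUMNS = 10

# Assumption: evenly spaced sizes from 10,000 to 1,000,000 lines.
ROW_COUNTS = np.linspace(10_000, 1_000_000, N_FRAMES, dtype=int)


def float_frame(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """n_rows x N_COLUMNS frame of uniform random float64 values in [0, 1)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        rng.random((int(n_rows), N_COLUMNS)),
        columns=[f"col{i}" for i in range(N_COLUMNS)],
    )
```

For example, float_frame(ROW_COUNTS[-1]) builds the largest frame of the series.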
This is how the memory increase compared with the size of the CSV file, as a function of the number of lines (sizes in megabytes):
For the second experiment I used the same approach, but the dataset consisted only of short strings.
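A similar sketch for the string experiment. The original only says "short strings", so the fixed length of 5 random lowercase letters and the name short_string_frame are assumptions.

```python
import string

import numpy as np
import pandas as pd

N_COLUMNS = 10
ALPHABET = np.array(list(string.ascii_lowercase))


def short_string_frame(n_rows: int, length: int = 5, seed: int = 0) -> pd.DataFrame:
    """n_rows x N_COLUMNS frame whose cells are random lowercase strings.

    The string length (5 by default) is an assumption, not the original setting.
    """
    rng = np.random.default_rng(seed)
    # Draw `length` random letters for every cell, then join each draw into a string.
    letters = rng.choice(ALPHABET, size=(int(n_rows) * N_COLUMNS, length))
    cells = np.array(["".join(chars) for chars in letters], dtype=object)
    return pd.DataFrame(
        cells.reshape(int(n_rows), N_COLUMNS),
        columns=[f"col{i}" for i in range(N_COLUMNS)],
    )
```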
It seems that the ratio between the size of the CSV and the size of the dataframe can vary quite a lot, but the size in memory was always bigger by a factor of 2-3 (for the frame sizes in this experiment).
I would love to extend this answer with more experiments; please comment if you want me to try something specific.