Google Colaboratory – Reading and writing files

If you don’t know about it already, Google Colaboratory is a great place to use some high-end resources for heavy machine learning models. It’s a cloud-based Jupyter notebook with some bloody good hardware to help you out. At the time of writing this post, I have a Tesla T4 at my disposal!

I am training some neural networks to detect musical patterns. One problem I encountered is the size of the training data. It’s not practical at all to generate the data on the fly and then train the network. The only way is to generate the training data offline, save it to a file (in my case it’s a .csv), and then use that file to train the network.

When I started out, I had some difficulties in figuring out how to save and load files in Google Colab. I hope this guide helps you to do just that 🙂

I assume that you already know how to log in and use Google Colab notebooks by this point, so I won’t be going into the nitty-gritty of it. Here’s the link – have fun!

So to start off, let’s create a notebook and generate some dummy data. Here, I’ve created a 10×10 matrix with random integers in the range 0-100.
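The original snippet for this step isn’t reproduced here, so this is a minimal sketch assuming plain Python and the built-in random module (make_dummy_data is just a name I picked). random.randint includes both endpoints, so every value lands between 0 and 100 inclusive.

```python
import random

def make_dummy_data(rows=10, cols=10, low=0, high=100):
    """Return a rows x cols list of lists filled with random integers.

    random.randint includes both endpoints, so every value is in [low, high].
    """
    return [[random.randint(low, high) for _ in range(cols)] for _ in range(rows)]

# A 10x10 matrix of ints between 0 and 100.
data = make_dummy_data()
for row in data:
    print(row)
```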

Now let’s write this data to a CSV file saved on your Google Drive. To do this, you must first mount your drive in the notebook. When you run the Python snippet that mounts the drive, you will be asked to enter an authorization code.
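Here’s a sketch of the mount step using Colab’s google.colab.drive.mount. I’m assuming the mount point /content/drive, which matches the relative ./drive/... path used later when the notebook runs from /content. The running_in_colab check is my own addition so the same cell doesn’t crash if you run it outside Colab.

```python
import importlib.util

# Assumption: mount Drive at /content/drive so that ./drive/My Drive/...
# resolves correctly from Colab's default working directory (/content).
MOUNT_POINT = "/content/drive"

def running_in_colab():
    """Return True if the google.colab package is available."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # The parent "google" package isn't installed at all.
        return False

if running_in_colab():
    from google.colab import drive
    # Colab asks you to authorize Drive access before the mount completes.
    drive.mount(MOUNT_POINT)
else:
    print("Not running in Colab, so Google Drive was not mounted.")
```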

All you need to do is:

  1. Click the link given – it will open a sign in page on a new tab
  2. Sign in with your Google account
  3. Copy the security code given to you
  4. Go back to the notebook tab
  5. Paste the code into the prompt
  6. Hit Enter

Now that your drive is mounted, let’s save the random dataset we created to a CSV file. Note the save path! The path I used is ./drive/My Drive/Colab Notebooks/data.csv, and the file will be saved at Colab Notebooks/data.csv inside my Google Drive.
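A minimal sketch of the write step, assuming Python’s built-in csv module (which the QUOTE_NONNUMERIC setting mentioned further down suggests) and the save path above. save_csv is a hypothetical helper name, and creating the folder with os.makedirs is my own addition in case Colab Notebooks doesn’t exist yet.

```python
import csv
import os

# The save path from the post; assumes Drive is mounted and the
# working directory is /content.
SAVE_PATH = "./drive/My Drive/Colab Notebooks/data.csv"

def save_csv(rows, path=SAVE_PATH):
    """Write a list of rows (lists of numbers) to a CSV file at path."""
    # Create the target folder if it doesn't exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # The csv docs recommend newline="" for files the csv module writes.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

In the notebook you’d then call save_csv(my_matrix), where my_matrix is whatever variable holds your dummy data. The default writer doesn’t quote numbers, so each row comes out like 42,7,100,...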

Now that we’ve written the data to a CSV file, let’s read it back and verify the data:
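Here’s a sketch of the read step with csv.reader and quoting=csv.QUOTE_NONNUMERIC, again assuming the same path (load_csv is a hypothetical helper name). With that setting, the reader converts every unquoted field to float.

```python
import csv

# Same path the data was saved to; assumes Drive is mounted.
LOAD_PATH = "./drive/My Drive/Colab Notebooks/data.csv"

def load_csv(path=LOAD_PATH):
    """Read a CSV of numbers back into a list of rows.

    With quoting=csv.QUOTE_NONNUMERIC the reader converts every unquoted
    field to float, so an int written as 42 comes back as 42.0.
    """
    with open(path, newline="") as f:
        return list(csv.reader(f, quoting=csv.QUOTE_NONNUMERIC))

# In Colab: print each row to eyeball it against the original matrix.
# for row in load_csv():
#     print(row)
```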

Note that in the original dataset there were no decimal points, as the data type was int. But when we read the data back from the saved CSV file with the QUOTE_NONNUMERIC setting, the reader converts every unquoted field to float, so the values come back as floats. This is not an issue. If you really need to stick to a particular data type, just cast it.
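For example, casting back to int could look like this small sketch (to_int_rows is just an illustrative name). Since the floats came from whole numbers, int() gives back the original values; keep in mind int() truncates toward zero, so it would silently drop any real fractional part.

```python
def to_int_rows(rows):
    """Cast every value in a list of float rows back to int."""
    return [[int(v) for v in row] for row in rows]

# e.g. what the QUOTE_NONNUMERIC reader hands back
loaded = [[42.0, 7.0], [100.0, 0.0]]
print(to_int_rows(loaded))  # [[42, 7], [100, 0]]
```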

I hope this article was helpful. Please give a shoutout if you found this useful!

Cheers 🙂