Global Certificate in Data Science (GCD) is one of INSAID’s flagship courses. GCD aims at creating Data Leaders from aspiring data scientists. The course spans 6 months, is divided into 6 terms and is conducted by some of the Top Data Science Mentors in India.
Today, we’re in conversation with one such brilliant Data Science mentor, Suchit Majumdar, the Chief Data Scientist and Architect of the GCD Curriculum! An ISB Hyderabad alumnus, Suchit is one of the top 20 Data Science Academicians in India!
This is the second part of the two-part series Exploring GCD With INSAID’s Chief Data Scientist! You can read the first part here!
Malvika: How different is the student experience when it comes to programmers and non-programmers?
Suchit: If you are a programmer, then a major chunk of the burden is already removed for you, since you don’t have to learn to program separately.
Programming is valuable because data is growing really fast today. Huge volumes of data are being captured and analyzed on a day-to-day basis, so a tool like Microsoft Excel will fall behind.
So somewhere down the line non-programmers need to start learning how to code, and a language like Python was developed for that very specific purpose. It’s so easy and so quick to pick up. An actual non-programmer can get up and running with Python within a month if they practice an hour and a half every day. The way I see it, a month or 4 weeks is quite a good time to get started with Python.
Having said that, non-programmers initially find it a little hard to transition to programming, and the reason is a mental block that they have created for themselves. Once they understand that Python is not very different from the way we work on an Excel sheet, it becomes much easier for them to catch up. You need to understand that being a programmer or non-programmer doesn’t really matter because, at the end of the day, we all want to become data scientists.
Malvika: What advice would you give to an apprehensive data science newbie with no coding background? How does our GCD course ensure a smooth ride for them?
Suchit: To ensure that everybody starts off at the same level, we provide starter kits so that learners can go through the basic ideas of Python and statistics. A lot of people have forgotten these topics because they studied them 20 years ago.
So just to help them recap the statistics part and get them up and running on coding, we provide them with starter kits, which are sufficient to get one into data science theory, at least from the Python and statistics perspective. Once you’ve covered the statistics and programming part of the theory, you’re all set to learn the implementation of data science.
Malvika: What are some other resources that the students should access to better themselves as a Data Scientist?
Suchit: We definitely encourage reading blogs. We always recommend blogs on Medium, Data Science Central and KDNuggets.
I know a lot of people are not into reading. For people who find reading a little difficult, I always recommend watching videos on YouTube. YouTube is the biggest repository of videos. With so many videos online, just go and check out the ones in your particular field, or videos of data science being used to solve specific problems.
So it varies from person to person. For people who like reading, I always recommend blogs; people who don’t should definitely watch videos.
Malvika: Which algorithms and packages are easy for the students and which ones are tough?
Suchit: It happens very often that people start with the mindset that machine learning is very difficult and that there are so many things under the sun they need to learn.
I would always advise them to never assume that anything is difficult until they try it out. Technically, once everything is in place, applying a machine learning algorithm is 4-5 lines of code. But if you assume that something is going to be difficult, it is always going to be difficult.
Once you actually try it out, you’ll understand it’s not as difficult as you had made it out to be in your mind. It is all about breaking the ice and coming into this field; data science is not as difficult as it looks from the outside.
There are some very popular packages that make this particular field stand out, thanks especially to the Python community. The first package that I’ll name is definitely Pandas; it’s one of the best packages that people can work with. You can learn almost every aspect of it and you can improve it. Whatever you are doing on your Excel sheets, you can do it better using Pandas.
We have multiple beautiful plotting libraries, something that students can learn a lot from; you can plot complex graphs and charts easily in Python. Here, we have a package called Seaborn.
There is one master package in machine learning that I think all of us have used or seen at some time, called Scikit Learn. So if you understand these three packages, you are doing pretty well for yourself in the field of data science.
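To make the "4-5 lines of code" point concrete, here is a minimal sketch that uses Pandas for an Excel-style summary and Scikit Learn for a small model. The table, its column names and its numbers are invented purely for illustration and are not part of the GCD course material.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: the kind of table you might otherwise keep in an Excel sheet.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East", "East"],
    "ad_spend": [10.0, 20.0, 15.0, 30.0, 25.0, 40.0],
    "revenue": [25.0, 45.0, 35.0, 65.0, 55.0, 85.0],
})

# Pandas: roughly what a pivot table does in Excel (total revenue per region).
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Scikit Learn: creating, fitting and using a model takes only a few lines.
model = LinearRegression()
model.fit(sales[["ad_spend"]], sales["revenue"])
predicted = model.predict(pd.DataFrame({"ad_spend": [50.0]}))[0]

print(revenue_by_region)
print(f"Predicted revenue for an ad spend of 50: {predicted:.1f}")
```

The model itself really is just the three lines under the last comment; most of the remaining work in practice is getting the data "in place", which is where Pandas comes in.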
Malvika: What are the common mistakes that students commit when they are pursuing data science?
Suchit: The first point I’ll raise here is that people often do not give enough weightage to practice. People feel that watching a particular session over the weekend will suffice, and then they fall into a bigger trap. What I would always recommend is this: watching a session is good, but practicing it and doing it for yourself is even better.
Start with a mindset where you tell yourself that this is a journey where you will put in 60-70% effort on a regular basis, instead of putting in 150% on two days and then burning out.
You can start with smaller efforts every day. If you start at 50%, you can increase that over time as your standard grows and progressively work up to 60-70%. Having that mindset of practicing continuously is something that will help you get over the border as soon as possible.
So data science can be covered in no time, provided you put in enough practice throughout the learning period.
Now the second mistake that a lot of people make is assuming that the materials available in class are sufficient. We always recommend that people never stick to the class materials only; go beyond the materials shared in class and always try to venture out and see what else is available online.
Now you get tons of free resources online. We also provide many online research resources on our learning portal. You’re most welcome to dig out things beyond the regular classes as well.
Suppose you’ve learnt Linear Regression and you’ve read the pre-reads and post-reads as well. Now you would want to take it one step further: go back to your own office on a Monday or a Tuesday and try to figure out what problem you are solving for the customer. You could try to find problems there that can be solved with linear regression.
Taking that one additional step towards solving your day-to-day problems will, I think, accelerate your learning far more than just relying on the class materials.
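As one way of practicing this, the sketch below fits a straight line to a small workplace-style dataset using plain Python, so that the arithmetic behind linear regression stays visible. The scenario (support tickets against customer count), the numbers and the helper name `fit_line` are all hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical office data: active customers (in hundreds) vs. weekly support tickets.
customers = [1, 2, 3, 4, 5]
tickets = [12, 19, 31, 38, 50]

slope, intercept = fit_line(customers, tickets)
forecast = slope * 6 + intercept  # expected tickets at 600 customers

print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.1f}")
```

Swapping in numbers from your own work, and then checking the result against a library such as Scikit Learn, is the kind of extra step described above.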
So lack of practice and not applying themselves are two of the mistakes that a lot of people commit in this course.
We hope we have now answered most of your queries about the GCD program with our Data Science maverick, Suchit Majumdar. If you have any questions regarding the program, feel free to write to us at firstname.lastname@example.org!