This article is Part 1 of a series of interviews with INSAID Chief Data Science Mentor, Manvender Singh.
Data Science is no longer new. It is everywhere, and it is one of the most sought-after careers. So how can you succeed in it? Success comes when you are clear about how to approach the field and what to avoid. To get there, you need to know the basics, steer clear of common misconceptions, and learn the tips, tricks and other nuances of the data science field.
Presenting a Q&A series with our Chief Data Science Mentor, Manvender Singh. He is an alumnus of ISB Hyderabad and NIT Jamshedpur and features among the Top Data Science Academicians in India. In this series, Manav, as he is fondly known, talks about the nuances of the field, clears up confusion, gives tips and shares the secrets of the trade.
Ankita: There has been quite a buzz around data science, with a Harvard Business Review article calling it the sexiest job of the 21st century. But many people are still unclear about what data science is. Could you shed some light on what it actually means and what it entails?
Manav: Essentially, Data Science is a multidisciplinary field of study that involves three key skills.
The first is programming, the second is maths or statistics, and the third is domain expertise.
On a day-to-day basis, a Data Scientist applies all three of these areas to generate valuable insights for the business and to automate many of its tasks.
A Data Scientist does some analysis and some prediction, essentially helping an organization do better using the power of data. That, in essence, is Data Science.
Ankita: You have explained what Data Science is, but there are still many misconceptions about Data Science and Data Scientists, often about what their profiles look like or what they do. Some of these misconceptions are still prevalent in the industry. It would be great if you could clear them up.
Manav: There are two major misconceptions that people have about this field.
The first is that people, even business leaders and CEOs, think that Data Scientists have answers to all questions. They somehow believe that just by hiring Data Scientists, they will be able to solve all of the organization's problems and do magical things. Nothing could be further from the truth.
Data Science is a scientific, step-by-step process. To derive tangible value from it, you need the right infrastructure, the right team and team setup, and the right organizational goals for your head of Data Science or your Data Scientists.
So, to summarize, the first misconception is that Data Scientists will somehow produce exorbitant or magical returns. That does not happen just by hiring a Data Scientist.
The second misconception is held by people trying to enter the space, and it leaves most new Data Scientists severely underprepared for real industry scenarios.
What I mean is that most aspiring Data Scientists assume that, having learned Python or machine learning models in a training program, they will start building machine learning models from the moment they enter the industry, using all the fancy machine learning algorithms right from day one.
When they actually join the industry, they realize that 50-60% of the work, and sometimes as much as 70%, is about understanding the business problem, cleaning and manipulating data, building data pipelines, and bringing stakeholders together. Machine learning is just 20-30% of the task in the overall scheme of things.
So this is another big misconception. Machine learning catches people's eye, which is why they are so fascinated by it. But entering the industry is a rude awakening: they see that machine learning is just a small part of the overall cycle.
Ankita: Coming to the next question: what is the essential skill set for a data scientist? What do they need to know, in terms of technical skills or any other skills that are useful to them?
Manav: As I said, a Data Scientist needs to develop three skills drawn from different disciplines. These are very important no matter what background a Data Science professional comes from, whether software or a non-IT field. The three essential skills are:
First, a Data Scientist needs to be good at programming. Programming skills are very, very important.
Second is maths. Statistics is the language Data Scientists speak, so they need a very good command of maths.
Third is domain expertise. At the end of the day, we talk about Data Science not because it is the sexiest job of the 21st century, but because it derives real value for organizations, and that requires understanding the business domain.
These three skills need to come together. The industry needs people who can combine them well, simply because that combination proves really fruitful for organizations. Anyone trying to get into this space should aim to develop all three.
Ankita: Many people are freshers with no experience in the Data Science field. What would you suggest to them for getting into Data Science?
Manav: This is an excellent and very important question, because a lot of people want to become Data Scientists and get Data Science jobs but have no experience. So why should a company hire them?
The best and surest way to get into Data Science roles is to build a great Data Science portfolio. Think of it this way: if you're a recruiter and I'm a potential applicant, and you want to see how good I am, you will ask for my portfolio to see what kind of projects I have done.
The higher the quality of my projects, the greater the chance, first, that my resume gets shortlisted and, second, that we have a detailed discussion when I am invited for the interview.
So, my recommendation is that if you want experience, work on really high-quality projects.
Notice that I said really high-quality projects. They cannot be simple Titanic projects or the starter projects people do when they first get into the space. In fact, I also advise our students in the GCD program not to list these projects on their resumes, because they will surely make you look like an amateur Data Scientist who does not know much. Everybody lists the Iris and Titanic datasets, and that is a strict NO.
A portfolio means projects that have something unique about them. Ideally, you should also have published the insights from these projects in a blog. Once you have done such projects, showcase them on GitHub, which is the best way to present your portfolio and a sure way to gain experience and get selected for Data Science roles. Trust me, if you have a great portfolio, nothing can stop you from getting a data science job.
We are not done yet. Manav answered more questions, just for Data Science enthusiasts like you. To find out what they are, stay tuned and read Part 2.