A persevering professional with a keen interest in data science, Sohomjit Ganguly wanted to learn data science and machine learning so that he could apply them at work. When he found out about INSAID’s data science certification courses, there was no turning back. He enrolled in the Global Certificate in Data Science & AI (GCDAI) program at INSAID, completed the course through sheer diligence, and now works as a data scientist at his current organization, TCS.
In this interview, he shares in detail his learning journey at INSAID and the significance data science holds in his life.
Question 1: Which program & batch are you part of at INSAID, and what is your current work profile?
Sohomjit: I am part of the April 2019 cohort at INSAID. I enrolled in the GCD program and then upgraded to GCDAI. Currently, I work as a data scientist at Tata Consultancy Services. I am part of a core retail consumer product group, which is a product engineering specialist division. My responsibilities include data preparation, architectural design, predictive analysis, cloud integrations, real-time dashboards using Power BI or Tableau, and sales forecasting for the retail client using regression, simulation such as Monte Carlo scenario analysis, clustering, and so on.
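For readers unfamiliar with the Monte Carlo scenario analysis mentioned above, here is a minimal sketch of the idea applied to a sales forecast. It is not the method used at TCS. The function names, the normally distributed growth rate, and all the numbers are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate_sales(base_sales, mean_growth, growth_sd, periods, n_runs, seed=0):
    """Simulate total sales over `periods` for `n_runs` random scenarios.

    Assumption: growth per period is drawn independently from a normal
    distribution with mean `mean_growth` and standard deviation `growth_sd`.
    """
    rng = random.Random(seed)  # seeded so the scenarios are reproducible
    totals = []
    for _ in range(n_runs):
        sales = base_sales
        total = 0.0
        for _ in range(periods):
            sales *= 1 + rng.gauss(mean_growth, growth_sd)
            total += sales
        totals.append(total)
    return totals


def summarize(totals):
    """Return a pessimistic, typical, and optimistic scenario total."""
    ordered = sorted(totals)
    n = len(ordered)
    return {
        "p5": ordered[int(0.05 * (n - 1))],
        "median": statistics.median(ordered),
        "p95": ordered[int(0.95 * (n - 1))],
    }
```

Calling `summarize(simulate_sales(100.0, 0.02, 0.05, 12, 5000))` would give a range of plausible twelve-period totals rather than a single point forecast, which is the main appeal of scenario analysis.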
Question 2: Walk us through your career journey. What got you interested in Data Science & Machine Learning?
Sohomjit: My career has always been fascinating and very enthralling to me. I started as a Java J2EE resource, working on the front end and back end of the Java infrastructure for large, detailed plans across various domains. Later, I got interested in cybersecurity, specifically the identity and access management branch, and I quickly rose to the rank of subject matter expert and consultant.
I was deputed on-site for two years at Motorola Solutions in Chicago. When I was there, I saw that they had a vast mail inflow from clients and retailers alike, and the segregation of that mail was done automatically.
I was very excited to know how this was being managed on such a large scale because it was out of my domain, and it was fascinating to look at. So, I asked the peers, managers and the technical leads of that domain how they managed to achieve this.
They then talked about data science and machine learning, and how they could segregate and sort the mail according to subject, sender, and recipient. That got me thinking that this is a beautiful way of interpreting data and making it useful, as it needed hardly any human intervention. That was when I got interested in data science and machine learning.
After that, when I came back to India, I started researching the learning opportunities and data science certification courses available so that I could learn it and achieve the things that I had thought were impossible.
In the meantime, I completed a full-time MBA at Calcutta University, where I specialized in business analytics. Although the lessons on data in that course were limited, I understood that its implications were significant.
Question 3: What tools and packages in Data Science & Machine Learning have you mastered in your Data Science & AI program at INSAID so far?
Sohomjit: I have come across many packages in my data science, machine learning, and AI journey so far. Mastering a package or a toolset is a huge thing. Still, the tools and packages that I have at least some experience in are NumPy, Pandas, Matplotlib, and Beautiful Soup for web scraping; the JSON package was a blessing for me while parsing JSON files into data frames. The pandas-profiling package was convenient for seeing the variation in data upfront. Presently, I have fallen in love with PyCaret, and it has been very fruitful in my recent ventures.
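As a rough illustration of turning JSON into data-frame-ready rows, the sketch below uses only Python’s built-in `json` module. The helper names `flatten` and `json_to_rows` and the sample records are hypothetical, not something from the interview or the INSAID course.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level of dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects so each leaf becomes its own column.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def json_to_rows(text):
    """Parse a JSON string holding one record or a list of records into flat rows."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return [flatten(record) for record in records]


# Hypothetical sample data, made up for this example.
sample = (
    '[{"id": 1, "customer": {"name": "A", "city": "Kolkata"}},'
    ' {"id": 2, "customer": {"name": "B"}}]'
)
rows = json_to_rows(sample)
```

Each entry in `rows` is a flat dictionary, so the list can be handed to `pandas.DataFrame(rows)`; keys missing from a record simply become missing values in that column.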
Question 4: What were some initial challenges when you got started on your Data Science journey, and how did you overcome them?
Sohomjit: The initial challenge that I faced when I started the data science certification course at INSAID was understanding how to get insights from a data set. Coming from an identity and access management domain, I had no idea what data could provide to us. I had no understanding of statistics; I only knew statistics as a paper to pass in an exam. But this was a challenge that I overcame in my master’s. I understood that it is a profoundly ingrained theory, where I could get the information entropy of a data set and then act upon it to make clear and concise business decisions.
Moreover, data processing and preprocessing were also a challenge: how to impute missing data, how to fit it. It is still a challenge for me because it depends on another essential aspect, one that I think many people have a flair for but that I had to go through a lot to understand: the domain. I have served many domains for different clients.
But presently, I’m working in the consumer product goods (CPG) domain. I had no idea about the term; although I was part of retail, CPG is a special section of it. So I had to understand all the ins and outs and get into the nitty-gritty of it, and that took time. I think these were the significant challenges for me, which I had to put effort into understanding and then working around to get an effective result.
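Two ideas from this answer, information entropy and imputing missing data, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: entropy is computed over the observed category frequencies, and missing values (represented here as `None`) are filled with the column mean, which is just one of many imputation strategies. The function names are made up for this example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```

A column with two equally frequent categories has an entropy of one bit, while a column with a single repeated value has zero entropy and carries no information for a decision. Mean imputation is the simplest choice; whether it is appropriate depends on the domain, which is the point made in the answer above.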
Question 5: Who is your favorite faculty at INSAID, and what did you learn from him the most?
Sohomjit: Again, a tricky question! I have more than one, as I have met multiple outstanding faculty members during this data science certification course. Firstly, I would like to thank Suchit and Manavender because I started my data science journey with them. Suchit is an inspiration to me. I think he’s an excellent teacher and a wonderful person. He inspires a lot.
If I can become at least 20% like him, I would consider myself successful. So thank you so much for guiding me. He is a person whose teachings instill courage within me. I feel empowered when I listen to him. I would also like to thank Manavender for this journey.
Apart from them, I think, hats off to Deepesh. He is one of the best teachers I’ve come across. He is patient. He has an excellent teaching methodology. He brings even the complex concepts down to our level, and he teaches them with the utmost patience and care. So, thank you for the pace.
The constant drilling has pushed some concepts deep within my brain, which I will never forget in my life. The PI is a very, very difficult thing to grasp in one go. But he is an excellent teacher. And he has taught the subject very eloquently. He has listened to all our questions and has answered them correctly.
His patience is incredible. He has taken care to teach us all the challenging concepts slowly so that we can at least relate to them and can have them at the back of our minds always. I’m grateful that I had these teachers in my journey and would love to learn more.
Question 6: What is the goal of Data Science?
Sohomjit: Right now businesses are becoming more data oriented. So, I think the goal of data science is to make life easier by enabling us to make informed decisions. But, another way of defining the goal of data science is the means to extract business-oriented insights from data and create that information channel flow so that we can make those decisions again and again, upon updated data and knowledge. And, to do that we need to understand the value of data first and then centralize our entire infrastructure around it so that any decision made is accurate and data driven.
As vast amounts of data are now available to us, why not use that data for every panel in an organization so that everything is accurately tracked, interpreted, and understood? Of course, we would also have to identify whether the collected or generated data adds value or not. But if everything suffices, I think we will be greeted with a better future, a well-informed lot, and a well-defined end.
Question 7: In your view, how has Data Science evolved in the last few years?
Sohomjit: I think data science has evolved a lot in the last few years. For example, in my company, I have seen processes where I had to fill in forms with all the details. The process was slow. But presently, that same form only requests my consent to send it, as a yes or no. That means all the data that I have on the system is efficiently and correctly picked up by the system, and it is just asking for my approval, which removes a massive amount of labor on my side. The errors are also minimized. So, from this perspective, data science is nothing short of a miracle, because in the past nobody could imagine such possibilities of automation.
Other revolutionary examples of data science can be found in wearable technology. Mobile phones are already equipped with fitness trackers, and right now they even come with smartwatches and wireless earphones containing touch-enabled sensors that produce sensory data.
So it is quite possible that in the next 10 years, data science and machine learning will even enable people to measure their blood pressure without the help of a doctor or a sphygmomanometer. But that has to come at an affordable cost, as technologies like data science, machine learning, and artificial intelligence can only reach the masses when the products built on those technologies are economical.
So I guess data science has evolved a lot, and it is still evolving. Big data is a huge thing. I have now worked with PySpark, and I know how wonderfully ETL pipelines can be created. Big data can be transmuted, transformed, and operated on in seconds to create beautiful data frames and insights. It’s surely a fascinating time for data science enthusiasts.
Question 8: What are the current trends in Data Science that you are most excited about?
Sohomjit: The current trend in data science that I’m most fascinated by, and that I’m slowly learning how to do, is natural language processing for conversational analytics, that is, NLP in conversational systems. Google and Amazon already have it in place, but they’re still perfecting it so that end-to-end conversations can be held in any natural language.
And I think the fact that you are talking to a robot in your own language, and that it is simultaneously learning from and acting upon whatever information you are feeding it, is just mind-blowing.
The other thing that I am quite excited about is in-memory computing. In-memory computing keeps data in main memory, as opposed to the traditional approach of hard drives and relational databases. It enables instant decision making and real-time reporting, and lets applications become richer, with more interactive dashboards. As memory is becoming cheaper and businesses rely on real-time information, it’s a boon to have. I think that’s a fascinating trend to follow and learn about.
Question 9: Which are some of the blogs that you follow?
Sohomjit: I follow KDnuggets a lot. I love to read the blogs on Medium that have small code snippets. That really helps me grasp the context. I love the Dataflow blog as well. I also follow the Data Science Central from time to time. And my favorite kind of platform to get all the aggregated news for data science is the Reddit Data Science Channel.
Question 10: What is your advice to anyone wanting to start a career in Data Science?
Sohomjit: My advice to anyone starting a career in data science is to brush up your knowledge of statistics and learn a bit of programming. If you’re coming from an IT background, Python will be a friend of yours. There are other programming languages that have massive potential in data science, for example, Julia.
But again, I think you must concentrate on the methodologies first, because the medium in which you express those methodologies is not the concern right now. You can code in Java and accomplish the same thing that you would have coded in Python; you can code in C or C++ and get that same result faster, even if it takes more lines of code. So it depends on the instrument you are good at. Personally, I prefer Python as it lessens your workload.
The second thing I would advise is to familiarize yourself with statistics. It will help you understand the natural progression of data, how it can provide you with insights, and what tools and techniques you should use to get those insights.
And lastly, I would advise you to have patience and not be afraid to fail. Failure gives you the opportunity to try again and correct yourself. Just put in the effort to slowly build your expertise. Find a suitable data science certification course so that you get in-depth, formal training in it.
This was a conversation with one of our GCDAI students – Sohomjit Ganguly.
If you want to read more such interesting and insightful student stories, check out INSAID Spotlight.