Here is an article by one of our Top 5 Budding Data Scientists, Amoolya Shetty. Read how she began her journey in the data science and AI space. Amoolya also ranked among INSAID’s Top Budding Data Scientists.
Data Science is the buzzword today: it is a hot skill in the market, and almost every other industry is on the lookout for a Data Scientist. I have had an inclination towards Machine Learning and AI ever since my engineering days, and I used to read about them in bits and pieces.
I knew little about the job profile of a Data Scientist, or about Data Science itself (Data Analyst and Business Analyst were often confused with Data Scientist). With so much hype around Data Science, I decided on a random day to look it up, spent about 30 minutes superficially browsing the Internet about it, and then switched to my Instagram account. As I was scrolling down my Instagram feed, I chanced upon an advertisement on Data Science (oh yes, data mining, or you can call it a recommendation engine, worked its magic in suggesting the right ad to me!).
My instant response to this ad was a tap on the ‘APPLY NOW’ button. I read through the ad and learnt that there were a few trial sessions one could attend before signing up for the actual course. Perfect. I registered my email ID for the trial sessions and was all set to attend the classes.
I quite enjoyed how the trial sessions progressed: I liked the real-life examples and analogies the instructor used to explain Data Science concepts; they were lucid and apt. So, after attending the first three trial sessions, I decided to enroll in the actual course to further my understanding of Data Science. And yes, this is what led me to the doors of the INSAID Data Science course (all thanks to the recommendation engine! :D)
My journey at INSAID kick-started with the basics of Python and Statistics. For a Data Scientist, knowing Python (or R, or both) and having a sound understanding of Statistics is imperative. Together, these two serve as the foundation stone on which a Data Science career is built.
As I come from a Computer Science background, I was able to follow the Python classes. (Note that anyone without a programming background could also sail through the Python basics just fine. It is relatively easy compared to other programming languages. Well, at least that’s what I feel!) After the Python and Stats basics, we moved on to Exploratory Data Analysis (EDA) and Data Visualization (DV). With the humongous amount of data at our disposal these days, it is quite natural for someone wanting to pursue a career in Data Science to get lost in this vast ocean of data. The same was the case with me: How do we start analyzing our data? Should we retain our data as is? What does the data clean-up process involve, and how is it done?
These were some of the common questions that I had. Data Cleansing and EDA help to answer these questions. Pandas is a Python package that helps with data manipulation and data analysis. The data analysis step is followed by what is called data visualization, which helps communicate useful insights effectively to business stakeholders through graphical representation. The Matplotlib and Seaborn Python packages are used for data visualization. (Seaborn is built on top of Matplotlib, and Pandas and Matplotlib are both built on top of the NumPy package. NumPy is a Python package used for scientific computation, and it offers a powerful data structure: the n-dimensional array.)
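To make the clean-up and EDA steps a little more concrete, here is a minimal sketch using Pandas. The toy data set, its column names, and the `clean` helper are all made up for illustration; this is not the data or code from the course, and filling missing values with the median is just one possible choice.

```python
import pandas as pd

# Hypothetical toy data set; the columns and values are invented for illustration.
raw = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Pune", None, "Mumbai"],
    "sales": [120.0, 200.0, None, 90.0, 200.0],
})

def clean(df):
    """Drop rows with a missing city and fill missing sales with the column median."""
    out = df.dropna(subset=["city"]).copy()
    out["sales"] = out["sales"].fillna(out["sales"].median())
    return out

cleaned = clean(raw)

# A first EDA step: how many rows each city has and its average sales.
summary = cleaned.groupby("city")["sales"].agg(["count", "mean"])
print(summary)
```

From here, a call such as `summary["mean"].plot(kind="bar")` would hand the same numbers to Matplotlib for a quick bar chart, assuming Matplotlib is installed.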
Now, what came as an overwhelming experience to me was our Term 1 and Term 2 final project. We were asked to pick a data set, draw useful insights from it, and then communicate them through data visualization (something that every Data Scientist does on a daily basis). This project helped me assess my understanding of the basics of data analysis and data visualization so far, and gave me a sense of what it is like to work on a real-life data set (a much needed exercise, I must say!).
The third term (the current term) started with an introduction to Machine Learning and walked us through the difference between Traditional Programming (TP) and Machine Learning (ML), and how to decide which approach (TP, ML, or both) to choose in order to solve the business problem at hand. We also touched upon the differences between AI, ML, Deep Learning and Data Science.
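One common way to picture the contrast between TP and ML is a small sketch like the one below. It is only an illustration under my own assumptions (the temperature-conversion task, the example values, and the function names are all made up): in the first function a person writes the rule by hand, while in the second the rule is estimated from example data using NumPy’s least-squares line fit.

```python
import numpy as np

# Traditional programming: a person writes the rule by hand.
def celsius_to_fahrenheit_rule(celsius):
    return celsius * 1.8 + 32.0

# Machine learning: the rule is estimated from example input/output pairs.
# These example readings are invented for illustration.
celsius_examples = np.array([-10.0, 0.0, 10.0, 25.0, 40.0])
fahrenheit_examples = np.array([14.0, 32.0, 50.0, 77.0, 104.0])

# Fit a straight line (degree-1 polynomial) to the examples by least squares.
slope, intercept = np.polyfit(celsius_examples, fahrenheit_examples, deg=1)

def celsius_to_fahrenheit_learned(celsius):
    return slope * celsius + intercept

print(celsius_to_fahrenheit_rule(100.0))
print(round(celsius_to_fahrenheit_learned(100.0), 2))
```

When the rule is known and simple, writing it directly is the obvious choice; learning it from data becomes attractive when the rule is too complex or unknown to write down by hand.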
These days AI, ML and Deep Learning are often used interchangeably in most industries. But these words are not synonymous with each other; in fact, they are different. If we were to see them as a hierarchy, AI is at the top (the root node), Machine Learning is one of the branches of AI, and Deep Learning is a sub-branch of ML. Data Science, on the other hand, is the study that uses these tools (technologies) to solve real-world business problems. The introduction class was followed by classes in which we learnt about the different categories of ML based on how the learning takes place: Supervised, Unsupervised and Reinforcement learning.
The third term primarily focuses on different forms of supervised learning algorithms (regression and classification algorithms): to name a few, Linear Regression, Logistic Regression, Decision Tree and Random Forest. Finally, it concludes with model evaluation, which is one of the two performance measures (the other being business evaluation).
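As a rough sketch of what training and evaluating such supervised models can look like, the snippet below fits three of the classifiers named above using scikit-learn. The choice of scikit-learn, the built-in Iris data set, the 70/30 split, and accuracy as the metric are my own assumptions for illustration, not necessarily what the course used.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in classification data set (features X, labels y).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows so the models are judged on data they have not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Model evaluation: fit each model, then measure accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy is only one possible metric; for a regression model such as Linear Regression, an error measure like mean squared error would be used instead.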
With everyone talking about data science, it is a rapidly blooming space, with advancements happening every passing day. So, in order to keep abreast of the latest developments, it is recommended that we follow eminent data science leaders or subscribe to a good blog. I have subscribed to the ‘Towards Data Science’ blog, where one can find interesting and insightful reads on data science.
I’ve also subscribed to Andrew Ng’s and Cassie Kozyrkov’s quick reads on Medium. On YouTube, I follow Python Programmer (Giles McMullen’s channel) and the Joma Tech channel (I find these two channels very insightful and entertaining at the same time). Another area to focus on is building a good GitHub profile and a good LinkedIn profile. Having strong GitHub and LinkedIn profiles can help us land a data science job at our dream company. These are the very points that the instructors at INSAID stress time and again.
My journey in Data Science has just begun; I am taking toddler steps and I have a long way to go. I believe that Data Science is an art and, like any other art, it is to be pursued with interest and dedication. And practice is the key to perfection: devoting some time every day to practicing a bit of coding and to reading about the trends in Data Science can help us sail through the ocean of data smoothly. This is my key takeaway from the INSAID course.