DS in the Real World

A Data Scientist Is a Generalist

Disclaimer: all opinions in this article are my own and are not associated with or influenced by any organization.

I recently attended a tech conference with four tracks: data science, software, product, and career. Of the four, data science was the most popular, which didn’t surprise me. The data science track rooms had long lines, and the overflow rooms were full as well.

Data science has been one of the most sought-after careers in tech. Since I changed my title to “Data Scientist” on LinkedIn, my popularity on LinkedIn has doubled. Many aspiring data scientists want to learn more and network with data scientists already working in the field. But despite its popularity, many people are unclear about what data science is or what a data scientist’s responsibilities are. The interdisciplinary nature of data science makes its breadth and depth even harder to grasp. A Google search returns a wordy, complicated definition that sometimes confuses even me:

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

What the definition above is trying to say is that data scientists are “generalists.” Let’s pause for a moment. According to Google, a generalist is a person competent in several different fields or activities. Well then, let’s put two and two together: a data scientist is a generalist. They have broad knowledge across several disciplines, including Data Engineering, Statistics, Machine Learning, Product Development, and Business understanding.

Are you still confused? All right, let’s try one more time: this Venn diagram explains data science in a simplified fashion. The intersection of all the disciplines represents the skills of a data scientist. They have broad knowledge across various disciplines. They have breadth, not depth.

I hope you now have a decent understanding of the skills of a data scientist. Still have questions? Keep reading.

Ask me anything

I did a Data Science AMA (ask me anything) a few months back. The number of questions and responses I received was overwhelming. Again, I was not surprised: it is one of the hottest tech jobs out there. I have compiled a list of questions from my AMA session and answered them all to the best of my knowledge and experience. I hope these answers give some clarity to those looking for them, especially those who are trying to decide on a career path.

1. What does a data scientist do?

A Data Scientist wears many hats. They have broad knowledge of data engineering, statistics, machine learning, computer science, business, and product disciplines; their knowledge is built on breadth. They don’t go deep in one specific field but instead apply the breadth of their knowledge to solve business and product problems with scientific solutions.

An example of a data science project: a business or product team comes to you with a business problem. You have the business acumen to understand the problem. You know what data to use and how to source it independently. You understand the data in depth and can select features and propose methods appropriately. You can identify one or more scientific (statistical or machine learning) methods that could solve the problem. You can do exploratory analysis and prototype the models independently, keeping productionization of those methods in mind. Some data scientists productionize their models themselves (though it is not required), while others build prototypes and partner with engineering teams to put their models into production. You prototype the model, evaluate it, and present your findings and recommendations to stakeholders.

2. What was your journey to becoming a data scientist?

My introduction to tech and data science was accidental. Growing up, I didn’t know anyone among my family or friends who studied tech or worked in it. I was aware of two options: business and arts. I chose business and ended up in business school. Three months before graduation, I decided to take a database class and fell in love with data. I had found my passion, but it was too late to change majors so close to graduation.

But I was determined. I decided to take additional data courses outside of my degree program. That was one of the best decisions I have made. I landed a full-time job at Amazon as a data engineer (a technical role). I learned a lot as a data engineer: I built many pipelines, applied my knowledge of databases, and mastered SQL. After two years, I realized that although I was working a lot with data, I didn’t get to “play” with it. I worked on one data analytics project and realized that was the direction I wanted to go.

I switched to a different team to work on data analytics projects. That’s where I was introduced to statistics, machine learning, and A/B testing. The more I learned about each of these areas, the more I liked them. In my free time, I read books on these subjects, took online courses, and went through my teammates’ science work, trying to learn end to end what they were doing at each step and how. Soon, I was given data science projects of my own to apply what I had learned. In the beginning, I needed a lot of guidance from teammates and mentors. After two years of hard work and dedication, I had delivered several projects in the data science space and was officially nominated to be a Data Scientist. Here I am now.

3. Can you recommend some books and courses for beginners?

Here are some books that I personally read to help with my transition.

4. Can you recommend some online courses and websites for learning data science and machine learning?

Two of my favorite online learning websites are Coursera and edX. They have a vast selection of data science and machine learning courses from various universities.

If online learning is not your cup of tea, look into in-person data science bootcamps and certificate programs at a university. I haven’t taken these myself, but if I were to pick one, I would focus on the program’s job outcomes for its graduates. That’s a good indicator of how relevant the material taught in the program is. Research the program and ask for data and concrete examples of graduates’ job outcomes.

5. What do I need to do to become a data scientist?

There are two main options: 1) an accredited degree program, or 2) online learning, bootcamps, and certificates.

In recent years, many universities have introduced Data Science degree programs. One of my friends attended UC Berkeley’s Data Science Master’s program and successfully started as a Data Scientist at a well-known tech company after graduation.

I also know people, including myself, who took the nontraditional route and taught themselves data science. Thanks to modern technology, becoming a self-taught data scientist is possible with online courses, websites, books, a good mentor, and a lot of discipline.

6. How do you stay motivated while studying data science?

  • Stay focused on your goals.
  • Write down your long-term goal and then make a list of short-term goals that will take you toward your long-term goal.
  • Surround yourself with the right crowd.
  • Learn from other scientists. Don’t be shy about asking questions.
  • Don’t get discouraged if you fail; failure is part of the process. If you have never failed, then you are playing it too safe and missing out on learning opportunities.
  • Don’t compare yourself to others; everyone has a different journey. Focus on your own journey and your own path. Compare yourself to you.
  • If you ever feel down or discouraged, remind yourself of where you were 10 years ago and where you are now.

7. I am trying to get a job as a data scientist. What do I need to focus on?

If you are new to data science, study it and build a project portfolio. Get good at coding, math, statistics, and ML.

If you are already a data scientist, build a project portfolio and practice your interviewing skills.

8. How long would it take to learn Machine Learning and Math from scratch?

If you are studying full-time, I would say 3–6 months. If you are studying part-time, anywhere from 6–12 months, depending on how much time you dedicate to it. These estimates vary by person, of course.

In either case, learning is important, but practicing and applying what you learn is even more important. If you learn something but never apply it, chances are you will forget it.

9. How did you start in Data Science? Do you have any programming background?

See answer to #2.

I don’t, but I taught myself R, Python, shell scripting, and SQL.

10. Do you like your job?

Yes, I like my job. Data science work, although vast, can be very interesting. Plus, if you end up on a good team working on projects that fit the data science scope, it’s even more fun.

If I had to pick a downside, I would say it’s hard to build depth in data science given its multidisciplinary nature. It’s also frustrating at times when others don’t understand your job family and you have to keep explaining it.

11. How do I prepare for a job at a Fortune 500 company and not get rejected?

Not get rejected? That’s hard. Chances are that you will get rejected. There are so many variables involved in landing a job. The variables you do control are your resume, your interview preparation, and your determination to keep going despite the hiccups you may face.

Think of interviewing in two parts: 1) behavioral interviews and 2) technical interviews. You have to be good at both to land a job, and the more you practice, the better you get. There are tons of good websites that share interview prep resources for data scientists. For technical interview prep, my favorites are HackerRank and LeetCode.

12. I am studying Computer Science. I also want to become a Data Scientist. Could you guide me?

Keep studying computer science but with a focus on statistics and machine learning. This background will make you a strong candidate for data science and applied science positions.

13. Do Data Scientists get to work on the product development stage?

Yes and no. The answer depends on the company and the scope of the role. In many companies, data scientists drive product decisions by deriving learnings from the data. So yes, but it varies by company.

14. Who gets paid more: a Data Engineer or a Data Scientist? What about a Machine Learning Scientist?

Ranked from highest to lowest:

  1. Machine Learning Scientist
  2. Applied Scientist
  3. Data Scientist
  4. Data Engineer

That said, Data Scientists and Data Engineers are paid about the same at most companies.

15. When can you call someone a data scientist?

When they can independently do the following: define the business problem, source the data, identify a scientific solution, prototype and/or productionize the scientific model, and communicate results to technical and non-technical stakeholders.

16. What programming languages does a data scientist need to know?

Python, R, SQL, and shell scripting.

17. Is data scientist really one of the sexiest jobs in tech?

Maybe, if you enjoy building stories with data and get a great deal of satisfaction from driving new insights.

18. Can you share useful data science resources?

Thanks for Reading!


I write about data science, diversity & lifestyle | currently at Google | more learning content at sundaskhalid.com