My Journey as a Data Science Intern: From Newbie To Data Ninja

Introduction

Hey there, fellow explorers! As my incredible journey at FutureSmart AI wraps up, I can't help but marvel at the adventure I've been on. From a data science rookie to a confident intern, I've soaked up so much knowledge and want to spill the beans on all the cool stuff I've learned.

Picture this: I started with a solid grasp of machine learning and math, but real-world model fine-tuning and deployment were uncharted territory for me. Still, I was eager to jump in and tackle these challenges head-on.

Learning Experience

One cool part of my time at FutureSmart AI was a project where we analyzed Glassdoor reviews with the OpenAI API to figure out how employees felt about different aspects of a company. It was like diving into a corner of AI I hadn't explored before.

As I moved on to other projects, I willingly took on more responsibility. In one project, I built a chatbot for a client, which sharpened my skills in web scraping and automation. I also got better at using the ChatGPT API, which lets applications hold natural, human-like conversations.

What made my time even more interesting was learning about cloud technologies. We built services and apps that run on Amazon Web Services (AWS), which taught me how to create systems that stay responsive when lots of people use them at the same time.

I also got to try something different by using a low-code platform. This tool helped us build complete solutions for our clients without having to write lots of complicated code. It was like solving puzzles in a new way and made me better at fixing problems creatively.

Task Highlights

My first project was to scrape data from Glassdoor, particularly employee reviews, and perform aspect-based sentiment analysis on it. During this project, I got exposed to the OpenAI API. The project also required web automation expertise, so I used Selenium to scrape the data, stored it all in a pandas DataFrame, and used few-shot learning to guide the OpenAI model. The goal was to analyze the employee reviews of a particular company to assess factors like work culture, perks, and more.
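
To give a flavor of the few-shot sentiment step, here's a minimal sketch using the OpenAI Python client (v1.x style). The review text, aspect names, and model choice are illustrative placeholders rather than the exact ones from the project.

```python
# Minimal sketch of few-shot, aspect-based sentiment analysis (openai v1.x client);
# the examples, aspects, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT = [
    {"role": "system",
     "content": "Classify the sentiment of each aspect in an employee review "
                "as positive, negative, or neutral. Reply as 'aspect: label'."},
    # a couple of worked examples to steer the model (few-shot prompting)
    {"role": "user",
     "content": "Review: Great pay but the hours are brutal.\nAspects: compensation, work-life balance"},
    {"role": "assistant",
     "content": "compensation: positive\nwork-life balance: negative"},
]

def aspect_sentiment(review: str, aspects: list[str]) -> str:
    messages = FEW_SHOT + [{
        "role": "user",
        "content": f"Review: {review}\nAspects: {', '.join(aspects)}",
    }]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

print(aspect_sentiment("Managers are supportive, but perks are minimal.",
                       ["work culture", "perks"]))
```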

My next project involved extracting specific information from a list of URLs representing different categories on a website. This required using web automation tools like Selenium and implementing web scraping techniques using Python. The project presented challenges in handling dynamic web elements and managing large volumes of data. However, through careful analysis and coding, I successfully automated the scraping process and generated a structured CSV file with the extracted data.
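
Here's a simplified sketch of that scraping loop with Selenium 4 and pandas. The URLs and CSS selectors are hypothetical stand-ins for the real category pages.

```python
# Minimal per-URL scraping sketch, assuming Selenium 4 with a Chrome driver available;
# the URL list and selectors are placeholders, not the client's real pages.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

CATEGORY_URLS = ["https://example.com/category/a", "https://example.com/category/b"]

driver = webdriver.Chrome()
rows = []
try:
    for url in CATEGORY_URLS:
        driver.get(url)
        # wait for dynamically loaded items instead of sleeping a fixed time
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item"))
        )
        for item in driver.find_elements(By.CSS_SELECTOR, ".item"):
            rows.append({
                "url": url,
                "title": item.find_element(By.CSS_SELECTOR, ".title").text,
                "price": item.find_element(By.CSS_SELECTOR, ".price").text,
            })
finally:
    driver.quit()

# write the structured output to CSV
pd.DataFrame(rows).to_csv("scraped_items.csv", index=False)
```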

Next, I worked on creating end-to-end chatbots with a custom knowledge base. This involved extracting data from different web articles and URLs using web scraping, storing the scraped text in a data frame, chunking and embedding it into a vector database like Pinecone, and finally using the OpenAI API to answer user queries based on that custom knowledge. I used Streamlit for the frontend and deployed the app on AWS EC2.
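
The ingestion side of that pipeline looked roughly like the sketch below, assuming the OpenAI v1.x client and the Pinecone Python client; the index name, chunk size, and article text are illustrative assumptions.

```python
# Minimal knowledge-base ingestion sketch: chunk text, embed it, upsert into Pinecone.
# Index name, chunk size, and the article text are placeholders.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("knowledge-base")  # assumed to be created beforehand

def chunk_text(text: str, size: int = 500) -> list[str]:
    """Split scraped article text into roughly equal word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def ingest(article_id: str, text: str) -> None:
    chunks = chunk_text(text)
    embeddings = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=chunks
    )
    # store each chunk's vector along with the original text as metadata
    index.upsert(vectors=[
        (f"{article_id}-{i}", record.embedding, {"text": chunk})
        for i, (record, chunk) in enumerate(zip(embeddings.data, chunks))
    ])

ingest("article-1", "Full text of a scraped web article goes here ...")
```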

Furthermore, I worked on a project that involved extracting data from an API, storing it in a MySQL database, and performing data manipulation with Python. This required handling API requests, connecting to the database, and using Pandas for the transformations. The main challenges were managing API rate limits and ensuring data integrity during extraction, but with efficient code and solid error handling, I was able to acquire, store, and process the required data reliably.
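
A stripped-down version of that extract-and-store flow might look like this, assuming a paginated JSON API and a MySQL database reachable through SQLAlchemy with PyMySQL installed; the endpoint, credentials, and table name are placeholders.

```python
# Minimal extract -> store -> transform sketch with rate-limit backoff;
# the API endpoint, connection string, and table name are hypothetical.
import time
import requests
import pandas as pd
from sqlalchemy import create_engine

API_URL = "https://api.example.com/records"
engine = create_engine("mysql+pymysql://user:password@localhost:3306/analytics")

def fetch_page(page: int, max_retries: int = 5) -> list[dict]:
    """Fetch one page, backing off when the API signals a rate limit (HTTP 429)."""
    for attempt in range(max_retries):
        resp = requests.get(API_URL, params={"page": page}, timeout=30)
        if resp.status_code == 429:      # rate limited: wait and retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return resp.json()["results"]
    raise RuntimeError("rate limit retries exhausted")

records = []
for page in range(1, 6):
    records.extend(fetch_page(page))

# basic integrity step before loading into MySQL
df = pd.DataFrame(records).drop_duplicates()
df.to_sql("api_records", engine, if_exists="append", index=False)
```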

I also worked with low-code platforms like Anvil, which I used to develop a chatbot efficiently. Leveraging Anvil's intuitive interface and pre-built components, I created a functional chatbot that interacts smoothly with users. The platform let me design the chatbot's user interface, wire it up to backend logic, and implement dynamic responses without writing a lot of complex code.
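
For a sense of how little code Anvil needs, here's a minimal sketch of a Server Module backing such a chatbot; the function name and canned-reply logic are purely illustrative, and the client-side form would simply call it with anvil.server.call.

```python
# Minimal sketch of an Anvil Server Module for a chatbot backend.
# The client's button click handler would call:
#   reply = anvil.server.call('get_bot_reply', message)
import anvil.server

@anvil.server.callable
def get_bot_reply(message: str) -> str:
    """Return a reply for the user's message; real logic would call the
    OpenAI API or a knowledge-base lookup here (placeholder rules only)."""
    if "price" in message.lower():
        return "You can find our pricing details on the Plans page."
    return "Thanks for your message! A team member will follow up shortly."
```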

Resources and Practices

Python and Data Manipulation:

Python served as the cornerstone of my work, enabling data manipulation, analysis, and automation tasks. Its expansive ecosystem of libraries, including key players like Pandas, NumPy, and scikit-learn, made processing and managing large datasets far more efficient.
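
A tiny, self-contained example of the kind of wrangling this enabled (the columns and values are made up for illustration):

```python
# Typical Pandas/NumPy cleaning steps: fill missing values, derive a column, aggregate.
import numpy as np
import pandas as pd

reviews = pd.DataFrame({
    "company": ["Acme", "Acme", "Globex"],
    "rating": [4.0, np.nan, 2.5],
    "review": ["Great culture", "Good perks", "Long hours"],
})

reviews["rating"] = reviews["rating"].fillna(reviews["rating"].mean())
reviews["is_positive"] = reviews["rating"] >= 3.5
print(reviews.groupby("company")["is_positive"].mean())
```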

Conversational AI Development:

My proficiency with advanced language models, specifically ChatGPT and LangChain, empowered me to construct sophisticated conversational AI systems capable of understanding and generating human-like text. By fine-tuning these models, I improved their performance in customer-support interactions.
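
As a rough illustration, a basic ChatGPT-backed conversation with memory could be wired up like this, assuming the 2023-era LangChain imports (newer releases have moved these modules, so treat it as a sketch rather than a definitive recipe):

```python
# Minimal conversational sketch with LangChain's ConversationChain and ChatOpenAI;
# model name and prompts are placeholders, and the import paths assume an older
# (2023-era) LangChain release.
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chat = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(chat.predict(input="Hi, I need help resetting my password."))
print(chat.predict(input="I already tried the email link."))  # memory keeps context
```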

ChatGPT on Your Data:

I used the OpenAI API to generate embeddings, stored those embeddings in a vector database like Pinecone, and performed semantic search over them. In this way, I built chatbot solutions for clients on top of each client's specific knowledge base.
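
At query time, the flow looked roughly like the sketch below, which assumes a Pinecone index populated as in the ingestion example earlier; the index name, model names, and question are placeholders.

```python
# Minimal query-time sketch: embed the question, retrieve similar chunks from
# Pinecone, and answer from the retrieved context only.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("knowledge-base")

def answer(question: str) -> str:
    q_emb = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=[question]
    ).data[0].embedding
    matches = index.query(vector=q_emb, top_k=3, include_metadata=True).matches
    context = "\n".join(m.metadata["text"] for m in matches)

    resp = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("What services does the client offer?"))
```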

OpenAI API Integration:

Leveraging the capabilities of the OpenAI API, I harnessed its potential for sentiment analysis on movie reviews, extracting valuable insights, and generating contextually relevant responses to customer queries. The seamless integration of this API amplified the power of robust language models within my projects.

Foundational Data Science Libraries:

Central to my work were foundational data science libraries such as Pandas, NumPy, and scikit-learn. These tools enabled fluid data manipulation, numerical operations, and the application of diverse machine learning algorithms, spanning tasks like classification, regression, and clustering.
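
For example, a small scikit-learn pipeline of the sort used for classification tasks (with made-up data):

```python
# Tiny text-classification pipeline: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great team and perks", "terrible management", "supportive culture", "low pay"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["friendly team but low pay"]))
```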

Collaborative Tools for Efficiency:

Throughout my internship, I capitalized on an array of collaborative tools to ensure efficient teamwork and streamlined project management. Git provided reliable version control, Jupyter Notebook and Google Colab facilitated interactive data analysis and rapid prototyping, while diverse collaboration platforms fostered seamless communication among team members.

AWS Cloud Platform Utilization:

Expanding beyond the core tools, I delved into AWS Lambda and AWS Serverless Application Model (SAM), integrating these services to simplify deployment and management on the AWS cloud platform. This approach enhanced scalability and cost-efficiency for my projects.
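
A minimal Lambda handler of the kind packaged and deployed with SAM might look like this; the event shape assumes an API Gateway proxy integration and is illustrative only.

```python
# Minimal AWS Lambda handler sketch (API Gateway proxy event assumed);
# the response payload is a placeholder for the real processing logic.
import json

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    # real logic (e.g. calling the sentiment model) would go here
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received_characters": len(text)}),
    }
```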

AWS Service Integration:

Within the AWS ecosystem, I effectively harnessed a range of services such as EC2 instances, S3 for storage, and RDS for database management. These resources provided the necessary infrastructure and support to enable smooth deployment and optimal functionality.
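
For instance, pushing project artifacts to S3 with boto3 took only a few lines, assuming AWS credentials are configured locally; the bucket and file names below are placeholders.

```python
# Small boto3 sketch: upload a file to S3 and list what landed under a prefix.
import boto3

s3 = boto3.client("s3")
s3.upload_file("scraped_items.csv", "my-project-bucket", "raw/scraped_items.csv")

for obj in s3.list_objects_v2(Bucket="my-project-bucket", Prefix="raw/").get("Contents", []):
    print(obj["Key"], obj["Size"])
```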

Streamlit for Interactive Apps:

Leveraging Streamlit, I successfully deployed multiple interactive web applications on an AWS EC2 instance. These applications offered intuitive interfaces that provided users with easy access to sentiment analysis functionality.
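
A trimmed-down version of such a Streamlit front end could look like this; the sentiment logic here is a placeholder for the real OpenAI-based classifier, and the app would be launched on the EC2 instance with streamlit run app.py.

```python
# Minimal Streamlit UI sketch for a sentiment-analysis app;
# the keyword check stands in for the project's actual model call.
import streamlit as st

st.title("Review Sentiment Analyzer")
review = st.text_area("Paste a review:")

if st.button("Analyze") and review.strip():
    label = "positive" if "good" in review.lower() else "negative"  # placeholder logic
    st.write(f"Predicted sentiment: **{label}**")
```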

API Testing with Postman:

Lastly, I employed Postman as a vital tool for testing and debugging API endpoints. This tool allowed me to meticulously assess HTTP requests, validate responses, and ensure the seamless performance of developed APIs.

Grand Finale

As I wrap up my adventure, I'm so thankful for all the guidance and mentorship. FutureSmart AI has been a launchpad for growth and learning. With tools like AWS and Python in my belt, I'm all set to tackle real-world data challenges and build meaningful solutions.

Looking back, I'm filled with gratitude for the hands-on experiences and amazing mentors. Everything I've learned will be my superpower in the world of data science. I can't wait to keep growing, taking on cool projects, and using data to change the world.

So there you have it, my journey from newbie to data ninja. Until next time, keep exploring and never stop learning!