Data Science Diaries: Navigating My Internship Journey

Introduction

As I embarked on my journey, the prospect of applying theoretical knowledge to real-world scenarios fueled my passion for data science. In this blog, I aim to chronicle the transformative odyssey of my data science internship at FutureSmart AI — a journey marked by significant learning curves, tangible accomplishments, and the development of skills that transcend the boundaries of traditional education. The experience wasn't merely about acquiring technical proficiency; it was a holistic exploration of teamwork, problem-solving, and the intricate dance between data and decision-making.

As I delved into the intricate world of data, the prospect of translating theoretical knowledge into actionable insights and solutions became a driving force in my academic and professional journey. The pursuit of this data science internship wasn't just a checkbox on my career to-do list; it was an opportunity to immerse myself in the real-world applications of data science, machine learning, and natural language processing.

Background

I pursued my Bachelor's degree in Computer Science at the Indian Institute of Information Technology (IIIT) Vadodara - ICD. Coming from a science background with an inclination towards statistics and data, I got into machine learning as early as the second semester of my B.Tech., starting with Andrew Ng's foundational course, which gave me an excellent grounding. Gradually I explored data science concepts theoretically, and being surrounded by theory made it all the more important for me to get my hands on real-life applications of data science.

This is where my internship at FutureSmart AI gave me a breakthrough: it let me immerse myself in practical experience through real-world projects and gave me the confidence to go from rookie to confident data science practitioner.

Learning and Growth

Embarking on my data science internship, I was met with a myriad of opportunities for learning and growth, each project and task serving as a stepping stone in my professional development.

Overview of Projects and Responsibilities

The diverse array of projects entrusted to me during my internship served as an immersive expedition through the intricate landscape of the data science lifecycle. My responsibilities spanned the entire spectrum, from the initial stages of data extraction and indexing to the final phases of developing robust models and deploying them in real-world scenarios. The projects I worked on had me integrating various tech stacks. One of them familiarized me with how online AI interviews work: I developed a FastAPI prototype where, as a candidate, you enter basic details such as the role you are applying for and your current level of work experience, then either choose from default job descriptions that we generate via GPT or upload your own job description file, and finally upload your resume to receive your online AI interview questions.

In another project, I developed a Streamlit interface for users to input natural language queries, converting them into SQL queries using a combination of queries stored in ChromaDB and GPT. The system refines its accuracy by seeking user feedback, storing accepted queries in a MySQL database and ChromaDB, and using ChromaDB for future query suggestions. A user-friendly loop allows corrections for rejected queries, fostering continuous improvement in the system's natural language understanding and SQL query generation. The generated SQL query would then be used to fetch data from the client's MySQL database.
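As a rough illustration of that feedback loop, the sketch below keeps accepted question/SQL pairs in a tiny in-memory store and retrieves the closest stored question for a new query. The `keyword_overlap` ranking is a crude stand-in for the vector similarity search that ChromaDB actually performs, and the class and function names are hypothetical, not the project's real code:

```python
def keyword_overlap(a: str, b: str) -> float:
    """Crude similarity: fraction of shared words (stand-in for vector search)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

class ExampleStore:
    """Holds accepted (question, SQL) pairs, queried for few-shot examples."""

    def __init__(self):
        self.examples = []

    def accept(self, question: str, sql: str) -> None:
        # Called when the user clicks "Accept" on a generated query.
        self.examples.append((question, sql))

    def suggest(self, question: str, k: int = 2):
        # Return the k stored examples most similar to the new question.
        ranked = sorted(self.examples,
                        key=lambda ex: keyword_overlap(question, ex[0]),
                        reverse=True)
        return ranked[:k]

store = ExampleStore()
store.accept("total sales per region",
             "SELECT region, SUM(sales) FROM orders GROUP BY region;")
store.accept("count users by country",
             "SELECT country, COUNT(*) FROM users GROUP BY country;")
print(store.suggest("sales per region last year", k=1))
```

In the real system, the suggested pairs are injected into the GPT prompt as examples, which is what makes accepted feedback improve later generations.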

With the rise of online AI interviews in the market, both candidates and interviewers often seek ways to assess the alignment between job descriptions and candidate resumes. In this project, a sophisticated system extracts essential information from resumes using specific prompts. Simultaneously, job descriptions undergo GPT processing, generating details in JSON format. The system then employs Hugging Face sentence transformer embeddings and OpenAI embedding with cosine similarity to calculate a nuanced compatibility score. This innovative approach streamlines the evaluation process, offering an effective tool for recruiters and candidates to ensure a harmonious match between skills and job requirements.
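The compatibility score at the heart of that matching comes down to cosine similarity between two embedding vectors. A minimal sketch with toy vectors (real embeddings from sentence transformers or OpenAI have hundreds of dimensions; the variable names are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" standing in for real model output.
resume_vec = [0.9, 0.1, 0.4, 0.2]
jd_vec = [0.8, 0.2, 0.5, 0.1]

score = cosine_similarity(resume_vec, jd_vec)
print(f"compatibility score: {score:.3f}")
```

A score near 1 means the resume and job description point in nearly the same direction in embedding space; a score near 0 means they share little semantic content.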

Apart from these mainstream projects, I also worked on projects that required me to fine-tune GPT models and use OpenAI's other capabilities. I built applications that can answer questions based on an uploaded PDF; one such project had me working with LlamaIndex and the Facebook AI Similarity Search (FAISS) index. Beyond that, I contributed to several FastAPI- and Streamlit-based mini projects.

Beyond Code to Blogs

My internship became truly memorable as I ventured beyond coding, immersing myself in blog creation for FutureSmart AI. One notable project involved harnessing OpenAI's speech-to-text and text-to-speech capabilities to craft a conversational chatbot on the Streamlit platform, showcasing not just technical prowess but also a keen understanding of human-computer interactions. Further elevating the experience, I explored the world of video content by implementing VideoDB, a video database akin to ChromaDB but tailored for videos. These blog-worthy endeavors not only added depth to my internship but also highlighted the diverse applications of AI technologies, shaping an enriching and multifaceted learning journey.

A step-by-step overview of working on a project

Data Extraction and Indexing:
The early stages of my internship involved working closely with raw, unstructured data. This phase required not only technical finesse in extracting relevant information but also an understanding of the importance of data quality. Working with PDFs, DOCS, Excel and other files, and extracting data from it and then later indexing them to create embeddings was one of the initial tasks I dealt with.

To be honest, I didn't have much idea about embeddings at first, and I found the videos below helpful in understanding and working with vector embeddings, which helped me a lot throughout my data science internship.

Model Development:
As the internship progressed, I delved into the heart of data science—model development. This encompassed crafting machine learning models, employing statistical techniques, and iteratively refining algorithms to derive meaningful patterns from the data. The hands-on experience in model development not only enhanced my coding skills but also honed my ability to select and fine-tune models based on the specific nuances of each project.
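For the GPT fine-tuning work in particular, training data has to be prepared as JSONL in OpenAI's chat fine-tuning format, one example per line. A minimal sketch of writing such a file (the example content is illustrative, not our actual training data):

```python
import json

# One training example per line (JSONL), in OpenAI's chat fine-tuning format.
examples = [
    {"messages": [
        {"role": "system", "content": "You translate questions into MySQL queries."},
        {"role": "user", "content": "total sales per region"},
        {"role": "assistant", "content": "SELECT region, SUM(sales) FROM orders GROUP BY region;"},
    ]},
    {"messages": [
        {"role": "system", "content": "You translate questions into MySQL queries."},
        {"role": "user", "content": "count users by country"},
        {"role": "assistant", "content": "SELECT country, COUNT(*) FROM users GROUP BY country;"},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The resulting file is what gets uploaded to start a fine-tuning job; the quality of these examples matters far more than their quantity.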

To get help with fine-tuning, I would recommend watching the videos below:

Deployment in Real-world Scenarios:

One of the most gratifying aspects of my internship was witnessing the translation of models from development environments to real-world applications. While I initially had no experience deploying models to cloud platforms like AWS, this stage became an immersive learning curve that broadened my skill set and provided invaluable insights into the practical challenges of deploying data science solutions in a production environment. Taking help from seniors and colleagues was always an option, but learning not to depend entirely on others was both a challenge and a valuable experience.

To get acquainted with the AWS environment, EC2 instances, and deploying applications, the video below should help:

Challenges Faced and How They Were Overcome

In the face of complex datasets and intricate problem statements, I encountered challenges that required creative problem-solving and adaptive thinking. One such challenge was building a feedback mechanism in Streamlit, where I needed to integrate "Accept" and "Reject" logic on a generated answer. AI cannot always solve your problem, as it is based on past data, and data science is a field that updates every day. I went through numerous Stack Overflow threads, the Streamlit discussion forum, documentation, and YouTube videos, but couldn't find a solution that matched my problem exactly. This helped me develop the ability to brainstorm through problems and stay patient. As a matter of fact, had I asked seniors for the solution, I would have solved the problem in less time, but sticking with it myself helped me gain confidence in my problem-solving skills.

Overcoming these challenges not only strengthened my problem-solving abilities but also fostered resilience in the face of ambiguity—a vital skill in the dynamic field of data science.

Tools and tech stack

Python

Python was like the superhero of my toolkit. It's a language that made it super easy to play around with data and build smart models. With libraries like Pandas, NumPy, NLTK, and scikit-learn, I could do all sorts of data magic without pulling my hair out.

OpenAI

OpenAI has been the backbone of all our projects. We leveraged GPT models, fine-tuning them and pairing them with custom prompts to get the desired output. We used OpenAI models to generate SQL queries from natural language queries using table descriptions, custom input prompts, and examples. Additionally, I put them to use in other projects, including extracting and analysing resumes, matching documents, and building Retrieval-Augmented Generation (RAG) models.
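For the SQL generation work, the prompt sent to GPT combined a table description with previously accepted examples. A hedged sketch of assembling such a prompt (the function name and prompt format are illustrative; the actual API call is left as a comment since it needs an API key):

```python
def build_sql_prompt(question, table_description, examples):
    """Assemble a few-shot prompt asking the model for a MySQL query."""
    lines = [
        "You are an assistant that writes MySQL queries.",
        f"Table description:\n{table_description}",
        "",
    ]
    # Accepted (question, SQL) pairs become few-shot examples.
    for q, sql in examples:
        lines.append(f"Q: {q}\nSQL: {sql}")
    lines.append(f"Q: {question}\nSQL:")
    return "\n".join(lines)

prompt = build_sql_prompt(
    "total sales per region",
    "orders(region TEXT, sales REAL)",
    [("count users by country",
      "SELECT country, COUNT(*) FROM users GROUP BY country;")],
)
# The prompt would then go to the chat completions API, e.g.:
# client.chat.completions.create(model="gpt-4o",
#     messages=[{"role": "user", "content": prompt}])
print(prompt)
```

Ending the prompt with a bare `SQL:` nudges the model to complete only the query, which keeps the response easy to parse.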

ML libraries

In my data science journey, I harnessed a set of powerful machine learning libraries like NumPy, Pandas, NLTK, and scikit-learn. NumPy acted as a math wizard, simplifying complex calculations and laying the groundwork for numerical magic. Pandas emerged as a data storyteller, transforming raw datasets into coherent narratives with its organizational prowess. Scikit-learn became my model-building companion, empowering me to predict outcomes and classify data effortlessly. NLTK served as a language decoder, unraveling the secrets of textual data with its linguistic algorithms.

Langchain

I have used LangChain extensively to build conversational chatbots with document extraction and analysis. LangChain agents, such as the SQL agent, found application in some of our projects that required working with a MySQL database. For those unaware of LangChain agents, I would recommend exploring the LangChain docs. In simple terms, agents interact with a backend source to carry out a task, much like humans would: they can analyse a response and retry building it if they find it unsatisfactory.

Streamlit

Streamlit is a framework for quickly building interactive interfaces. I used it for multiple applications; among other things, it can provide a chat interface to test the models we created.

Hugging Face and Sentence Transformers

I have used Hugging Face sentence transformers to embed documents and perform similarity search operations. Additionally, the cosine similarity function was used extensively to measure similarity between two documents. Hugging Face also provides open access to many models, such as Llama and BERT.

Chromadb

ChromaDB is a vector store database used to hold documents once they are indexed. Using a sentence transformer model to embed documents and ChromaDB to store them was a pattern we used across a number of projects. I also used ChromaDB to build custom PDF chatbots and a feedback mechanism for several apps, where an accepted GPT response along with the user's input query would be stored in ChromaDB for later use as example prompts passed to GPT.

Llamaindex

LlamaIndex was used for indexing documents and retrieving answers from them. I have used various LlamaIndex index types, including the Tree Index, Keyword Table Index, and Vector Store Index, to improve the efficiency of query search and retrieval.

FastAPI

FastAPI was the key component we used to share access to an application with clients. FastAPI provides endpoints that can be linked to the frontend of a website and invoke the model we built in the backend to perform the tasks mentioned.

Amazon Web Services (AWS)

AWS was used extensively to deploy the Streamlit and FastAPI applications we created so they could be accessed globally. It provided the foundational infrastructure and resources needed to deploy and run the projects.

Additionally, if you want to know the entire tech stack that FutureSmart AI uses, you can watch this video:

Professional Development

Stepping into the corporate world, it's highly important to get acquainted with how it works. My journey at FutureSmart AI taught me valuable lessons to carry into my future career. Mentorship from seniors has always been an important aspect of my internship at FutureSmart AI. Moving from college casualness to behaving professionally in a team and carrying out tasks in an Agile manner is one of the most important lessons I will carry forward in my professional career. Building connections with colleagues and seniors who have worked at top MNCs has given me a strong professional network.

Soft Skills

Interning at FutureSmart AI has not only impacted my technical skill set but also contributed to my all-round development.

Effective Communication

Communication emerged as a cornerstone of my soft skills repertoire during the internship. Regular interactions with team members and seniors refined my ability to convey ideas with clarity and precision.

Adaptability and Resilience

My internship was like a rollercoaster with lots of changes. I got good at adapting quickly—whether the project changed, the team structure shifted, or we switched technologies. Learning to go with the flow not only made me a better problem-solver but also helped me stay positive when facing unexpected challenges.

Time Management Proficiency

With lots of tasks and deadlines, I had to get good at managing my time. Figuring out what to do first, making sure things were done on time, and balancing work with my academics was challenging at first, but with patience and time I figured out how to balance everything.

Team Collaboration

Being part of a team with different skills taught me the importance of teamwork. I learned how to work smoothly with people who had different ways of doing things. This teamwork not only made our projects better but also made the workplace a positive and supportive space.

Problem-Solving Acumen

Real-world projects often unveil unexpected challenges, and my internship was a crucible for developing robust problem-solving skills. Confronting issues with a systematic and analytical mindset, I navigated through technical glitches and devised efficient solutions. This problem-solving acumen proved indispensable in troubleshooting and ensuring the success of projects.

Conclusion

My internship journey in data science was like a thrilling adventure filled with learning and discovery. I not only became really good at using cool tools and doing tech stuff, but I also learned how to talk with people, handle changes, manage time, work in a team, and solve problems in a clever way. It's like I got a whole bag of skills, not just for data science but for being awesome at work.

Looking back, I feel super proud of what I learned and the cool projects I worked on. This internship was like a big stepping stone for my future in data science. I am truly grateful for the guidance and mentorship provided by FutureSmart AI throughout all the projects. I can't wait to use all these skills in new adventures and keep learning more because the world of data is always changing, and I want to be right there, making a difference with my data skills!

Acknowledgement

I want to give a big shoutout to the amazing people at FutureSmart AI who made my data science internship unforgettable. Huge thanks to my mentors and colleagues for being super supportive and helping me grow. You all are like the real MVPs, guiding me through challenges. Special mention to Pradip Nichite sir for being my mentor and guiding me wherever it was needed during my entire internship.

Here's to more data adventures ahead!

Stay Connected with FutureSmart AI for the Latest in AI Insights -FutureSmart AI

Eager to stay informed about the cutting-edge advancements and captivating insights in the field of AI? Explore AI Demos, your ultimate destination for staying abreast of the newest AI tools and applications. AI Demos serves as your premier resource for education and inspiration. Immerse yourself in the future of AI today by visiting aidemos.com.