Project
GDSC2-Biomarker

What is the project
I used the GDSC2 dataset to analyze how cancer cell lines respond to drugs and how those responses track with mutations, with the goal of building a practical AI research-assistant workflow.
- I worked with the Genomics of Drug Sensitivity in Cancer (GDSC2) dataset to explore cell-line-level patterns relevant to biomarker discovery.
- The core focus was connecting mutation and molecular-profile information with drug-response signals to identify candidate gene-response relationships.
- I built an analysis pipeline that organized and compared cell-line data so I could quickly surface candidate trends for further biological investigation.
- Rather than treating this as a one-off analysis, I framed it as an AI research-assistant workflow that helps prioritize hypotheses before deeper validation.
- The project combined data handling, exploratory analysis, and interpretation to support more structured decision-making in early-stage research.
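The mutation-to-response comparison described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the function name `rank_gene_drug_pairs`, the `min_group_size` parameter, the gene and drug labels, and every number are hypothetical, and the response value is assumed to be a log-scale IC50 where lower means more sensitive. It uses plain Python to stay dependency-free.

```python
from collections import defaultdict
from statistics import mean

# Toy, made-up records: (cell_line, drug, ln_ic50). Not real GDSC2 values.
responses = [
    ("CL1", "DrugA", -2.0), ("CL2", "DrugA", -1.0),
    ("CL3", "DrugA", 1.0), ("CL4", "DrugA", 2.0),
    ("CL1", "DrugB", 0.5), ("CL2", "DrugB", 0.0),
    ("CL3", "DrugB", 0.5), ("CL4", "DrugB", 0.0),
]

# Hypothetical mutation calls: cell line -> set of mutated genes.
mutations = {
    "CL1": {"GENE1"},
    "CL2": {"GENE1", "GENE2"},
    "CL3": {"GENE2"},
    "CL4": set(),
}


def rank_gene_drug_pairs(responses, mutations, min_group_size=2):
    """Rank (gene, drug) pairs by mean response difference, mutant minus wild type."""
    # Group response values by drug, keeping only cell lines with mutation calls.
    by_drug = defaultdict(list)
    for cell_line, drug, ln_ic50 in responses:
        if cell_line in mutations:
            by_drug[drug].append((cell_line, ln_ic50))

    genes = sorted(set().union(*mutations.values()))
    results = []
    for drug, rows in by_drug.items():
        for gene in genes:
            mutant = [v for cl, v in rows if gene in mutations[cl]]
            wild_type = [v for cl, v in rows if gene not in mutations[cl]]
            # Skip comparisons where either group is too small to be meaningful.
            if len(mutant) < min_group_size or len(wild_type) < min_group_size:
                continue
            results.append((gene, drug, mean(mutant) - mean(wild_type)))

    # Largest absolute difference first: these are candidates, not conclusions.
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


ranked = rank_gene_drug_pairs(responses, mutations)
for gene, drug, diff in ranked:
    print(f"{gene} x {drug}: mean difference = {diff:+.2f}")
```

A ranking like this only prioritizes pairs for follow-up. A real analysis would add a statistical test, multiple-testing correction, and controls for tissue type before treating any pair as a biomarker candidate.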
What I learned
- Real biomedical datasets are noisy and heterogeneous, so careful preprocessing and feature alignment matter as much as model choice.
- Cell-line analyses become more useful when mutation context is tied directly to response metrics rather than analyzed in isolation.
- Research-assistant-style tooling is valuable because it speeds up hypothesis generation while keeping the human researcher in control of interpretation.
- Balancing technical rigor with clarity is critical when communicating findings intended to inform scientific next steps.
- Even lightweight AI-assisted analysis can create practical value when it improves prioritization and reduces manual exploration time.
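As a concrete illustration of the preprocessing and alignment point above, here is a minimal sketch of reconciling cell-line identifiers between a response table and a mutation table before any comparison. The helper names `normalize_id` and `align`, the normalization rule, and all values are assumptions for illustration. The cell-line names are real, but the numbers and gene labels are made up.

```python
import math
import re


def normalize_id(name):
    """Uppercase and drop non-alphanumerics so 'MCF-7' and 'mcf7' match."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def align(response_by_line, mutations_by_line):
    """Return rows present in both tables, plus the identifiers that were dropped."""
    # Drop missing response values up front so they cannot skew group means.
    resp = {
        normalize_id(k): v
        for k, v in response_by_line.items()
        if v is not None and not math.isnan(v)
    }
    muts = {normalize_id(k): v for k, v in mutations_by_line.items()}

    shared = sorted(resp.keys() & muts.keys())
    aligned = [(cl, resp[cl], muts[cl]) for cl in shared]
    # Report what was lost so silent shrinkage of the dataset is visible.
    dropped = sorted((resp.keys() | muts.keys()) - set(shared))
    return aligned, dropped


# Made-up values; identifiers are written inconsistently on purpose.
response_by_line = {"MCF-7": -1.2, "a549": 0.8, "HT-29": float("nan"), "PC-3": 2.1}
mutations_by_line = {"MCF7": {"GENE1"}, "A549": set(), "HT29": {"GENE2"}}

aligned, dropped = align(response_by_line, mutations_by_line)
print("aligned:", aligned)
print("dropped:", dropped)
```

Stripping punctuation is a blunt rule that can merge distinct identifiers, so a mapping on stable database IDs is the safer choice when one is available.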
Skills Learned
- Biomedical Data Analysis
- Feature Engineering for Omics Data
- Mutation-Response Pattern Analysis
- Data Cleaning and Preprocessing
- Exploratory Data Analysis
- AI-Assisted Research Workflow Design
- Technical Communication for Research