I hold a B.Eng. in Computer Science from Beijing Institute of Technology, Zhuhai (BITZH). My previous research focused on computer audition and speech signal processing.
I will join CYDOO, a robotics company, in an algorithm role while independently exploring AI agents and AI system design. My trajectory is converging on the intersection of AI algorithms and AI systems: understanding not just what works, but why, and under what conditions.
Reproduced a published model reporting 86.82% accuracy and discovered 63% speaker overlap between its train and test sets. This was systematic data leakage that inflated every reported result. After enforcing strict speaker independence, accuracy collapsed to ~35%, near chance. Root cause: the 162-dim handcrafted features were mean-pooled over time, destroying temporal structure, so the Conv1d slid along the feature-concatenation axis rather than the time axis. The architecture was, by construction, incapable of temporal modeling.
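Both failure modes can be made concrete with a minimal stdlib sketch. This is an illustration, not the original project code. `speaker_overlap`, `speaker_independent_split` and `mean_pool` are hypothetical helpers. Overlap is measured here as the fraction of distinct test speakers who also appear in training, which is one reasonable definition among several; the text does not say which definition produced the 63% figure. `mean_pool` shows why time-averaged features cannot carry temporal information: reversing the frame order leaves the pooled vector unchanged.

```python
import random
from collections import defaultdict


def speaker_overlap(train_speakers, test_speakers):
    """Fraction of distinct test speakers who also appear in the training set."""
    test_set = set(test_speakers)
    if not test_set:
        return 0.0
    return len(test_set & set(train_speakers)) / len(test_set)


def speaker_independent_split(utterances, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split (utt_id, speaker_id) pairs so every speaker lands in exactly one split.

    The 6:2:2 ratio is applied to speakers, not utterances, so per-split
    utterance counts only approximately follow it.
    """
    by_speaker = defaultdict(list)
    for utt_id, spk in utterances:
        by_speaker[spk].append(utt_id)
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_train = round(ratios[0] * len(speakers))
    n_val = round(ratios[1] * len(speakers))
    groups = {
        "train": speakers[:n_train],
        "val": speakers[n_train:n_train + n_val],
        "test": speakers[n_train + n_val:],
    }
    return {
        name: [(utt, spk) for spk in spks for utt in by_speaker[spk]]
        for name, spks in groups.items()
    }


def mean_pool(frames):
    """Average a T x D list of frame vectors over time into one D-dim vector."""
    # A Conv1d applied to this D-dim result can only slide across feature
    # indices; the time axis no longer exists.
    return [sum(col) / len(frames) for col in zip(*frames)]


if __name__ == "__main__":
    utts = [(f"u{i}", f"spk{i % 10}") for i in range(50)]
    splits = speaker_independent_split(utts)
    train_spk = [spk for _, spk in splits["train"]]
    test_spk = [spk for _, spk in splits["test"]]
    print(speaker_overlap(train_spk, test_spk))  # 0.0: no speaker is shared

    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_pool(frames) == mean_pool(frames[::-1]))  # True: time order is lost
```

Splitting by speaker ID rather than by utterance is what makes the evaluation speaker-independent. A random utterance-level split will almost always place the same child in both train and test.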
Rebuilt the full pipeline from data to evaluation. Switched to WavLM frame-level SSL features (768-dim × T), which preserve the true time axis. Designed Prosody-guided Temporal Importance Pooling, which uses children's F0 and energy contours to weight frames by importance in place of blind mean pooling. Ran negative augmentation experiments and quantified the resulting distribution shift with Fréchet Distance (FD).
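The text does not specify how frame importance is computed, so the sketch below is one plausible reading and not the project's implementation. F0 and energy are z-scored per utterance and mixed with a hypothetical weight `alpha`. The mixed scores become frame weights through a softmax, and the pooled vector is the weighted sum of frames. The sketch assumes F0 has already been interpolated over unvoiced frames. `diag_frechet_distance` is a simplified Fréchet Distance that fits diagonal Gaussians. The usual FD, as in FID, uses full covariance matrices and a matrix square root, which this stdlib version deliberately avoids.

```python
import math


def zscore(xs):
    """Standardize a contour; a flat contour maps to all zeros."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - mu) / sd for x in xs]


def softmax(xs, temperature=1.0):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prosody_pool(frames, f0, energy, alpha=0.5, temperature=1.0):
    """Pool T x D frames into a D-dim vector using prosody-derived frame weights.

    alpha and temperature are illustrative hyperparameters, not project values.
    Returns (pooled_vector, frame_weights).
    """
    scores = [alpha * a + (1 - alpha) * b for a, b in zip(zscore(f0), zscore(energy))]
    weights = softmax(scores, temperature)
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(len(frames[0]))]
    return pooled, weights


def diag_frechet_distance(xs, ys):
    """Frechet distance between diagonal-Gaussian fits of two sets of D-dim vectors."""
    def stats(vs):
        cols = list(zip(*vs))
        mu = [sum(c) / len(vs) for c in cols]
        var = [sum((v - m) ** 2 for v in c) / len(vs) for c, m in zip(cols, mu)]
        return mu, var

    mu1, var1 = stats(xs)
    mu2, var2 = stats(ys)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # With diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to this sum.
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = prosody_pool(frames, f0=[100.0, 300.0, 100.0], energy=[1.0, 1.0, 1.0])
    print(weights)  # the high-F0 middle frame receives the largest weight
    print(diag_frechet_distance([[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [3.0, 3.0]]))  # 2.0
```

When both contours are flat, every weight equals 1/T and the pool reduces exactly to mean pooling. This is a useful sanity check that the prosody term is the only thing changing the behavior.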
80.85% test accuracy under a strict speaker-independent 6:2:2 (train/validation/test) protocol. This is +30pp over the strict MFCC baseline (50.2%) and +5pp over the published C-BESD baseline (76%). In ablations, Prosody Pooling is the largest single contributor (+2.24pp), while the Adapter adds only ~0.5pp. The augmentation experiments confirmed that accuracy falls as FD rises (FD ∝ 1/Accuracy), so distribution shift quantitatively predicts performance degradation. Cross-language transfer (English↔Telugu) largely fails (19–28%), which indicates that language shift is a harder problem than acoustic shift.
Built an end-to-end retrieval-augmented generation (RAG) system for enterprise document Q&A. The pipeline spans document chunking, embedding, vector retrieval, prompt assembly, and LLM generation. Extended it with agent tool calling (summarization, task extraction, and structured querying), integrating embedding, retrieval, API orchestration, and LLM inference in a single system.
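The retrieval half of such a pipeline fits in a few functions. The sketch below is a toy built on stated assumptions, not the system described above. `embed` is a hypothetical bag-of-words stand-in for a real embedding model, a linear scan replaces the vector index, and the LLM call is left out. `build_prompt` only produces the string that would be sent to a model.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=40):
    """Split text into windows of `size` words, each overlapping the previous by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def embed(text):
    """Hypothetical stand-in for an embedding model: L2-normalized bag-of-words counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query (linear scan, no index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, contexts):
    """Assemble a grounded prompt with numbered context passages."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the context below and cite passages by number.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    doc = ("The VPN requires two-factor authentication. Expense reports are due on the "
           "fifth of each month. Contact IT for VPN access.")
    chunks = chunk_text(doc, size=8, overlap=2)
    question = "How do I get VPN access?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Keeping each stage behind a small function boundary makes it straightforward to swap the toy pieces for a real embedding model, vector store, and LLM client without touching the rest of the pipeline.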
Currently moving from single-domain applications toward generalizable methods. The core question: how do we build AI systems that are both capable and reliably understood?