Peng Luo
Senior ML/LLM Engineer & MLOps Specialist
Shanghai, China
Passionate Senior LLM Engineer with 7+ years of experience developing production AI/LLM systems. Proven expertise in developing LLM and GenAI services and building ML infrastructure.
Featured Projects
My latest work in LLMs, RAG systems, and ML infrastructure
RAG Chatbot (Live Demo)
RAG chatbot with intent-based retrieval, optimized memory (60–70% token savings), and hybrid search
RAG chatbot built with FastAPI, featuring intent-based retrieval (skips vector search for small talk), optimized memory with relevance filtering (60–70% token savings vs sliding window), and hybrid search (vector + BM25 with RRF). Key aspects: BGE-M3 local embeddings (free), Redis embedding cache, LLM reranking (top 50 → top 5), multiple chunking strategies (fixed, semantic, recursive), and 335+ tests with 80%+ coverage.
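To illustrate the hybrid search step, here is a minimal sketch of Reciprocal Rank Fusion (RRF) over a vector ranking and a BM25 ranking. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not taken from the project code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that appear in both lists ("b" and "c" here) rise above those found by only one retriever, which is the main reason to fuse on ranks rather than on raw scores that live on different scales.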
4-bit QLoRA Post-Training
Cross-platform QLoRA fine-tuning for LLMs on NVIDIA GPU and Apple Silicon — SFT, DPO, domain adaptation
A cross-platform QLoRA framework for fine-tuning LLMs on consumer hardware. Supports NVIDIA GPU (4-bit quantization via bitsandbytes) and Apple Silicon (bf16 via Metal Performance Shaders) with automatic platform detection — zero config needed. Key aspects: 84% memory reduction via NF4 quantization (NVIDIA), native bf16 training on Apple Silicon, multiple fine-tuning techniques (SFT, Domain Adaptation, DPO), and a finance domain specialization.
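The automatic platform detection could look roughly like the sketch below. `TrainingBackend`, `select_backend`, and `detect_backend` are hypothetical names; the bf16 compute dtype on NVIDIA and the CPU fallback are assumptions, while 4-bit loading on CUDA and bf16 on Apple Silicon follow the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingBackend:
    device: str          # "cuda", "mps", or "cpu"
    load_in_4bit: bool   # 4-bit (NF4) loading is only used on NVIDIA here
    compute_dtype: str   # dtype name used for training math


def select_backend(cuda_available: bool, mps_available: bool) -> TrainingBackend:
    """Pick a backend from capability flags (pure function, easy to test)."""
    if cuda_available:
        # Assumption: bf16 compute on top of 4-bit quantized weights.
        return TrainingBackend("cuda", True, "bfloat16")
    if mps_available:
        return TrainingBackend("mps", False, "bfloat16")
    # Assumption: plain float32 on CPU as a last resort.
    return TrainingBackend("cpu", False, "float32")


def detect_backend() -> TrainingBackend:
    """Probe PyTorch if it is installed; otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return select_backend(False, False)
    return select_backend(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Keeping the decision in a pure function separate from the hardware probe means the selection logic can be unit-tested on any machine.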
Stock Analysis Multi-Agent System (Live Demo)
AI-powered stock analysis with LangGraph orchestration, 7 specialized agents, backtesting, and enterprise monitoring
A production-grade multi-agent system for stock market analysis powered by LangGraph orchestration. Features 7 specialized agents working together: Data Collection, Technical Analysis, Fundamental Analysis, Sentiment Analysis, Risk Assessment, Decision Making, and Report Generation. Key aspects: State-based workflow management with PostgreSQL persistence, resilience patterns (retry, circuit breaker, timeout protection), real-time monitoring with metrics and alerts, and Backtrader integration for strategy backtesting.
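As an illustration of the resilience patterns mentioned above, here is a minimal sketch of a circuit breaker combined with retry and exponential backoff. Class names, thresholds, and delays are hypothetical and are not taken from the project.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry fn with exponential backoff; never retry an open circuit."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

An agent's external call (a market data request, for example) would be wrapped as `retry(lambda: breaker.call(fetch))`, where `fetch` is whatever function performs the request.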
aiTerm - AI-First Terminal
Native desktop terminal with built-in AI assistance, context-aware commands, and intelligent suggestions
A macOS terminal application that integrates AI directly into your command-line workflow. Built with Tauri 2.0 for native performance, featuring xterm.js terminal emulation and a Rust backend with portable-pty for robust PTY management. Key aspects: Ring buffer context management (500 lines) with LLM-based summarization for old entries, streaming AI responses via SSE, secure API key storage in system keychain, and support for multiple LLM providers (OpenAI, Anthropic, GLM).
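The context management idea can be sketched as a bounded buffer that hands evicted lines to a summarizer. The sketch is in Python for brevity (the actual backend is Rust); `TerminalContext`, `evict_batch`, and the `summarize` callable standing in for the LLM call are all hypothetical.

```python
from collections import deque


class TerminalContext:
    """Keep recent terminal lines verbatim and fold older ones into a summary."""

    def __init__(self, summarize, capacity=500, evict_batch=50):
        # summarize(previous_summary, evicted_lines) -> new summary string.
        # In a real app this would call an LLM; here it is any callable.
        self.summarize = summarize
        self.capacity = capacity
        self.evict_batch = evict_batch
        self.lines = deque()
        self.summary = ""

    def append(self, line):
        self.lines.append(line)
        if len(self.lines) > self.capacity:
            # Evict the oldest lines in a batch so the summarizer is not
            # invoked once per line.
            count = min(self.evict_batch, len(self.lines))
            evicted = [self.lines.popleft() for _ in range(count)]
            self.summary = self.summarize(self.summary, evicted)

    def prompt_context(self):
        """Return the summary (if any) followed by the recent lines."""
        header = [f"[summary] {self.summary}"] if self.summary else []
        return header + list(self.lines)
```

Evicting in batches is one possible design choice: it trades a slightly smaller verbatim window right after each eviction for far fewer summarization calls.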
RAG Evaluation Pipeline
Airflow-based ML pipeline for evaluating RAG chatbot performance with retrieval, generation, and baseline comparison
An automated evaluation pipeline for RAG chatbots using Apache Airflow with CeleryExecutor. Evaluates retrieval quality (MRR, NDCG, HitRate) and generation quality (ROUGE, BLEU, BERTScore). Key aspects: Docker Compose orchestration, PostgreSQL for results storage, automated report generation (JSON/HTML), and baseline comparison with statistical significance testing.
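For reference, the retrieval metrics named above can be computed as in the following sketch, assuming binary relevance labels. Function names and the sample data are illustrative and not taken from the pipeline.

```python
import math


def hit_rate_at_k(ranked, relevant, k):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains: DCG divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_reciprocal_rank(queries):
    """MRR over (ranked, relevant) pairs; 0.0 for an empty query set."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Hypothetical evaluation data: two queries with their retrieved rankings.
queries = [
    (["d3", "d1", "d2"], {"d1"}),   # first relevant hit at rank 2
    (["d5", "d6", "d7"], {"d5"}),   # first relevant hit at rank 1
]
mrr = mean_reciprocal_rank(queries)
```

Per-query values like these are what a baseline comparison would feed into a significance test, since the test needs paired scores rather than only the averages.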