Peng Luo

Senior ML/LLM Engineer & MLOps Specialist

Shanghai, China

Passionate Senior LLM Engineer with 7+ years of experience developing production AI/LLM systems. Proven expertise in developing LLM and GenAI services and in building ML infrastructure.

7+
Years Experience
10+
Projects Completed
5
Featured Projects

Featured Projects

My latest work in LLM, RAG systems, and ML infrastructure


RAG Chatbot
Live Demo

RAG chatbot with intent-based retrieval, optimized memory (60–70% token savings), and hybrid search

RAG chatbot built with FastAPI, featuring intent-based retrieval (skips vector search for small talk), optimized memory with relevance filtering (60–70% token savings vs sliding window), and hybrid search (vector + BM25 with RRF). Key aspects: BGE-M3 local embeddings (free), Redis embedding cache, LLM reranking (top 50 → top 5), multiple chunking strategies (fixed, semantic, recursive), and 335+ tests with 80%+ coverage.
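To illustrate the hybrid search step, here is a minimal sketch of Reciprocal Rank Fusion (RRF) merging a vector ranking with a BM25 ranking. It is not the project's actual code: the function name, the sample document ids, and the constant k=60 (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for one query.
vector_hits = ["d3", "d1", "d7"]
bm25_hits = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and because RRF uses only ranks, the vector and BM25 scores never need to be put on a common scale.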

RAG
FastAPI
Vector Search
LLM
Docker

4-bit QLoRA Post-Training

Cross-platform QLoRA fine-tuning for LLMs on NVIDIA GPU and Apple Silicon — SFT, DPO, domain adaptation

A cross-platform QLoRA framework for fine-tuning LLMs on consumer hardware. Supports NVIDIA GPU (4-bit quantization via bitsandbytes) and Apple Silicon (bf16 via Metal Performance Shaders) with automatic platform detection — zero config needed. Key aspects: 84% memory reduction via NF4 quantization (NVIDIA), native bf16 training on Apple Silicon, multiple fine-tuning techniques (SFT, Domain Adaptation, DPO), and a finance domain specialization.
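The automatic platform detection could look roughly like the sketch below. It is a simplified illustration, not the framework's real implementation: `TrainPlan`, `choose_plan`, and `detect_plan` are hypothetical names, and the bf16 compute dtype on NVIDIA and the float32 CPU fallback are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainPlan:
    device: str                  # "cuda", "mps", or "cpu"
    quantization: Optional[str]  # "nf4" on NVIDIA, None elsewhere
    dtype: str                   # compute / training dtype


def choose_plan(has_cuda: bool, has_mps: bool) -> TrainPlan:
    """Pick a training setup from the available hardware backends."""
    if has_cuda:
        # NVIDIA: 4-bit NF4 weights via bitsandbytes (bf16 compute is assumed).
        return TrainPlan("cuda", "nf4", "bfloat16")
    if has_mps:
        # Apple Silicon: no bitsandbytes, so train natively in bf16.
        return TrainPlan("mps", None, "bfloat16")
    # Assumed fallback when no accelerator is found.
    return TrainPlan("cpu", None, "float32")


def detect_plan() -> TrainPlan:
    """Probe PyTorch for CUDA and MPS; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return choose_plan(False, False)
    mps = getattr(torch.backends, "mps", None)
    has_mps = bool(mps is not None and mps.is_available())
    return choose_plan(torch.cuda.is_available(), has_mps)


print(detect_plan())
```

On the CUDA path, the "nf4" setting would typically map to `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")` in Transformers. Keeping the decision in a pure function makes it easy to test without a GPU.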

LLM
PyTorch
QLoRA
Transformers
DPO

Stock Analysis Multi-Agent System
Live Demo

AI-powered stock analysis with LangGraph orchestration, 7 specialized agents, backtesting, and enterprise monitoring

A production-grade multi-agent system for stock market analysis powered by LangGraph orchestration. Features 7 specialized agents working together: Data Collection, Technical Analysis, Fundamental Analysis, Sentiment Analysis, Risk Assessment, Decision Making, and Report Generation. Key aspects: State-based workflow management with PostgreSQL persistence, resilience patterns (retry, circuit breaker, timeout protection), real-time monitoring with metrics and alerts, and Backtrader integration for strategy backtesting.
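As an illustration of one of the resilience patterns mentioned above, here is a minimal circuit breaker sketch. It is not taken from the project: the class names, the thresholds, and the injectable clock are all assumptions chosen to keep the example short and testable.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable for tests
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure now re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success fully closes the circuit
        return result
```

An agent that depends on a flaky upstream (for example, a market data API) could wrap each request in `breaker.call(...)`, so repeated failures stop hitting the upstream until the cool-down has passed.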

Multi-Agent
LangGraph
Stock Analysis
FastAPI
Next.js

aiTerm - AI-First Terminal

Native desktop terminal with built-in AI assistance, context-aware commands, and intelligent suggestions

A macOS terminal application that integrates AI directly into your command-line workflow. Built with Tauri 2.0 for native performance, featuring xterm.js terminal emulation and a Rust backend with portable-pty for robust PTY management. Key aspects: Ring buffer context management (500 lines) with LLM-based summarization for old entries, streaming AI responses via SSE, secure API key storage in system keychain, and support for multiple LLM providers (OpenAI, Anthropic, GLM).
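The ring buffer idea can be sketched as follows. The application itself has a Rust backend, so this Python version only illustrates the concept: `ContextBuffer` and its `summarize` hook are hypothetical, and the default summarizer simply concatenates text where the real tool would call an LLM.

```python
from collections import deque


class ContextBuffer:
    """Keep the most recent terminal lines; fold evicted lines into a summary."""

    def __init__(self, max_lines=500, summarize=None):
        self.lines = deque()
        self.max_lines = max_lines
        self.summary = ""
        # Hypothetical hook: (old_summary, evicted_text) -> new_summary.
        # The stand-in default just appends the evicted text.
        self.summarize = summarize or (
            lambda summary, evicted: (summary + "\n" + evicted).strip()
        )

    def append(self, line):
        self.lines.append(line)
        evicted = []
        while len(self.lines) > self.max_lines:
            evicted.append(self.lines.popleft())
        if evicted:
            self.summary = self.summarize(self.summary, "\n".join(evicted))

    def prompt_context(self):
        """Text to send to the model: running summary first, then recent lines."""
        recent = "\n".join(self.lines)
        if self.summary:
            return f"[summary] {self.summary}\n{recent}"
        return recent
```

With a real LLM behind `summarize`, evictions would probably be batched rather than summarized one line at a time, since each summary call costs a model request.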

Terminal
AI
Tauri
Rust
Vue 3

RAG Evaluation Pipeline

Airflow-based ML pipeline for evaluating RAG chatbot performance with retrieval, generation, and baseline comparison

An automated evaluation pipeline for RAG chatbots using Apache Airflow with CeleryExecutor. Evaluates retrieval quality (MRR, NDCG, HitRate) and generation quality (ROUGE, BLEU, BERTScore). Key aspects: Docker Compose orchestration, PostgreSQL for results storage, automated report generation (JSON/HTML), and baseline comparison with statistical significance testing.
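For reference, the retrieval metrics named above can be computed along these lines. This is a minimal sketch with binary relevance, not the pipeline's own code; the function names and the sample queries are assumptions.

```python
import math


def hit_rate_at_k(ranked, relevant, k):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def evaluate(queries, k=5):
    """Average each metric over (ranked_ids, relevant_id_set) pairs."""
    if not queries:
        raise ValueError("need at least one query")
    n = len(queries)
    return {
        "hit_rate": sum(hit_rate_at_k(r, rel, k) for r, rel in queries) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in queries) / n,
        "ndcg": sum(ndcg_at_k(r, rel, k) for r, rel in queries) / n,
    }


# Hypothetical retrieval results for two queries.
sample = [(["a", "b", "c"], {"b"}), (["x", "y"], {"x"})]
report = evaluate(sample, k=3)
print(report)
```

MRR here is the mean of the per-query reciprocal ranks, so one query answered at rank 2 and one at rank 1 average to 0.75.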

MLOps
Airflow
RAG
Evaluation
Docker

Get In Touch

I'm currently looking for new opportunities. Whether you have a question or just want to say hi, I'll try my best to get back to you!

Email

Luopengllpp@hotmail.com

Contact

LinkedIn

Connect on LinkedIn


GitHub

Check out my repositories
