STEMOS - Machine Learning Engineer & AI Engineering Intern
- Designed a robust pipeline that converts scanned PDFs into high-resolution PNGs and preprocesses them for OCR and computer vision tasks, with efficient batching and error handling.
- Developed a computer vision classifier to categorize documents (invoices, administrative letters, irrelevant) with >90% validation accuracy on a mixed dataset, and applied model calibration to account for class imbalance.
- Built specialized information extraction modules (template-aware and ML-based) to parse structured fields from invoices and letters; evaluated them with precision and recall and integrated post-processing rules.
- Automated ingestion pipelines to push cleaned, validated records into a relational DB (Postgres), including idempotency and retry mechanisms.
- Implemented a Retrieval-Augmented Generation (RAG) prototype connecting vector search (FAISS) with a lightweight document re-ranker to support contextual search over administrative documents.