+1-646-906-7909    ez806@nyu.edu

Erchi (Archer) Zhang

  • GitHub Page
  • LinkedIn Profile
  • Software Engineer CV
  • Data Scientist CV
  • Machine Learning Engineer CV

Hi there, this is Erchi Zhang, and you can call me Archer. I am a recent graduate with an M.S. in Data Science from New York University and a B.S. in Computer Science from Brandeis University. I have a strong foundation in languages such as Python, Java, R, SQL, HTML/CSS, and JavaScript, and I have worked with data structures and algorithms in coursework, academic projects, research papers, and internships. I am also a machine learning and deep learning enthusiast. I am actively seeking 2025 new grad opportunities in data science, software development engineering, machine learning, data engineering, or any related field.

My Career

Kineviz Inc.

  • Integrated large language models (LLMs) into a chat GUI using Python and JavaScript
  • Developed a Cypher query parser with ANTLR in Python, incorporating Named Entity Recognition (NER) and vector search to validate and enhance Cypher queries, enabling advanced search capabilities in Neo4j graph databases
  • Debugged and improved Kineviz's graph visualization tool (GraphXR) using Azure virtual machines
  • Built a FastAPI-based website that uses LLMs to convert plain text into JSON conforming to user-defined schemas
Jun. 2024 – Dec. 2024
San Francisco, CA (remote)
Data Analytics Intern

Liangyouyinli Technology Co., Ltd

  • Used PyTorch to implement convolutional neural networks (CNNs) for stock price prediction in China’s stock market
  • Backtested the models by checking whether their top five selected stocks gained within a designated time frame (3 or 5 days), finding an accuracy of approximately 75%
May 2023 – Aug. 2023
Beijing, China
Data Scientist Intern

YUSUR Technology Co., Ltd

  • Used Apache Spark with shell and Python scripts to process the Iris dataset on distributed and parallel systems, benchmarking the performance of our self-developed Kernel Processing Unit (KPU)
  • Automated data pipelines in Python to preprocess and clean data using Prefect and Mage
  • Visualized our findings with D3.js, Matplotlib, and Seaborn, and presented the results using these visualizations
May 2021 – Aug. 2021
Beijing, China
Software Development Engineer Intern

State Key Laboratory of Software Development Environment, Beihang University

  • Queried datasets from SQL databases (MySQL, PostgreSQL) and a NoSQL database (MongoDB)
  • Wrote HTML, CSS, and JavaScript to improve webpage functionality, and developed test cases to cover corner cases
Dec. 2020 – Feb. 2021
Beijing, China
Software Development Engineer Intern

My Projects

PaperGist: A Cost-Efficient Cloud-Native Research Paper Summarization Platform

Spring 2025

  • Architected a serverless cloud-native platform using AWS Lambda for backend services, API Gateway for request routing, and EC2 G5 GPU instances for LLaMA-3.2 inference, with intelligent task queuing via SQS and EventBridge for automated GPU lifecycle management
  • Designed a robust data architecture using DynamoDB (with primary and secondary indexes) and S3 for efficient document storage and retrieval, implementing a content-based hashing system to eliminate redundant processing and optimize costs
  • Developed a full-stack web application with static frontend hosting on S3, featuring arXiv integration, real-time summarization status tracking, and manual document uploads, achieving average processing times of 15–20 seconds per paper when the GPU is active

Deck-to-CPT: AI-Driven Reimbursement Code Discovery for HealthTech Start-Ups

Fall 2024

  • Built a Python-based web application with HTML/CSS for processing PDF pitch decks and returning relevant Current Procedural Terminology (CPT) codes using AI
  • Automated web scraping to retrieve detailed CPT code information from healthcare websites
  • Applied Named Entity Recognition (NER) to extract key information from PDFs and utilized Retrieval-Augmented Generation (RAG) for accurate CPT code recommendations

Fixplainer: Failure Explainer for Multiple Object Tracking (MOT)

Spring 2024

  • Created a GUI tool that extracts features and generates SHAP explanation plots for the objects in a video frame from a multiple object tracking (MOT) task, explaining why each object was or was not tracked successfully
  • Applied YOLOv8 for object detection and BoT-SORT for object tracking on video datasets to create training sets

GraphBERT: Bridging Graph and Text for Malicious Behavior Detection on Social Media

Nov. 2021 – Jun. 2022

  • Helped design a model that detects malicious tweets and users using both semantic information encoded by transformers (i.e., BERT) and relational information encoded by graph neural networks (GNNs)
  • Preprocessed datasets obtained from the Internet, including handling incorrect and incomplete rows, labeling the data, and performing exploratory data analysis to check that the data fairly represent the entire population

Fair Graph Representation Learning via Diverse Mixture-of-Experts

Jun. 2022 – Oct. 2022

  • Surveyed the current state of graph neural network (GNN) research
  • Helped develop a plug-and-play method called G-FAME (Graph-Fairness Mixture of Experts) that helps the model learn discriminative representations with unbiased attributes
  • Proposed G-FAME++, an advanced version of G-FAME that introduces three novel strategies to improve fair representation learning from the node representation, model layer, and parameter redundancy perspectives

My Skills