Student Research Lagniappe
11:30 AM – 1:30 PM | PFT 1246
Understanding Robustness of Model Editing in Code LLMs
Abstract
Large language models (LLMs) are increasingly used in software development to assist with tasks such as code generation, refactoring, and bug fixing. However, while LLMs remain static after pretraining, programming languages and APIs continue to evolve, leading to the generation of deprecated or incompatible code that undermines reliability. Retraining LLMs from scratch to reflect such changes is computationally expensive, making model editing a promising lightweight alternative that updates only a small subset of parameters. Despite its potential, it remains unclear whether model editing yields genuine syntactic and semantic adaptations or merely superficial fixes. In this work, we present a systematic study of five state-of-the-art model editing methods: Constrained Fine-Tuning (FT), GRACE, MEMIT, PMET, and ROME. We apply these methods to three leading open-source code LLMs (Code Llama, CodeQwen1.5, and DeepSeek Coder) under controlled API deprecation scenarios. Our evaluation covers both instant and sequential editing settings, using three disjoint evaluation sets designed to assess reliability, generalization, and specificity. We measure model correctness at three levels: successful compilation, partial test case pass, and full test pass. Our findings show that instant edits consistently degrade model performance, with syntactic validity dropping by up to 86 percentage points and functional correctness declining by 45 percentage points even in the best-performing setting. Sequential edits further amplify this degradation, and in some cases model performance collapses entirely. Across all models, most passing generations relied on workarounds rather than adopting the intended changes, while faulty adoptions that resulted in test failures or compilation errors were significantly more frequent. Correct adoptions, in which the model properly integrates the intended change, occurred in only about 6% of cases. We observe recurring failure modes, including syntax errors, compilation failures, unstable behavior, and regressions on unrelated tasks. These results indicate that current editing methods offer only limited and unstable adaptation in code LLMs. Our study underscores the need for editing techniques specifically designed for code LLMs to ensure correctness, robustness, and maintainability as software ecosystems evolve.
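To make the three correctness levels above concrete, the following minimal Python sketch shows how such a grading step could look. It is a hypothetical illustration only: the function name grade_generation, the test format, and the use of Python's built-in compile/exec are assumptions for exposition, not the authors' actual evaluation harness.

from typing import Callable, Dict, List

def grade_generation(code: str, tests: List[Callable[[Dict], bool]]) -> str:
    """Return one of: 'syntax_error', 'compiles_only', 'partial_pass', 'full_pass'."""
    try:
        compiled = compile(code, "<generated>", "exec")   # level 1: syntactic validity
    except SyntaxError:
        return "syntax_error"

    namespace: Dict = {}
    try:
        exec(compiled, namespace)                         # load the generated definitions
    except Exception:
        return "compiles_only"                            # valid syntax, but fails at load time

    passed = 0
    for test in tests:                                    # levels 2-3: run the unit tests
        try:
            passed += bool(test(namespace))
        except Exception:
            pass                                          # a crashing test counts as a failure

    if tests and passed == len(tests):
        return "full_pass"                                # level 3: full functional correctness
    return "partial_pass" if passed else "compiles_only"  # level 2, or level 1 only

# Toy usage: a generated function plus two checks run against the loaded namespace.
snippet = "def add(a, b):\n    return a + b\n"
checks = [lambda ns: ns["add"](1, 2) == 3, lambda ns: ns["add"](-1, 1) == 0]
print(grade_generation(snippet, checks))                  # -> full_pass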
Vinaik Chhetri
Louisiana State University
Towards Mobile Learning: The Tradeoffs of Practicing via Fill-in-the-Blank vs Traditional Programming Problems
Abstract
Many people use smartphones more than traditional computers, including for serious tasks like paying bills, learning new skills, and completing homework. Responding to this reality, learn-to-code platforms now support mobile phones. Because of phones' small screens, these platforms have thus far been limited to fill-in-the-blank-style coding practice problems. However, with the advent of much larger folding screens, mobile phones can now support a traditional code editor, raising the question: which modality is best for learning to code on the go? As a first step towards answering this question, we conducted a case study in which we taught 50 participants with no prior programming experience to code using a short instructional video. After the video, 25 participants practiced coding with fill-in-the-blank problems and 25 with problems in a traditional editor. We then assessed their performance with a post-test and their experience with a questionnaire. Our findings suggest that participants perform better and feel more prepared after practicing with a traditional editor. However, our analysis suggests that fill-in-the-blank questions also have advantages: they are well suited to introducing program structure and are quicker to solve. We envision that a mobile learning platform combining both types of practice problems would be most effective.
Linta Islam
Louisiana State University
Seeing the Evidence: A Forensic Examination of Meta’s AI Smart Glasses
Abstract
Ray-Ban smart glasses with Meta AI are selling fast, but investigators don't yet have clear guidance or tools to examine them. We built a step-by-step process to collect and analyze evidence from the glasses, the Android companion app, and the associated cloud data. In testing, we were able to recover account and device details, photos and videos with their metadata, location traces, and even records of AI interactions. We also added a new open-source module to ALEAPP so other examiners can parse these artifacts automatically. Finally, we ran a controlled case study of an unauthorized photo/video capture and showed how the data can confirm what happened and when. Our work gives investigators a practical starting point for this new kind of wearable evidence.


