We can think about how to evaluate the quality of the code that an LLM generates, and whether a given change improves that quality or not.