I've found that retrying the same codegen prompt can produce a diff that applies cleanly when an earlier attempt did not. For fully automated tasks, such as PR creation, it's worth retrying until the diff applies cleanly. I think this warrants a new flag that makes code application all-or-nothing: either the whole diff is applied or none of it is.
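
To make the idea concrete, here is a rough sketch of what the behavior could look like. Everything in it is hypothetical and for illustration only: the `Edit` search/replace shape, the `apply_all_or_nothing` and `retry_until_clean` names, and the `max_attempts` limit are assumptions, not the tool's actual API. Files are modeled as an in-memory mapping of path to content, and `generate` stands in for re-running the same codegen prompt.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Edit(NamedTuple):
    """Hypothetical search/replace edit against a single file."""
    path: str
    search: str
    replace: str


Files = Dict[str, str]


def apply_all_or_nothing(files: Files, edits: List[Edit]) -> Optional[Files]:
    """Return a new mapping with every edit applied, or None if any edit fails."""
    staged = dict(files)  # work on a copy so the originals are never half-edited
    for edit in edits:
        content = staged.get(edit.path)
        if content is None or edit.search not in content:
            return None  # one failed edit rejects the whole diff
        staged[edit.path] = content.replace(edit.search, edit.replace, 1)
    return staged


def retry_until_clean(
    generate: Callable[[], List[Edit]],
    files: Files,
    max_attempts: int = 3,
) -> Optional[Files]:
    """Call generate() up to max_attempts times until its edits apply cleanly."""
    for _ in range(max_attempts):
        result = apply_all_or_nothing(files, generate())
        if result is not None:
            return result
    return None  # no attempt applied cleanly; nothing was changed
```

With a flag like this enabled, an automated PR job would only commit when `retry_until_clean` returns a result, and would leave the working tree untouched otherwise. The retry cap is there so a prompt that never yields a clean diff fails fast instead of looping forever.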