Fix: add numbered suffix to duplicate filenames instead of overwriting them #37

Open
chicheese wants to merge 4 commits into JC3:master from chicheese:master

Conversation

@chicheese
If a HAR file has multiple entries that share the same path and filename, the zip silently overwrote them, so you would end up with only one file instead of all of them.

This adds handling so that when a filename collision is detected, the new file gets a numbered suffix instead of overwriting the existing one. So if you have 15 files all named file.ext, you get file.ext, file_2.ext, file_3.ext, and so on up to file_15.ext.

The way it works: it checks whether the filename already exists in the zip, and if it does, it tries file_2.ext, then file_3.ext, and keeps incrementing until it finds a name that isn't taken yet, then writes to that. The filename and extension are split around the last dot after the last slash, so dots in directory names do not cause any issues. Files with and without extensions both work fine.
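To make the split rule concrete, here is a minimal sketch of splitting a path around the last dot that comes after the last slash. The helper name `splitPath` and the returned `stem`/`ext` shape are hypothetical and not taken from the PR's diff; the actual code in `buildZIP()` may be organized differently.

```javascript
// Hypothetical helper (not from the PR): split a path into the part before
// the extension and the extension itself, including its leading dot.
function splitPath(path) {
  const slash = path.lastIndexOf("/");
  const dot = path.lastIndexOf(".");
  // Only a dot that comes after the last slash belongs to the filename.
  if (dot > slash) {
    return { stem: path.slice(0, dot), ext: path.slice(dot) };
  }
  // No dot in the filename: the whole path is the stem, extension is empty.
  return { stem: path, ext: "" };
}

// The dot in the directory name "v1.2" is ignored in both cases.
const withExt = splitPath("assets/v1.2/file.ext"); // stem "assets/v1.2/file", ext ".ext"
const noExt = splitPath("assets/v1.2/README");     // stem "assets/v1.2/README", ext ""
```

A suffix is then inserted between `stem` and `ext`, which is why `file.ext` becomes `file_2.ext` and an extensionless `README` becomes `README_2`.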

Credit to xjcb-de who originally started looking into this problem on their fork in October of 2024.

xjcb and others added 4 commits October 13, 2024 18:59
Append a numbered suffix to a file's name if there are multiple files with the same name.
…ed once instead of looping

The duplicate filename handling in buildZIP() only tried to rename a file once when it detected a collision. It would generate a candidate name like file_2.ext using a regex on the original filepath string, but it never checked if that candidate name was already taken in the zip before writing to it.

This caused a bug where if 15+ files in the HAR shared the same path and name, only 2 files would end up in the zip: the original (file.ext) and one renamed copy (file_2.ext). Every file after that would just overwrite file_2.ext because the original filepath string never changed between loop iterations, so the regex always came up with the same _2 candidate.
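The failure mode can be reproduced in a few lines. This is a reconstruction under assumptions, not the original code: the regex, the `oneShotRename` name, and the `Map` standing in for the zip are all illustrative.

```javascript
// Hypothetical reconstruction of the one-shot rename: always derives the
// candidate from the ORIGINAL path, so it always produces the same "_2" name.
function oneShotRename(path) {
  // Insert "_2" before the extension (if any) of the last path segment.
  return path.replace(/(\.[^./]*)?$/, "_2$1");
}

// A Map stands in for the zip here: path -> content (an assumption).
const written = new Map();
for (let i = 0; i < 15; i++) {
  let path = "img/file.ext";
  if (written.has(path)) {
    // The candidate is never checked against the zip before writing.
    path = oneShotRename(path);
  }
  written.set(path, `entry ${i}`);
}

// Only two names survive: "img/file.ext" and "img/file_2.ext".
// The second one holds the content of the last duplicate written.
console.log([...written.keys()]);
```

Entries 1 through 14 all land on the same `_2` name, so each one replaces the previous.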

The fix was to replace the one-shot rename logic with a while loop that starts at counter 2 and keeps incrementing until it finds a candidate path that doesn't already exist in the zip. The filename and extension are split around the last dot (after the last slash, so dots in directory names don't cause issues). Files with and without extensions both work. Duplicates now each get their own file: file.ext, file_2.ext, file_3.ext, and so on for however many copies are in the HAR.
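A rough sketch of the loop described above, assuming the zip can answer "does this path exist?" through a callback. The function name `uniquePath` and the `exists` parameter are hypothetical; the real change lives inside `buildZIP()` and may look different.

```javascript
// Hypothetical sketch: return `path` if it is free, otherwise the first
// free name of the form stem_N.ext with N starting at 2.
function uniquePath(path, exists) {
  if (!exists(path)) return path;

  // Split around the last dot after the last slash.
  const slash = path.lastIndexOf("/");
  const dot = path.lastIndexOf(".");
  const cut = dot > slash ? dot : path.length;
  const stem = path.slice(0, cut);
  const ext = path.slice(cut); // "" when the filename has no extension

  let counter = 2;
  let candidate = `${stem}_${counter}${ext}`;
  // Re-check every candidate against the zip before accepting it.
  while (exists(candidate)) {
    counter += 1;
    candidate = `${stem}_${counter}${ext}`;
  }
  return candidate;
}

// Usage: a Set stands in for the zip's list of existing paths (an assumption).
const taken = new Set();
const names = [];
for (let i = 0; i < 4; i++) {
  const name = uniquePath("img/file.ext", (p) => taken.has(p));
  taken.add(name);
  names.push(name);
}
// names: img/file.ext, img/file_2.ext, img/file_3.ext, img/file_4.ext
```

The search restarts at 2 for every duplicate, which is simple and fine for the handful of collisions a typical HAR contains; caching the last counter per path would only matter for very large numbers of duplicates.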
Merge patch-1 into master