Conversation
4ndrelim
left a comment
There was a problem hiding this comment.
Left some comments and suggestions for your consideration, but overall lgtm.
| // non-backslash character so that \% (an escaped percent) is preserved. Pairs | ||
| // of backslashes (\\) before % are treated as a line-break followed by a real | ||
| // comment, matching LaTeX semantics. | ||
| var commentRegex = regexp.MustCompile(`(^|[^\\])((?:\\\\)*)%.*$`) |
There was a problem hiding this comment.
This should cover most cases. Think its fair not to expect multiple chained \ since it hardly serves practical purpose beyond 2 consecutive backslashes which represents newline. 3 consecutive followed by a % would denote newline followed by % which is hardly the intent.
But a more general solution that guards against malicious input is to count the number of backslashes preceding % and only treating the remaining as comment if the count is even. This logic is actually implemented in the xtramcp backend in LatexParser since it is not uncommon to encounter cases e.g.\\\{. https://github.com/PaperDebugger/xtramcp/blob/main/common/latex_parser.py#L780
That said, I personally feel it is reasonable enough and am also ok with this solution.
There was a problem hiding this comment.
Actually, on a separate note, do you know why are we removing comments before passing to the LLM? I don't know the specifics, but assume this is the case. In general, even if it is a user comment not meant to be displayed, it might serve as good contextual cues or auxiliary information. The output may be informed with user's thoughts in the form of comments.
Might it be because some comments might be too long / redundant? So this is an attempt to save tokens?
There was a problem hiding this comment.
@4ndrelim Wait actually this already handles 3 or more backslashes before the % as well. I've improved the test cases to cover 3 or more backslashes regardless.
There was a problem hiding this comment.
Oh, i did not realise it already covers. nice.
Previously:
%is used to remove comments, which interfered with literal\%in LaTeX documents.Now:
%properly. Tested for%,\%and\\%.