-
Notifications
You must be signed in to change notification settings - Fork 39
Refactor Formula Parsing Logic & Introduce Standalone FormulaParser utility
#363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
There is no need to implement The function needs to be as well as documented as possible. Make sure you complete the documentation and also add demos with explanatory comments as examples. Finally, make sure you add BISTs for every style of the Wilkinson notation supported by the function. |
|
okay , working on it . |
FormulaParser Refactoring SummaryI have refactored Key Changes:1. Recursive Descent Parsing: Now supports advanced Wilkinson syntax:
2. Documentation: Added a comprehensive Texinfo block with detailed sections:
3. Verification: Added extensive BISTs covering:
4. Integration Updates:
Technical Improvements:
|
|
Can you add a few more complex notations and to make sure that recursiveness works as expected and doesn't have some hidden breaking point? |
|
You are absolutely right—the initial test suite was insufficient for verifying the robustness of the recursive parser, especially for complex nested interactions. I should have included these edge cases from the start to ensure stability. I have now expanded the test suite from 8 to 40 tests, specifically targeting "torture test" scenarios to guarantee the recursive logic holds up. New Complex Test Cases Include:
|
Please, stop copy pasting Claude output (or whatever LLM-based setup you are using as a coding assistant) as a response to this conversation.
Apart from all of the above being statistically meaningless (hence useless), where did you come up with all this, which neither MATLAB defined behavior, nor Wilkinson's original symbolic description. I don't mind contributors using LLM-based assistance tools, but I expect that they are fully responsible for verifying their code before opening a PR. |
This PR extracts the formula parsing logic currently duplicated within
ClassificationGAM.mandRegressionGAM.minto a new, centralized static utility class:FormulaParser.m.related to #266 .
The Problem:
Previously, both GAM classes implemented their own private
parseFormulamethods. This led to code duplication and a silent logic bug whereinteractions="all"usedfullfact(generating levels) instead offf2n(generating the correct binary design matrix). Additionally, the legacy logic could not handle standard Wilkinson notation features needed for future compatibility (like*expansion or intercept removal).The Solution:
I have introduced
FormulaParser.mas a robust, standalone utility. This refactor lays the "infrastructure rails" for future improvements to functions likefitlmandanovanwhile immediately stabilizing the GAMs.Key Changes:
FormulaParser.fullfactforff2ninparseInteractions, ensuring correct binary interaction matrices are generated."Cost ~ A + B").*operator (e.g.,A*Bexpands toA + B + A:B).-1).strfind(avoidingcontains) to ensure backward compatibility with older Octave versions.Verification:
FormulaParser.mcovering edge cases and syntax errors.RegressionGAMandClassificationGAMto align with the more robust error reporting.fitrgamandfitcgampass all tests without modification.Note:
This PR focuses strictly on infrastructure and metadata. It does not alter the mathematical backfitting algorithm of the GAMs, nor does it yet refactor
fitlm/anovan.Known Limitations & Future Work:
*operator (e.g.,A*B) are currently returned in binary counting order (B, A, A:B). While mathematically equivalent, future work could sort these to match standard main-effect priority (A, B, A:B).+,:,*,-). Complex nested groupings like(A + B):Care not yet supported and are intended for a future "Phase 2" update alongsidefitlmintegration.