@cdhorn cdhorn commented May 24, 2020

Hi Nick,
I have made a number of changes that I hope you'll consider merging. It might have been better to implement each in a separate branch; I'm fairly new at this, so I apologize. To summarize them at a high level:

  • I removed FileElement, as it really duplicates ObjectElement, and added SourceElement, RepositoryElement, NoteElement, HeaderElement, SubmitterElement and SubmissionElement.
  • I added a set of subparsers for all of the various substructures in the standard within the given record types.
  • Added a get_record() method to all record elements that parses and returns the full record as structured data in a dict format. A lot of this is logic I needed for something else I'm starting to toy with and it seemed to make sense to me to have it in the base parser.
  • Added a Reader class that provides a couple of simple methods to fetch all the records of a given type, or all of them in one shot.
  • Added records.py with types for the Reader.
  • Broke exceptions out into errors.py.
  • Made further updates to tags.py to add a few more tags and fix some bugs and typos.
  • Added standards.py with links to the 5.5, 5.5.1, 5.5.1 GEDCOM-L, and 5.5.5 standards and used those when raising exceptions when applicable.
  • Added detect.py to detect the file encoding and the GEDCOM version. This adds a dependency on the chardet and ansel packages. The parser now opens and parses Ansel files, although I am not 100% sure I handled it correctly. Because the codec is set when the file is opened, the file is no longer opened in binary mode, and I removed the utf-8-sig encoding handling elsewhere. Please review those changes carefully; I've never really worked with different codecs and character sets before.
  • GEDCOM 5.5.5 has strict requirements for validating format and logical structure, so if a 5.5.5 file is detected the parser raises an exception, as the standard requires, although it can probably parse the format of such files fine. You can remove this if you think it should not be done.
  • Added type hints to just about everything, so types should no longer be needed in the docstrings.
  • Cleaned up many docstrings and expanded them in a few areas.

Thanks,
Chris
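
For reviewers, the detection step described in the list above could look roughly like the sketch below. It is a minimal, standard-library-only illustration and not the code in detect.py, which relies on chardet and ansel. The function name `sniff_gedcom`, the exception `UnsupportedVersionError`, and the strategy of checking for a byte-order mark before falling back to the header's CHAR line are all assumptions made for this example.

```python
import codecs
from typing import Optional, Tuple


class UnsupportedVersionError(Exception):
    """Hypothetical exception; the PR's errors.py may use different names."""


# Byte-order marks to look for, with the codec used to decode the header.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_gedcom(raw: bytes) -> Tuple[str, Optional[str]]:
    """Return (encoding name, GEDCOM version) based on the header record only."""
    encoding = None
    for bom, name in _BOMS:
        if raw.startswith(bom):
            encoding = name
            break

    # latin-1 maps every byte to a code point, so this decode cannot fail
    # and is good enough for reading the ASCII tags in the header.
    text = raw.decode(encoding or "latin-1", errors="replace")

    declared = None  # value of "1 CHAR ..."
    version = None   # value of "2 VERS ..." under "1 GEDC"
    in_gedc = False
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "0" and tag != "HEAD":
            break  # the header record has ended
        if level == "1":
            in_gedc = tag == "GEDC"
            if tag == "CHAR":
                declared = value
        elif level == "2" and in_gedc and tag == "VERS":
            version = value  # ignores the VERS under SOUR (program version)

    if version == "5.5.5":
        raise UnsupportedVersionError("strict 5.5.5 validation is not implemented")

    # Without a BOM, report the charset the header claims, lowercased.
    return encoding or (declared or "unknown").lower(), version
```

Note that the sketch only reports what the file claims. Verifying that the bytes actually match the claimed encoding, which is what the PR uses chardet for, is left out here.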

nickreynke and others added 28 commits March 22, 2020 15:54
…-commits

feat: added setup for conventional commits
Removed FileElement as it duplicates the role of ObjectElement. Added Source, Repository, Note,
Header, Submission, and Submitter elements to handle all the record types defined in the standard.
Added subparsers for all of the substructures defined in the standard. Added a get_record method to
all record elements to parse the full record structure and return it as a dictionary. Before
processing a file, added an encoding check to identify its type. If the Python ansel module is
installed, it is used so Ansel GEDCOM files can now be decoded. Verify that the encoding found
matches the encoding claimed. Get the version number. Check whether the file is 5.5.5 and, if so,
reject it as that standard requires, since no strict 5.5.5 reader exists yet. Added standards.py
with references to the different standards, used where needed. Added a Reader object to wrap the
Parser and provide get_records_by_type and get_all_records methods.
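
To make the Reader idea in this commit message concrete, here is a small self-contained sketch of grouping records by type. Only the method names `get_records_by_type` and `get_all_records` come from the commit message. The class name `SketchReader`, the helper functions, the argument and return types, and the choice to represent a record as a list of raw lines are assumptions for illustration and do not reflect the PR's actual Reader or Parser.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_records(lines: Iterable[str]) -> List[List[str]]:
    """Group GEDCOM lines into records; every level-0 line starts a new record."""
    records: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("0 ") or not records:
            records.append([])
        records[-1].append(line)
    return records


def record_type(record: List[str]) -> str:
    """Return the tag of the level-0 line, skipping an @XREF@ pointer if present."""
    parts = record[0].split()
    if len(parts) > 2 and parts[1].startswith("@"):
        return parts[2]
    return parts[1]


class SketchReader:
    """Hypothetical stand-in for the PR's Reader, not its real implementation."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._by_type: Dict[str, List[List[str]]] = defaultdict(list)
        for record in split_records(lines):
            self._by_type[record_type(record)].append(record)

    def get_records_by_type(self, tag: str) -> List[List[str]]:
        """Return all records whose level-0 tag matches, e.g. 'INDI'."""
        return list(self._by_type.get(tag, []))

    def get_all_records(self) -> Dict[str, List[List[str]]]:
        """Return every record, keyed by record type."""
        return {tag: list(recs) for tag, recs in self._by_type.items()}
```

A call such as `SketchReader(open_lines).get_records_by_type("INDI")` would then return the individual records, where `open_lines` is any iterable of already-decoded GEDCOM lines.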

ghost commented May 24, 2020

There were the following issues with this Pull Request

  • Commit: 4cf4cba
    • ✖ message may not be empty
    • ✖ type may not be empty

You may need to change the commit messages to comply with the repository contributing guidelines.


🤖 This comment was generated by commitlint[bot]. Please report issues here.

Happy coding!

@nickreynke nickreynke changed the base branch from master to develop August 7, 2020 16:58
@nickreynke nickreynke added this to the 2.0.0 milestone Aug 7, 2020
@nickreynke nickreynke mentioned this pull request Aug 7, 2020
@nickreynke (Owner)

Hey @cdhorn, sorry for the late reply and thank you for your BIG PR! :D

I just opened a new PR for this (#53) because of some merge conflicts.

Let's discuss further changes over there! I'm closing this PR.

@nickreynke nickreynke closed this Aug 7, 2020