Handling of multi-line text in extracts and CSV file needs cleanup #6

@kevinburleigh75

In a previous commit, processFile.rb was peppered with #gsub calls that convert newlines in extracted text to spaces. The possibility of newlines in the extracts has two implications:

  • regexps that process the text need to allow for matches that span multiple lines
  • the output comma-separated value (CSV) file is ill-formed for certain uses

An example of a multi-line extract is the xref field of the second grant extracted from ipg140107 (see lines 2509-2530 of ipg140107.extract), though others are likely to exist.
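
To make the two implications concrete, here is a minimal sketch. It is not taken from processFile.rb: the helper name `normalize_whitespace`, the sample text, and the row layout are all assumptions made up for illustration. It shows a regexp that only matches across a line break when the `/m` flag is set, and two ways to keep the CSV well-formed: letting Ruby's standard `csv` library quote the field, or normalizing whitespace in one place instead of in scattered #gsub calls.

```ruby
require 'csv'

# Hypothetical helper (not in processFile.rb): collapse every run of
# whitespace, including newlines, into a single space.
def normalize_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

# Made-up sample standing in for a multi-line extracted field.
xref = "This application claims priority to\napplication No. 12/345,678"

# Without /m, `.` does not match a newline, so the pattern cannot span lines.
single_line = xref.match(/priority.*No\./)
# With /m, `.` also matches newlines, so the pattern spans both lines.
multi_line = xref.match(/priority.*No\./m)

# The csv library quotes fields containing newlines or commas, so the record
# stays parseable even when the text is left as-is.
quoted_row = CSV.generate_line(['sample-grant', xref])

# Alternatively, normalize first so each record occupies one physical line.
flat_row = CSV.generate_line(['sample-grant', normalize_whitespace(xref)])
```

The quoted form preserves the original text but needs a real CSV parser on the reading side; the flattened form is friendlier to line-oriented tools such as grep, at the cost of losing the original line breaks. Which one fits depends on how the CSV file is consumed.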
