Text Models
One of the biggest factors into why I love TaskPaper
is the way that it effortlessly converts to and from text files. For my TaskManager
app, I consider that to be must have functionality as well.
We’ll be talking about String
parsing as we break down converting back and forth from a hypothetical TaskPaper
project with multiple tasks. Some of these tasks may have a variety of tags. As well as the possibility of notes.
All of the code for this blog post is in this sample code repo.
Our sample TaskPaper file
Again the three major components to a TaskPaper
are:
-
Project
A
Project Title:
(Any text that ends with a colon). -
Task
A
- Task Name
(Any text that begins with a dash). -
Text
Any other
text
. -
Tags
Optionally, there can be tags.
@tag
or@tag(with content)
. Typically, these are usually used with tasks, but obviously, they can be on any of these items.
Let’s put that all together in a text file with a certain level of complexity, so that we can be sure that we’ve caught some of our edge cases:
Project:
Notes on a project.
TaskPaper uses a hierarchical structure to keep track of parents and
ownership.
ie; all of the below items are within this project.
- Item for that project.
- Child item with a note.
Note about that item.
- Tagged Item @tag
- Tagged Item where the tag has a payload @tag(payload)
Let's illustrate some of the time formats that we've worked so hard on.
- Something completed @test @due(2024-11-23) @done(2024-11-23)
- Something with a set due date and time @test @due(2024-11-24 10:00)
- An appointment @test @due(2024-11-24 10:00-10:30)
- A spanning appointment @test @due(2024-11-23 11:00 thru 2024-11-24 10:00)
Other Project:
- Item for the other project
For this blog post, the code is a little more complex than it needs to be so that I can illustrate the steps a little better.
Our New Models
First, we have our tags. At their simplest, there are either @tag
or @tag(payload)
.
enum Tag {
case tag(String)
case payloadTag(String, String)
}
Then we have our TaskPaper
types:
enum TPType {
case project(String)
case task(String)
case text(String)
}
And then lastly, we can create a model with both of our new enums:
struct TMType {
let tabLevel: Int
let type: TPType
let tags: [Tag]
let children: [TMType]
}
Gotchas
One of the first gotchas comes from how we might be parsing things.
If we’ll be updating our models, as we’re parsing it, tags
and children
may not all be known as the record is being parsed, so some mutating functions might be a little easier to work with:
struct TMType {
let tabLevel: Int
let type: TPType
private(set) var tags: [Tag]
private(set) var children: [TMType]
// MARK: Update Functionality
mutating func append(tag: Tag) {
tags.append(tag)
}
mutating func append(child: TMType) {
children.append(child)
}
}
Regex Builder
Regex is a powerful tool for parsing strings with known formats.
You may be familiar with something like this /\d{3}\D?\d{3}\D?\d{4}/
in terms of a phone number validation regex.
In Swift 5.7, RegexBuilder was added, which is a bit different. There are plenty of blog posts about how RegexBuilder works, so I’ll just talk about my use of it.
// Set up references
let indentRef = Reference(Substring.self)
let hyphenRef = Reference(Substring.self)
let colonRef = Reference(Substring.self)
let textRef = Reference(Substring.self)
// Set up the regex
let tmTypeSearch = Regex {
// Initial Indent
Capture(as: indentRef) {
ZeroOrMore(.whitespace)
}
// Whether or not it's a task
Capture(as: hyphenRef) {
Optionally {
"-"
}
}
ZeroOrMore(.whitespace)
// Text of the element (including tags)
Capture(as: textRef) {
OneOrMore(
ChoiceOf {
// All of the valid characters for the text area.
CharacterClass(.anyNonNewline)
}, .reluctant
)
}
// Whether it's a project or not
Capture(as: colonRef) {
Optionally {
":"
}
}
}
This breaks a string into several elements that the regex can be captured and then parsed.
If there’s white space at the front, that can be parsed to figure out the indentation level.
If there’s a hyphen, that can tell us that this should become a task
.
If there’s a colon, that can tell us that this should become a project
.
Whatever text is there, becomes the basis of the text for the model.
Gotchas
We’ll have to do a bit of additional processing to handle the tags. Otherwise - Task
, - Task @tag
, - Task @tag(payload)
would not have the identical TPType
.
Happily, regex can help with that, too.
// Set up references
let textRef = Reference(Substring.self)
let payloadRef = Reference(Substring.self)
// Set up the regexes
// ie; @<tag>(<payload>)
let payloadSearch = Regex {
"@"
Capture(as: textRef) {
OneOrMore(.word)
}
"("
Capture(as: payloadRef) {
OneOrMore(
ChoiceOf {
CharacterClass(.word)
CharacterClass(.whitespace)
CharacterClass(.digit)
"-"
":"
}
)
}
")"
ZeroOrMore(.whitespace)
}
// ie; @<tag>
let tagSearch = Regex {
"@"
Capture(as: textRef) {
OneOrMore(.word)
}
ZeroOrMore(.whitespace)
}
The parsing of this is a little more complicated, as we have two different regex to check against. (I tried to make a unified regex that would handle both @tag
and @tag(payload)
, but I couldn’t figure out how to manage it and capture the payload properly, so I decided to just go with separate regex parsers, parse with each, and then replace Tag.tag
models with the proper Tag.payloadTag
models, if the exist.
Using our regex parsers
We expect whole matches for the tmTypeSearch
. If the whole record doesn’t match, it’s not a valid model.
The tagSearch
and payloadSearch
parses, we want to get all the matches within the string and turn them into arrays of our Tag
model.
We also needed another pass for text with tags so that we can get the text up until the first @
to normalize the text.
let normalizeText = Regex {
Capture(as: textRef) {
OneOrMore(.anyNonNewline, .reluctant)
}
ZeroOrMore(.whitespace)
// We know there's a tag.
"@"
// And anything after to fill out the line.
ZeroOrMore(.anyNonNewline)
}
It can’t be this easy, right?
Parsing a String
into a TMType
model was a necessary first step, but we also need to be able to turn an array of String
objects into a normalized TMType
array with populated children
within the TMType
models.
When the tabLevel is higher, it should become the child of the last child of the current TMType
model. If the tabLevel is within the current TMType
model, it should become a child at the appropriate level. Otherwise, we should append it to the list of TMType
models, this becomes the new current TMType
model, and continue parsing.
Gotchas
Some useful recursive helper functions made this a lot easier:
-
func lastChild() -> TMType?
A method that will walk the model to find the last child or child’s child.
-
func lastChild(with tabLevel: Int) -> TMType?
A method that will walk the model to find the last child or child’s child with a specific indentation level.
-
func parent(of child: TMType) -> TMType?
A method that will walk the models to find the parent of a specific model.
-
@discardableResult
mutating func replace(child: TMType, with updatedChild: TMType) -> Bool
A method to find a child within the
children
array and replace that object with the updated version.
With these, we can now dig through the various children to find the appropriate places to insert models at the proper level and then replace things up the chain.
From Models Back to Strings
Converting from a TMType
model to a String
was much easier.
I made life easier for myself by setting up a simple protocol that each level of the TMType
model could conform to.
protocol FormattedTMType {
var toString: String { get }
}
Then I could stitch the pieces together appropriately:
Tag models
var toString: String {
switch self {
case .tag(let tag):
return "@\(tag)"
case .payloadTag(let tag, let payload):
return "@\(tag)(\(payload))"
}
}
TPType models
var toString: String {
switch self {
case .project(let string):
return "\(string):"
case .task(let string):
return "- \(string)"
case .text(let string):
return string
}
}
TMType models
var toString: String {
// Indicates the indentation level
var result = String(repeating: "\t", count: tabLevel)
// The `TPType` from above
result += type.toString
// The `Tag` from above, if present.
tags.forEach { tag in
if !result.isEmpty {
result += " "
}
result += tag.toString
}
// The children, if present.
guard !children.isEmpty else { return result }
children.forEach { child in
if !result.isEmpty {
result += "\n"
}
result += child.toString
}
return result
}
Testing
Testing here was especially important, so that the parsers could be verified both for the expected simple path and some of the more complicated edge cases.
We also needed to make sure that the tag payloads for our date models were parsing properly and we’re able to convert from a Tag.payloadTag
to the TMDateType
models properly, so that we can better use these with our EKManager
EventKit
wrapper.
Payload Parsing
Parsing the different date formats can be looks like it might difficult. Here’s a list of the formats, from our previous blog entries:
2024-12-01
2024-12-01 12:01
-
2024-12-01 12:01-13:00
which could also be2024-12-01 12:01 thru 13:00
-
2024-12-01 12:01-2024-12-31 12:31
which could also be2024-12-01 12:01 thru 2024-12-31 12:31
Gotchas
One solution might be to try to build a unified regex to handle all possible cases and capture the appropriate variables.
Another solution may be a cleaner more brute force solution that I went with:
I built a series of regex matches for each of the formats above. Then I just needed to figure out the possible match to check based upon the string length.
/// Helper Enum for Formatting...
enum DateParameterFormat: Equatable {
case date
case dateTime
case dateTimeEndTime
case dateTimeDateTime
init?(from string: String) {
switch string.count {
case 10:
// YYYY-mm-dd
self = .date
case 16:
// YYYY-mm-dd HH:mm
self = .dateTime
case 22...32:
// YYYY-mm-dd HH:mm-HH:mm
// ...
// YYYY-mm-dd HH:mm through HH:mm
self = .dateTimeEndTime
case 33...41:
// YYYY-mm-dd HH:mm-YYYY-mm-dd HH:mm
// ...
// YYYY-mm-dd HH:mm through YYYY-mm-dd HH:mm
self = .dateTimeDateTime
default:
return nil
}
}
}
Once that was done, the logic to compare the appropriate regex was equally clear. Check against the appropriate regex pattern, extract the captive values, plug them into our previously created date parameter structures. We can then extract the dates into our more generic TMDateType
. My choice of doing that is to ensure that our validation logic is being called and the dates are valid.
func toTMDateType() throws -> TMDateType? {
guard let format = TMDateType.DateParameterFormat(from: self) else { return nil }
...
// Check the possible format against our regex patterns...
switch format {
case .date:
// yyyy-MM-dd
guard let match = self.wholeMatch(of: dateMatch) else {
return nil
}
let year = match[yearRef]
let month = match[monthRef]
let day = match[dayRef]
return try DateParameters(year: year, month: month, day: day).toDateType()
case .dateTime:
// yyyy-MM-dd HH:mm
guard let match = self.wholeMatch(of: dateTimeMatch) else {
return nil
}
let year = match[yearRef]
let month = match[monthRef]
let day = match[dayRef]
let hour = match[hourRef]
let minute = match[minuteRef]
return try DateTimeParameters(date: .init(year: year, month: month, day: day),
time: .init(hour: hour, minute: minute)).toDateType()
case .dateTimeEndTime:
// yyyy-MM-dd HH:mm-HH:mm
guard let match = self.wholeMatch(of: dateTimeEndTimeMatch) else {
return nil
}
let year = match[yearRef]
let month = match[monthRef]
let day = match[dayRef]
let hour = match[hourRef]
let minute = match[minuteRef]
let secondHour = match[secondHourRef]
let secondMinute = match[secondMinuteRef]
return try DateTimeEndTimeParameters(date: .init(year: year, month: month, day: day),
time: .init(hour: hour, minute: minute),
endTime: .init(hour: secondHour, minute: secondMinute)).toDateType()
case .dateTimeDateTime:
// yyyy-MM-dd HH:mm thru yyyy-MM-dd HH:mm
guard let match = self.wholeMatch(of: dateTimeDateTimeMatch) else {
return nil
}
let year = match[yearRef]
let month = match[monthRef]
let day = match[dayRef]
let hour = match[hourRef]
let minute = match[minuteRef]
let secondYear = match[secondYearRef]
let secondMonth = match[secondMonthRef]
let secondDay = match[secondDayRef]
let secondHour = match[secondHourRef]
let secondMinute = match[secondMinuteRef]
return try DateTimeDateTimeParameters(start: .init(date: .init(year: year, month: month, day: day),
time: .init(hour: hour, minute: minute)),
end: .init(date: .init(year: secondYear, month: secondMonth, day: secondDay),
time: .init(hour: secondHour, minute: secondMinute))).toDateType()
}
}
Converting from formatted date strings to TMDateType
will give us start and end dates, if appropriate.
Then, we just need to be able to convert from TMDateType
back to formatted date strings. So I added a separator type enum and we can leverage the date parameter format we used above for this:
func toFormattedDateString(format: DateParameterFormat,
separator: DateFormatSeparatorType = .compact,
withSpaces: Bool = true) -> String? {
...
}
This allows us to ensure that the formats are valid by using business logic to verify things such as the .date
or .dateTime
are only valid if there’s no endDate
property of the TMDateType
. Likewise, .dateTimeEndTime
is only valid if the endDate
is within the same calendar day as the startDate
property.
Testing
Unit testing allowed me to verify edge cases in my format parsing and helped me make sure that all of the variants worked properly.
The unit tests are repetitive but thorough.
Next week, we start working on the UI layer and getting into the SwiftUI of it all.