Update 2024-08-20-1837: There was a newer attempt at this, see 2024/08/20/1307 and 2024/08/21/1743.


I have a bunch of Markdown files exported from Notion, and I’d like to parse all of them. I want a bash script for my Arch Linux computer that will process all the Markdown files in a given directory.

The algorithm is as follows.

The directory has Markdown files in its root and also contains sub-directories. Do not descend into the sub-directories; scan only the provided directory, and only for Markdown files.
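A minimal sketch of the non-recursive scan, assuming GNU find as shipped on Arch Linux. The function name list_markdown_files and the path in the usage comment are made up for illustration.

```bash
#!/usr/bin/env bash
# List Markdown files that sit directly inside a directory, without
# descending into sub-directories. list_markdown_files is a hypothetical name.
list_markdown_files() {
    local dir=$1
    # -maxdepth 1 keeps find in the top level; -type f skips directories,
    # even ones whose names happen to end in .md.
    find "$dir" -maxdepth 1 -type f -name '*.md' -print0
}

# Usage (the path is a placeholder); NUL separators keep names with
# spaces or Cyrillic characters intact:
#   while IFS= read -r -d '' file; do
#       printf 'Would process: %s\n' "$file"
#   done < <(list_markdown_files "$HOME/notion-export")
```

A plain `for f in "$dir"/*.md` loop would also stay in the top level; find is used here only because it filters out directories in the same step.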

I want each file parsed into a new file located in a base directory that is specified inside the script itself; the base directory should be created if it doesn’t exist.

I want the new file to start with front matter that looks like this:

+++
title = 'File’s Header'
date = "2016-12-09T09:10:35+0200"
draft = false
tags = ['Evernote', 'no tags']
+++

File’s content.

The original file starts with a header, e.g. # File’s Header; I want that to become the title, as shown above.
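As a rough illustration, the title line could be derived like this. make_title is a hypothetical helper name, and the sketch assumes the first line always begins with "# ".

```bash
#!/usr/bin/env bash
# Turn the first line of a note ("# File’s Header") into a TOML title line.
# make_title is a hypothetical helper name.
make_title() {
    local first_line=$1
    local title=${first_line#'# '}   # drop the leading "# " marker
    printf "title = '%s'\n" "$title"
}

make_title '# File’s Header'
# prints: title = 'File’s Header'
```

A title containing a straight single quote (') would end the TOML literal string early, so such titles would need a different quoting style; the typographic apostrophe in the example is safe.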

Then comes an empty line, followed by the block of parameter lines.

The first parameter line starts with Tags: . It may or may not be present.

If it’s present, it consists of tags separated by commas when there is more than one tag. If there is just one tag, it looks like this: Tags: ideas, where ideas is the name of the tag (the word could be different). If a tag contains spaces between words, it should still be treated as one tag, e.g. to watch or to read. A tag ends either at a comma or at the end of the line, which signifies there are no more tags.

I’d like to add those tags to the front matter, following the 1st tag ‘Evernote’ that I have as a template.

E.g. if the Tags: line is:

Tags: to do, book, thinking, Напечатать!

then the front matter tags would be:

tags = ['Evernote', 'to do', 'book', 'thinking', 'Напечатать!']

If there’s no Tags: line in the file, then use this line for tags:

tags = ['Evernote', 'no tags']
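The tag rules above could be sketched as follows. build_tags is a hypothetical name; the sketch assumes tags never contain a comma or a straight single quote, since neither case is covered by the description.

```bash
#!/usr/bin/env bash
# Build the front-matter tags line from the third line of a note.
# build_tags is a hypothetical helper name.
build_tags() {
    local line=$1 out="'Evernote'" tag
    local -a parts
    if [[ $line != Tags:* ]]; then
        # No Tags: line in the note, so fall back to the default.
        printf "tags = ['Evernote', 'no tags']\n"
        return
    fi
    # Split on commas only, so "to do" stays a single tag.
    IFS=',' read -ra parts <<< "${line#Tags:}"
    for tag in "${parts[@]}"; do
        tag=${tag#"${tag%%[![:space:]]*}"}   # trim leading whitespace
        tag=${tag%"${tag##*[![:space:]]}"}   # trim trailing whitespace
        [[ -n $tag ]] && out+=", '$tag'"
    done
    printf 'tags = [%s]\n' "$out"
}

build_tags 'Tags: to do, book, thinking, Напечатать!'
# prints: tags = ['Evernote', 'to do', 'book', 'thinking', 'Напечатать!']
```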

Next comes the line that starts with Created: , e.g. Created: September 19, 2011 1:05 AM. I want that info to go into the front matter as well:

date = "2011-09-19T01:05:00+0200"

If there’s a line that starts with Updated: , then its date goes into the front matter too. E.g. Updated: February 5, 2013 10:50 AM

lastmod = "2013-02-05T10:50:00+0200"
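One possible way to do the date conversion, assuming GNU date (the default on Arch Linux), which can parse this English date format. The fixed +0200 offset is copied from the examples rather than derived from the input, and to_front_matter_date is a hypothetical name.

```bash
#!/usr/bin/env bash
# Convert "September 19, 2011 1:05 AM" into the front-matter date form.
# OFFSET is an assumption taken from the examples in the text.
OFFSET='+0200'

to_front_matter_date() {
    local human=$1
    # TZ=UTC makes date parse and print in the same zone, so the clock
    # time written in the note is kept exactly as it is.
    TZ=UTC date -d "$human" +"%Y-%m-%dT%H:%M:%S$OFFSET"
}

printf 'date = "%s"\n'    "$(to_front_matter_date 'September 19, 2011 1:05 AM')"
printf 'lastmod = "%s"\n' "$(to_front_matter_date 'February 5, 2013 10:50 AM')"
# prints:
#   date = "2011-09-19T01:05:00+0200"
#   lastmod = "2013-02-05T10:50:00+0200"
```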

If there’s a URL: line, I’d like it to be present in the front matter too, formatted in a similar fashion with TOML syntax. E.g.

URL: https://news.ycombinator.com/item?id=22852316

becomes:

source = "https://news.ycombinator.com/item?id=22852316"

Notice that URL: becomes source = ; the URL itself stays the same but is wrapped in double quotes.
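A small sketch of that rewrite, with the hypothetical helper name url_to_source. It assumes the line has exactly one space after "URL:" and that the URL contains no double quotes.

```bash
#!/usr/bin/env bash
# Rewrite a "URL: ..." parameter line as a TOML source entry.
# url_to_source is a hypothetical helper name.
url_to_source() {
    local line=$1
    local url=${line#'URL: '}        # keep everything after the prefix
    printf 'source = "%s"\n' "$url"
}

url_to_source 'URL: https://news.ycombinator.com/item?id=22852316'
# prints: source = "https://news.ycombinator.com/item?id=22852316"
```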

If there’s a parameter line that starts with Reminder: , e.g. Reminder: October 18, 2013 7:00 AM (GMT+3), then reformat it the same way so that it becomes:

reminder = "2013-10-18T07:00:00+0200"
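The Reminder: line carries a trailing time-zone note in parentheses. A minimal sketch, assuming that note can simply be dropped and the written clock time kept, as in the example above; converting between zones would need extra handling that is not attempted here. reminder_line is a hypothetical name and GNU date is assumed.

```bash
#!/usr/bin/env bash
# Rewrite "Reminder: October 18, 2013 7:00 AM (GMT+3)" as a TOML entry.
# reminder_line is a hypothetical helper name.
reminder_line() {
    local value=${1#'Reminder: '}
    value=${value%%' ('*}            # drop the trailing "(GMT+3)" note
    # The +0200 offset is fixed, copied from the example in the text.
    printf 'reminder = "%s"\n' \
        "$(TZ=UTC date -d "$value" +'%Y-%m-%dT%H:%M:%S+0200')"
}

reminder_line 'Reminder: October 18, 2013 7:00 AM (GMT+3)'
# prints: reminder = "2013-10-18T07:00:00+0200"
```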

The parameter lines follow one after another, so an empty line means there are no more parameters.

I want to copy the rest of the file untouched.
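To make the boundary between parameters and content concrete, here is a sketch that splits a note into its parameter block and its untouched body. split_note and the two output files are hypothetical; the sketch assumes line 1 is the header and line 2 is empty, as described above.

```bash
#!/usr/bin/env bash
# split_note SRC PARAMS_OUT BODY_OUT
# Writes the parameter lines to PARAMS_OUT and the rest of the note,
# unchanged, to BODY_OUT. All names are hypothetical.
split_note() {
    local src=$1 params_out=$2 body_out=$3 line n=0 in_body=0
    : > "$params_out"
    : > "$body_out"
    # "|| [[ -n $line ]]" also picks up a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if (( n <= 2 )); then
            continue                  # header line and the blank line after it
        elif (( in_body )); then
            printf '%s\n' "$line" >> "$body_out"
        elif [[ -z $line ]]; then
            in_body=1                 # first empty line ends the parameters
        else
            printf '%s\n' "$line" >> "$params_out"
        fi
    done < "$src"
}
```

Empty lines inside the content are preserved, because only the first empty line after the parameters is treated as a separator.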

The new file should be placed at YYYY/MM/DD/HHMM.md inside the base directory, e.g. 2024/08/01/1434.md, where the date is taken from the Created: parameter (the one that becomes date = in the front matter). Also, I want the Created: and Updated: metadata to be in the HHMM.md file.
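A sketch of how the target path might be derived from the Created: text, assuming GNU date. The BASE_DIR value is only a placeholder and target_path is a hypothetical name.

```bash
#!/usr/bin/env bash
# BASE_DIR is the base directory set inside the script itself;
# the value below is only a placeholder.
BASE_DIR="${BASE_DIR:-/tmp/notes-demo}"

# target_path "March 18, 2013 12:53 PM" -> $BASE_DIR/2013/03/18/1253.md
target_path() {
    local created=$1 rel
    # TZ=UTC keeps the clock time exactly as written in the note.
    rel=$(TZ=UTC date -d "$created" +'%Y/%m/%d/%H%M') || return 1
    printf '%s/%s.md\n' "$BASE_DIR" "$rel"
}

# Usage:
#   out=$(target_path 'March 18, 2013 12:53 PM')
#   mkdir -p "$(dirname "$out")"   # creates $BASE_DIR/2013/03/18 if missing
```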

Also, I want you to temporarily store the original name of the file (excluding .md) and check whether there is a directory with the same name. If there is one, I want it moved to the location where the new file is stored and renamed to match the new file name, without the extension.

E.g. if there’s a file file12314756rt2gyeuasdj.md and there is a directory named file12314756rt2gyeuasdj, and the markdown file has the Created: date as March 18, 2013 12:53 PM

Created: March 18, 2013 12:53 PM

Then I want the Markdown file renamed to 1253.md, the directory renamed to 1253, and both placed inside the 2013/03/18 directory, which should be created if it does not exist. Also, if a file with exactly that name already exists, you’re allowed to name the new file 1253-1.md; the same applies to the new directory 1253.
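The rename-and-move step with collision handling could look roughly like this. unique_stem and place_note are hypothetical names, and the sketch moves the original file for brevity, whereas a full script would write the converted file to the destination.

```bash
#!/usr/bin/env bash
# unique_stem DIR STEM: print STEM, or STEM-1, STEM-2, ... so that
# neither DIR/STEM.md nor DIR/STEM exists yet.
unique_stem() {
    local dir=$1 stem=$2 candidate=$2 n=0
    while [[ -e $dir/$candidate.md || -e $dir/$candidate ]]; do
        n=$((n + 1))
        candidate=$stem-$n
    done
    printf '%s\n' "$candidate"
}

# place_note SRC_MD DEST_DIR STEM: move the note and, if present, the
# sibling directory that shares its name (minus .md). Prints the stem used.
place_note() {
    local src=$1 dest_dir=$2 stem
    local old=${src%.md}
    mkdir -p "$dest_dir"
    stem=$(unique_stem "$dest_dir" "$3")
    mv -- "$src" "$dest_dir/$stem.md"
    if [[ -d $old ]]; then
        mv -- "$old" "$dest_dir/$stem"
    fi
    printf '%s\n' "$stem"
}

# Usage (paths are placeholders):
#   place_note ./file12314756rt2gyeuasdj.md "$BASE_DIR/2013/03/18" 1253
```

Checking both the .md name and the bare name keeps the file and its directory on the same suffix, so 1253-1.md always pairs with the directory 1253-1.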

If there’s a directory with the same name as the file, I want the new name to be reflected inside the file. Look for the old name (it may occur multiple times; I want every occurrence replaced by the new name). Note that the name of the file could contain Cyrillic symbols, e.g. %D0%92%D1%81%D0%B5%CC%88%20%D0%B4%D0%BB%D1%8F%20WarCraft%203%20d1a6a1692fbc494db266a11680472960.
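A sketch of the in-file replacement. It treats the old name as a literal string, so the caller is assumed to pass it in the same form in which it appears inside the file (percent-encoded, as in the example above). replace_name_in_file is a hypothetical name.

```bash
#!/usr/bin/env bash
# replace_name_in_file FILE OLD NEW
# Replace every occurrence of OLD with NEW inside FILE.
replace_name_in_file() {
    local file=$1 old=$2 new=$3 content
    content=$(<"$file")
    # Quoting "$old" makes bash match it literally, so characters such
    # as % . [ ] in the name are not treated as pattern syntax.
    content=${content//"$old"/"$new"}
    printf '%s\n' "$content" > "$file"
}

# Usage (names are placeholders):
#   replace_name_in_file 1253.md 'My%20Note%20d1a6a169' 1253
```

Reading the whole file into a variable is fine for notes of ordinary size; note that the command substitution drops trailing blank lines, and a single newline is written back at the end.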

You’re allowed to use temporary files if needed; they may be stored in ${XDG_RUNTIME_DIR:-/tmp}.
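For the temporary files, a sketch using mktemp from GNU coreutils. The notion-note prefix and the make_tmp name are made up for illustration.

```bash
#!/usr/bin/env bash
# Create a temporary file under $XDG_RUNTIME_DIR, falling back to /tmp.
# make_tmp and the "notion-note" prefix are hypothetical.
make_tmp() {
    mktemp -p "${XDG_RUNTIME_DIR:-/tmp}" notion-note.XXXXXX
}

tmp_file=$(make_tmp) || exit 1
# Remove the temporary file when the script exits, even after an error.
trap 'rm -f -- "$tmp_file"' EXIT
```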


Basic logic, again.

  1. Take a markdown file.
  2. The first line of the file becomes the title field of the new file. (This works now.)
  3. The 2nd line is always empty.
  4. The 3rd line can be Tags: ; if it is, it becomes the tags for the front matter too. E.g. the line Tags: to do, Напечатать! would become tags = ['Evernote', 'to do', 'Напечатать!']. If the 3rd line is not Tags: , then use tags = ['Evernote', 'no tags'] for the front matter.
  5. The next line is always the Created: string. All the text after Created: is a date. Use it!
     5.1. That date becomes the path for the new file, YYYY/MM/DD/HHMM.md, whose directory is created with mkdir -p $BASE_DIR/YYYY/MM/DD.
     5.2. That date becomes the creation date of the file; use touch.
     5.3. That date becomes date in the front matter, e.g. date = "YYYY-MM-DDTHH:MM:SS+0300". You can take 00 for SS (seconds), so the result is date = "2024-08-01T15:25:00+0300".
  6. The next line may be an Updated: line, e.g. Updated: May 23, 2012 12:11 PM. If it is, use it for the front matter too, under the key lastmod, e.g. lastmod = "2012-05-23T12:11:00+0300". If there’s no Updated: line, there is no need for a default; simply skip this entry. Save the Updated: (lastmod) date for later, to be used with the touch command.
  7. If there’s a Reminder: line, do the same for it, e.g. reminder = "2015-05-25T15:00:00+0300". If there’s no Reminder: line, skip it too; no default date is needed.
  8. Same for the URL: line. If it’s not present, skip it. If it’s present, reformat it, e.g. the line URL: http://habrahabr.ru/post/129863/ becomes source = "http://habrahabr.ru/post/129863/".
  9. When the next empty line appears, stop processing the front matter. The rest of the file is the content that I want copied to the new file.
  10. If there was an Updated field, I want the file to be touched so that its modified date matches the Updated/lastmod parameter.
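The touch steps in the list could be sketched as below, assuming GNU touch, which accepts the same date text as GNU date. touch sets only the access and modification times; on Linux it cannot set a file's creation (birth) time, so the sketch uses Updated: when present and falls back to Created: otherwise. apply_timestamps is a hypothetical name.

```bash
#!/usr/bin/env bash
# apply_timestamps FILE CREATED [UPDATED]
# Set FILE's modification time from the Updated: text, or from the
# Created: text when there is no Updated: line.
apply_timestamps() {
    local file=$1 created=$2 updated=${3:-}
    touch -d "${updated:-$created}" -- "$file"
}

# Usage (the file name is a placeholder):
#   apply_timestamps 1253.md 'March 18, 2013 12:53 PM' 'February 5, 2013 10:50 AM'
```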

Rewrite for New Account Attempt

Write me a bash script that parses a Markdown file. The script will be run on an Arch Linux computer.

The file starts with a header, e.g. # Header, then there’s an empty line, then meta-information.

E.g.

# File’s Header

Tags: to do, to print!, book
Created: December 12, 2016 9:10 AM
Updated: October 22, 2017 11:06 AM
Reminder: October 10, 2017 10:00 AM (GMT)

That’s my content.

I want that file to be reformatted as:



+++
title = 'File’s Header'
date = "2016-12-12T09:10:00+0200"
lastmod = "2017-10-22T11:06:00+0200"
reminder = "2017-10-10T12:00:00+0200"
draft = false
tags = ['Evernote', 'to do', 'to print!', 'book']
+++

That’s my content.

If some meta-information lines are missing, e.g. there’s no Tags, Updated, or Reminder line, there is no need to emit any default line for them in the output either.
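Putting the rewritten request together, a compact sketch of the whole conversion might look like this. It assumes GNU date, a fixed +0200 offset, and that any "(GMT...)" note on a date is dropped with the clock time kept as written, so it does not attempt a time-zone shift for reminders. convert_note and iso are hypothetical names.

```bash
#!/usr/bin/env bash
# convert_note FILE: print FILE reformatted with TOML front matter.
OFFSET='+0200'   # assumption copied from the examples

iso() {  # "December 12, 2016 9:10 AM" -> 2016-12-12T09:10:00+0200
    local when=${1%%' ('*}           # drop a trailing "(GMT)" style note
    TZ=UTC date -d "$when" +"%Y-%m-%dT%H:%M:%S$OFFSET"
}

convert_note() {
    local src=$1 line n=0 in_body=0 tag
    local title='' tags='' created='' lastmod='' reminder='' url='' body=''
    local -a parts
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if (( in_body )); then
            body+=$line$'\n'          # content is copied untouched
        elif (( n == 1 )); then
            title=${line#'# '}
        elif (( n == 2 )); then
            continue                  # blank line after the header
        elif [[ -z $line ]]; then
            in_body=1                 # blank line ends the meta block
        else
            case $line in
                'Tags: '*)
                    IFS=',' read -ra parts <<< "${line#Tags: }"
                    tags="'Evernote'"
                    for tag in "${parts[@]}"; do
                        tag=${tag#"${tag%%[![:space:]]*}"}
                        tag=${tag%"${tag##*[![:space:]]}"}
                        [[ -n $tag ]] && tags+=", '$tag'"
                    done ;;
                'Created: '*)  created=$(iso "${line#Created: }") ;;
                'Updated: '*)  lastmod=$(iso "${line#Updated: }") ;;
                'Reminder: '*) reminder=$(iso "${line#Reminder: }") ;;
                'URL: '*)      url=${line#URL: } ;;
            esac
        fi
    done < "$src"

    # Only lines that were present in the note are emitted.
    printf '+++\n'
    printf "title = '%s'\n" "$title"
    [[ -n $created ]]  && printf 'date = "%s"\n' "$created"
    [[ -n $lastmod ]]  && printf 'lastmod = "%s"\n' "$lastmod"
    [[ -n $reminder ]] && printf 'reminder = "%s"\n' "$reminder"
    [[ -n $url ]]      && printf 'source = "%s"\n' "$url"
    printf 'draft = false\n'
    [[ -n $tags ]]     && printf 'tags = [%s]\n' "$tags"
    printf '+++\n\n%s' "$body"
}

# Usage (file names are placeholders):
#   convert_note note.md > 0910.md
```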