Soupault: First Lua Widget
Published on 2022-01-01.
To manage this little website I'm using the wonderful Soupault software. For me and my needs it hits the spot between "do everything myself" and using a more advanced tool like Hugo or Jekyll, since I can write the HTML myself and have Soupault put the content into a template.
In addition to that, Soupault also allows for plugins (or widgets, as they are called). They are written in Lua and can be used to perform actions on the HTML code before it's written to a file (amongst other things). I see a lot of potential in this, and I wanted to share my experience of writing my first Soupault widget.
I highly recommend looking at the reference manual and the plugin section of the Soupault website. The resulting code is available on my sourcehut git repo.
Before the code
This widget should take the HTML code and look for any links (anchor, or "a", tags). If the link points to another website, the attributes rel="noopener" and target="_blank" (to open in a new window or tab) should be added.
I think a reasonable plan could look something like this:
- Get all the links in the HTML code and loop through them
- Get the "href" attribute and check if it's an internal or external link
- If it's internal we skip, but if it's external we check if the "rel" and "target" attributes exist
- If the "rel" attribute exists we append "noopener" to it, if not we create it
- If the "target" attribute exists we modify it to "_blank", otherwise create it
Enable the widget
Save the file as plugins/check_links.lua and enable it by adding the following to your soupault.toml config file:
    [widgets.check-links]
      widget = "check_links"
If you run the soupault command, the widget will run and you will see the output of any print() statements. Once the code modifies the HTML, you can check your files in the build/ directory and verify that external links now have the correct attributes added to them.
Finding all the anchor tags
After looking through the Plugin API section of the reference manual I found a built-in function that does exactly what I need to solve point one:
Example: links = HTML.select(page, "a")
Returns a list of elements that match selector. The html argument can be either a document or an element node.
The function returns a table which we are able to loop through:
    links = HTML.select(page, "a")

    index = 1
    while links[index] do
      link = links[index]
      print(link)
      index = index + 1
    end
page is part of the plugin environment, and Lua arrays conventionally start at 1.
Try running soupault; you should see every anchor tag in the HTML code being printed.
Get the "href" attribute
This is quite an easy one. Again, there is a built-in function available to us:
Example: href = HTML.get_attribute(link, "href")
Returns the value of an element attribute. The first argument must be an element reference produced by HTML.select_one or another function.
Modify the code as follows:
    @@ -3,6 +3,7 @@ links = HTML.select(page, "a")
     index = 1
     while links[index] do
       link = links[index]
    -  print(link)
    +  href = HTML.get_attribute(link, "href")
    +  print(href)
       index = index + 1
     end

Run soupault once more; you should see just the URLs printed this time.
Is the link internal or external?
Let's define what I consider to be an internal and an external link. An internal link is a link within my own website, i.e. anything under https://pwd.re. An external link is everything else (even subdomains under pwd.re). So for the purpose of this script I will consider every link starting with http:// or https:// to be external and in need of checking.
But wait... What happens if I create a link to https://pwd.re/something in my HTML code? Wouldn't that be considered an external link, and be checked? Yes, it would. I do consider this a bug, but I don't think it's important enough for me to fix. I reason that I control my own content and I will never link within my own site using an absolute URL, only relative.
A possible fix for this would be to add the domain of the site to soupault.toml, have the script extract the domain from the URL and compare the two. But, moving on.
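The domain comparison described above could be sketched roughly like this. It is only a sketch, not part of the published widget: is_external and its site_domain parameter are hypothetical names, the domain is passed in directly rather than read from soupault.toml, and the code relies on the string library of stock Lua 5.x, which Soupault's embedded interpreter may not provide, so the matching might need adapting.

```lua
-- Hypothetical helper (not from the original widget): decide whether an
-- href points away from our own site.
-- site_domain is an assumed value such as "pwd.re"; reading it from
-- soupault.toml is left out of this sketch.
function is_external(href, site_domain)
  if href == nil then
    return false -- an anchor without an href has nothing to check
  end
  -- Capture the host part of an absolute http:// or https:// URL.
  local host = string.match(string.lower(href), "^https?://([^/:?#]+)")
  if host == nil then
    return false -- relative links, fragments, mailto: and so on
  end
  -- Any host other than the site's own counts as external,
  -- which includes subdomains.
  return host ~= string.lower(site_domain)
end
```

With this helper a link to https://pwd.re/something would be treated as internal, while subdomains under pwd.re would still count as external, in line with the definition above.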
This code will take the URL from the href attribute and check if it starts with either http:// or https://. If it doesn't, we'll skip this URL, but if it does, we'll continue with our checks.
    @@ -4,6 +4,8 @@ index = 1
     while links[index] do
       link = links[index]
       href = HTML.get_attribute(link, "href")
    -  print(href)
    +  if Regex.match(href, "^http(s?)://") ~= nil then
    +    print(href)
    +  end
       index = index + 1
     end
Check the "rel" attribute
Just as with the "href" attribute, we look for a "rel" attribute, and if there is one we append our value to it. If there isn't one, we'll just create it and set the value we want.
    @@ -5,7 +5,12 @@ while links[index] do
       link = links[index]
       href = HTML.get_attribute(link, "href")
       if Regex.match(href, "^http(s?)://") ~= nil then
    -    print(href)
    +    rel_attrib = HTML.get_attribute(link, "rel")
    +    if rel_attrib ~= nil then
    +      HTML.set_attribute(link, "rel", HTML.get_attribute(link, "rel") .. " noopener")
    +    else
    +      HTML.set_attribute(link, "rel", "noopener")
    +    end
       end
       index = index + 1
     end
We have another bug here. If the "rel" attribute already contains "noopener" it will be added a second time, since we make no checks for what values are already in the attribute. Again, this is not something that is important for me to fix since I'm sure I will not add the value to any links.
The "target" attribute
We'll use the same built-in function as for the "rel" attribute, but this time we will set it to "_blank" no matter what.
    @@ -12,5 +12,10 @@ while links[index] do
           HTML.set_attribute(link, "rel", "noopener")
         end
       end
    +
    +  if HTML.get_attribute(link, "target") ~= "_blank" then
    +    HTML.set_attribute(link, "target", "_blank")
    +  end
    +
       index = index + 1
     end

Run soupault one last time and make sure the links in your build/ directory have the proper attributes and values set. If so, we're done!
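To try out the rules from the plan without running Soupault at all, they can be applied to plain Lua tables. This is a rough sketch under stated assumptions: decorate_link is a hypothetical function, each table stands in for one link's attributes (a real widget goes through HTML.get_attribute and HTML.set_attribute instead), and string.find comes from stock Lua 5.x rather than Soupault's Regex module.

```lua
-- Hypothetical stand-in for the widget logic: `attrs` is a plain table
-- such as { href = "...", rel = "...", target = "..." } playing the
-- role of one link's attributes.
function decorate_link(attrs)
  local href = attrs.href
  -- Only absolute http:// or https:// URLs are treated as external.
  if href == nil or string.find(href, "^https?://") == nil then
    return attrs -- internal or href-less link: leave untouched
  end
  -- Append to an existing "rel", otherwise create it.
  if attrs.rel ~= nil then
    attrs.rel = attrs.rel .. " noopener"
  else
    attrs.rel = "noopener"
  end
  -- "target" is set to "_blank" no matter what it was before.
  attrs.target = "_blank"
  return attrs
end
```

Like the widget itself, this sketch appends "noopener" without checking for an existing value and treats absolute links to the site's own domain as external.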
With this I hope to show how I went about writing a small script, from start to finish. I haven't written a lot of Lua code, but together with the Soupault reference I did not have any problems at all getting the code to do what I wanted. I hope to make more scripts available, ideally together with some kind of write-up to show the process behind them, but also for myself to remember it.
Any comments? Easiest is to tell me on Mastodon.