How Do You Validate and Sanitize External JSON Data in Jekyll Using Ruby

Why Should You Validate External Data in Jekyll?

When fetching data from external APIs or remote JSON feeds, you can’t always trust the structure or content:

  • The API may change field names
  • The response might be nil or empty
  • Unexpected values may break Liquid templates

If not handled properly, this can cause your Jekyll build to crash, silently skip data, or render broken pages.

When Should You Validate?

  • Immediately after fetching and parsing external JSON
  • Before writing data to _data/
  • During Liquid rendering — to handle missing keys safely
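
The first checkpoint, right after parsing, can be sketched as follows. This is a minimal example that assumes the feed is expected to be a JSON array; `parse_entries` is a hypothetical helper name, not part of Jekyll.

```ruby
require "json"

# Parse raw text and return an Array of entries, or [] if the body is not
# valid JSON or is valid JSON but not an array (for example an error object).
def parse_entries(raw)
  parsed = JSON.parse(raw.to_s)
  parsed.is_a?(Array) ? parsed : []
rescue JSON::ParserError
  []
end

puts parse_entries('[{"number": 1}]').inspect
puts parse_entries("<html>502 Bad Gateway</html>").inspect
puts parse_entries('{"message": "rate limited"}').inspect
puts parse_entries(nil).inspect
```

Returning an empty list keeps the later steps simple: they can always iterate, and an empty result can be treated as its own failure condition.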

Strategy 1: Validate with Ruby Before Saving to _data

Let’s say you fetch GitHub issues and expect each entry to have:

  • `number` (Integer)
  • `title` (String)
  • `html_url` (String)

Here’s a Ruby plugin that checks the schema:
# _plugins/validate_github_data.rb
require "net/http"
require "json"
require "fileutils"

def valid_issue?(obj)
  obj["number"].is_a?(Integer) &&
  obj["title"].is_a?(String) &&
  obj["html_url"].is_a?(String)
end

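Jekyll::Hooks.register :site, :after_init do |site|
  uri = URI("https://api.github.com/repos/jekyll/jekyll/issues?per_page=10")
  raw = Net::HTTP.get(uri)
  parsed = JSON.parse(raw)

  # GitHub returns an object (not an array) on errors such as rate limiting
  parsed = [] unless parsed.is_a?(Array)
  valid_data = parsed.select { |i| valid_issue?(i) }

  # Write into the site's source directory, wherever the build is run from
  data_dir = File.join(site.source, "_data")
  FileUtils.mkdir_p(data_dir)
  File.write(File.join(data_dir, "github_issues.json"), JSON.pretty_generate(valid_data))
end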
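
To see how the filter behaves without touching the network, here is a minimal sketch that runs the check against a hand-written payload. The sample entries are made up for illustration, and this version of `valid_issue?` includes an `is_a?(Hash)` guard so that non-object entries are rejected rather than raising.

```ruby
require "json"

# Schema check with a Hash guard so strings, nil, or arrays are rejected
# instead of raising on obj["number"].
def valid_issue?(obj)
  obj.is_a?(Hash) &&
    obj["number"].is_a?(Integer) &&
    obj["title"].is_a?(String) &&
    obj["html_url"].is_a?(String)
end

# Hypothetical sample payload; real GitHub responses carry many more fields.
sample_json = <<~PAYLOAD
  [
    { "number": 1, "title": "Fix typo", "html_url": "https://example.com/1" },
    { "number": "2", "title": "Number is a string", "html_url": "https://example.com/2" },
    { "number": 3, "html_url": "https://example.com/3" },
    "not an object"
  ]
PAYLOAD

parsed = JSON.parse(sample_json)
valid_data = parsed.select { |entry| valid_issue?(entry) }

# Only the first entry satisfies all three type checks.
puts valid_data.map { |entry| entry["number"] }.inspect
```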

Strategy 2: Sanitize and Default Missing Values

Let’s say some issues are missing `title` or `html_url`. You can insert defaults:
def sanitize_issue(issue)
  {
    "number"   => issue["number"] || 0,
    "title"    => issue["title"] || "Untitled Issue",
    "html_url" => issue["html_url"] || "#"
  }
end

cleaned_data = parsed.map { |i| sanitize_issue(i) }
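
Note that `||` only replaces `nil` and `false`; an empty string or a value of the wrong type passes through unchanged. A stricter variant might look like the sketch below. `string_or` is a hypothetical helper, and the sample entries are invented for illustration.

```ruby
# Return value if it is a non-blank String, otherwise the fallback.
def string_or(value, fallback)
  value.is_a?(String) && !value.strip.empty? ? value : fallback
end

# Build a clean entry, replacing missing or wrongly typed fields with defaults.
def sanitize_issue(issue)
  issue = {} unless issue.is_a?(Hash)
  {
    "number"   => issue["number"].is_a?(Integer) ? issue["number"] : 0,
    "title"    => string_or(issue["title"], "Untitled Issue"),
    "html_url" => string_or(issue["html_url"], "#")
  }
end

# Hypothetical malformed entries: blank title, wrong types, and a nil entry.
samples = [
  { "number" => 7, "title" => "  ", "html_url" => nil },
  { "number" => "8", "title" => 42 },
  nil
]

cleaned_data = samples.map { |entry| sanitize_issue(entry) }
cleaned_data.each { |entry| puts entry.inspect }
```

Because the result is rebuilt from a fixed set of keys, any unexpected extra fields in the source data are dropped as a side effect.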

Strategy 3: Detect Invalid Entries and Log Them

parsed.each do |i|
  unless valid_issue?(i)
    puts "[WARN] Invalid issue skipped: #{i.inspect[0..100]}"
  end
end

This logs malformed data in the build console without halting execution.
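
If you want both the kept and the rejected entries, `Enumerable#partition` splits them in one pass. The sketch below uses Ruby's standard `Logger` so it runs on its own; inside a plugin you would more likely log through `Jekyll.logger`. `partition_issues` and the sample data are hypothetical.

```ruby
require "logger"

def valid_issue?(obj)
  obj.is_a?(Hash) &&
    obj["number"].is_a?(Integer) &&
    obj["title"].is_a?(String) &&
    obj["html_url"].is_a?(String)
end

# Split entries into [valid, invalid], warn about each reject, and log a summary.
def partition_issues(entries, logger)
  valid, invalid = entries.partition { |entry| valid_issue?(entry) }
  invalid.each do |entry|
    logger.warn("Invalid issue skipped: #{entry.inspect[0..100]}")
  end
  logger.info("Kept #{valid.size} of #{entries.size} issues")
  [valid, invalid]
end

logger = Logger.new($stdout)

# Hypothetical entries: one well-formed, two malformed.
sample = [
  { "number" => 1, "title" => "Fix typo", "html_url" => "https://example.com/1" },
  { "number" => nil, "title" => "No number", "html_url" => "https://example.com/2" },
  { "title" => "Only a title" }
]

valid, invalid = partition_issues(sample, logger)
```

Keeping the rejected entries around makes it easy to decide later whether their count should fail the build.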

Strategy 4: Validate in Liquid Templates with Safe Defaults

Even with validated data, it’s smart to protect templates:
<ul>
{% for item in site.data.github_issues %}
  <li>
    <a href="{{ item.html_url | default: '#' }}">
      {{ item.title | default: "No Title" }}
    </a>
  </li>
{% endfor %}
</ul>

Can You Use JSON Schema in Jekyll?

Yes, you can use a Ruby gem like json-schema to validate structure more formally. Add it to your Gemfile:

gem "json-schema"
Then:
require "json-schema"

schema = {
  "type" => "object",
  "required" => ["number", "title", "html_url"],
  "properties" => {
    "number" => { "type" => "integer" },
    "title" => { "type" => "string" },
    "html_url" => { "type" => "string", "format" => "uri" }
  }
}

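valid_data = []

parsed.each do |obj|
  if JSON::Validator.validate(schema, obj)
    valid_data << obj
  else
    # Log the whole entry: the title itself may be the missing field
    puts "[WARN] Invalid entry: #{obj.inspect[0..100]}"
  end
end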
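
If adding a gem is not an option, a much smaller dependency-free check can be sketched with a field-to-class map. This is not JSON Schema and does not use the json-schema gem's API; `ISSUE_FIELDS` and `schema_errors` are hypothetical names, and the check covers only presence and type.

```ruby
# Expected fields and their Ruby classes (mirrors the schema used in this post).
ISSUE_FIELDS = {
  "number"   => Integer,
  "title"    => String,
  "html_url" => String
}.freeze

# Return a list of human-readable problems; an empty list means the entry passed.
def schema_errors(obj, fields = ISSUE_FIELDS)
  return ["entry is #{obj.class}, expected Hash"] unless obj.is_a?(Hash)

  fields.each_with_object([]) do |(key, type), errors|
    if !obj.key?(key)
      errors << "missing '#{key}'"
    elsif !obj[key].is_a?(type)
      errors << "'#{key}' is #{obj[key].class}, expected #{type}"
    end
  end
end

# Hypothetical entry with a wrongly typed number and no html_url.
entry = { "number" => "12", "title" => "Broken link" }
puts schema_errors(entry).inspect
```

Returning messages instead of a boolean makes the build log say what was wrong, not only that something was.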

Should You Abort the Build on Invalid Data?

That depends:

  • Public-facing website? Better to fail fast to avoid bad UX
  • Internal tool or report? Skip and warn is acceptable

Abort the build manually with:
raise "Build failed due to missing API data" if valid_data.empty?
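
A middle ground between "always fail" and "always skip" is to fail only when too much of the data is bad. The sketch below is one way to express that; `enforce_data_quality!` and the `max_invalid_ratio` threshold are hypothetical, not Jekyll settings.

```ruby
# Raise if nothing valid remains or if the share of rejected entries
# exceeds max_invalid_ratio; otherwise return nil and let the build continue.
def enforce_data_quality!(total, valid_count, max_invalid_ratio: 0.2)
  raise "Build failed due to missing API data" if valid_count.zero?

  invalid_ratio = (total - valid_count).to_f / total
  return if invalid_ratio <= max_invalid_ratio

  raise format("Build failed: %.0f%% of entries were invalid", invalid_ratio * 100)
end

enforce_data_quality!(10, 9) # passes: 10% invalid is under the 20% limit

begin
  enforce_data_quality!(10, 5) # 50% invalid exceeds the limit
rescue RuntimeError => e
  puts e.message
end
```

An uncaught exception raised inside a hook stops the build, which is what a fail-fast setup wants.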

Bonus: Check the HTTP Status Before Parsing

begin
  response = Net::HTTP.get_response(uri)
  raise "API error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  parsed = JSON.parse(response.body)
rescue => e
  puts "[ERROR] Failed to fetch API data: #{e.message}"
  parsed = []
end
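
Falling back to an empty list keeps the build alive but empties the page. An alternative is to reuse the last good response. The sketch below assumes a local cache file is acceptable; `load_with_fallback` is a hypothetical helper, and the fetch is simulated with plain blocks so the example runs offline.

```ruby
require "json"
require "tmpdir"

# Run the block to fetch fresh data and save it to cache_path. On any
# StandardError, return the last cached copy, or [] if no cache exists.
def load_with_fallback(cache_path)
  data = yield
  File.write(cache_path, JSON.generate(data))
  data
rescue StandardError => e
  warn "[ERROR] Fetch failed (#{e.message}); using cached data if present"
  File.exist?(cache_path) ? JSON.parse(File.read(cache_path)) : []
end

Dir.mktmpdir do |dir|
  cache = File.join(dir, "github_issues.cache.json")

  # First run succeeds and populates the cache.
  fresh = load_with_fallback(cache) { [{ "number" => 1, "title" => "Fix typo" }] }

  # Second run simulates an outage; the cached copy is returned instead.
  stale = load_with_fallback(cache) { raise "API error: 503" }

  puts fresh == stale
end
```

In a real plugin the block would contain the `Net::HTTP.get_response` call and the status check, and the cache would live somewhere that survives between builds.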

Conclusion

External data is powerful — but fragile. Always assume the worst:

  • Data may be missing, invalid, or broken
  • APIs may fail or be rate-limited
  • The Jekyll build may break if you don’t check

With Ruby schema validation and safe Liquid fallbacks, your Jekyll site can stay reliable and predictable — even when your data isn't.

Next Steps

  • Add validations to every external data source
  • Wrap your Liquid templates in defaults and conditionals
  • Use GitHub Actions or Netlify logs to monitor for build errors

In the next article, we’ll explore **multi-language content using Jekyll, data files, and custom Ruby plugins** — perfect for building multilingual sites without complex CMS setups.