
Converting JSON to HTML

Sunday, January 21, 2024
Author: Tawanda Andrew Msengezi


Let's begin by getting the definitions out of the way🥱. JSON stands for JavaScript Object Notation, and it is a widely used standard for exchanging data over the internet. Most applications that we use every day rely on this format to transport data between different endpoints. HTML, on the other hand, stands for HyperText Markup Language. It is the standard markup language for creating web pages and web applications. If you right-click on any website and click "View page source", the text that you see is HTML.
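To make the data-exchange idea concrete, here is a minimal sketch using JavaScript's built-in JSON.stringify and JSON.parse. The user object is made-up sample data; the point is that an object is turned into plain text for transport and rebuilt into an object on the receiving side.

```javascript
// Made-up sample data standing in for something an app might send
const user = { name: "Ada", languages: ["JavaScript", "HTML"], active: true };

// Serialize: the object becomes a JSON string that can travel over HTTP
const payload = JSON.stringify(user);
console.log(payload); // {"name":"Ada","languages":["JavaScript","HTML"],"active":true}

// Deserialize: the receiving side turns the string back into an object
const received = JSON.parse(payload);
console.log(received.languages[1]); // HTML
```

Note that the parsed result is a new object with the same contents, not the original object itself.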

In this article, I will show you how to extract text from an HTML web page and convert it to JSON format, a technique known as web scraping. To do this, we will go through the following steps:

  1. Serve the HTML file from a server.
  2. Install some libraries to access the DOM.
  3. Create routes.
  4. Parse JSON into an object.
  5. Write to a JSON file.


Spinning Up A Server

One of the best things about JavaScript is that you can use it on the client side as well as the server side👍. Client-side JavaScript has access to the Document Object Model (DOM), an object that represents the web page loaded in the browser. Unfortunately, server-side JavaScript running in Node.js has no DOM😢. For example, the following code would only work in client-side JavaScript.

var content = document.getElementById("content"); // select the element with id="content"
console.log(content.innerHTML); // print the HTML inside that element
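A quick way to see this difference is to check for the global document object. Saved to a file and run with node, the sketch below reports that no DOM is available, while the same check pasted into a browser's developer console finds one. The variable name hasDom is just an illustrative choice.

```javascript
// Browsers expose a global `document` object; Node.js does not.
// `typeof` is safe to use on an undeclared global, so this never throws.
const hasDom = typeof document !== "undefined";

if (hasDom) {
  console.log("DOM available: running in a browser");
} else {
  console.log("No DOM: running outside a browser, e.g. in Node.js");
}
```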

For us to access the DOM on the server side the same way we would on the client, we need to use the jsdom library. It parses HTML, either from a string or from a file, and gives us a document object with the same properties and methods we use in the browser. jsdom to the rescue!🦸

npm comes bundled with Node.js, so the npm commands only work once Node is installed on your system. With Node in place, let's set up our project. First, we need to initialize a new project by typing the npm init command into the terminal. Just press Enter at each prompt to accept the defaults.

Then, we need to install the following packages, again by typing into the terminal: npm install body-parser express jsdom nodemon. The npm init command generated a package.json file, and after the install its dependencies should look something like this:

"dependencies": {
    "body-parser": "^1.19.0",
    "express": "^4.18.2",
    "jsdom": "^19.0.0",
    "nodemon": "^2.0.14"
  },
  • The body-parser package parses the body of incoming requests, such as form data sent from the browser, and makes it available on req.body.
  • The express package allows us to create a server and define endpoints.
  • The jsdom package, AKA our superhero library, allows us to access the DOM from the server.
  • The nodemon package automatically restarts our server whenever a source file changes.
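For the form we add later in this article, the browser sends the textarea's content as an application/x-www-form-urlencoded request body, and body-parser's urlencoded() middleware decodes it into req.body. The sketch below imitates that round trip with Node's built-in URLSearchParams. It illustrates the encoding only and is not how body-parser is implemented; the sample HTML string is made up.

```javascript
// Roughly what the browser sends for a form field named "inputText"
const html = "<li>Lorem, ipsum.</li>";
const rawBody = new URLSearchParams({ inputText: html }).toString();
console.log(rawBody); // inputText=%3Cli%3ELorem%2C+ipsum.%3C%2Fli%3E

// Roughly what the server ends up with as req.body after decoding
const body = Object.fromEntries(new URLSearchParams(rawBody));
console.log(body.inputText); // <li>Lorem, ipsum.</li>
```

The markup survives the trip intact, which is why the server can later hand req.body.inputText straight to jsdom.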


Creating Express Server

To create the server, we can write the following code.

const express = require("express"); // importing express
const bodyParser = require("body-parser"); // importing body-parser
const app = express();
const PORT = process.env.PORT || 3000;

// Pass a callback so the message is logged once the server is listening
app.listen(PORT, () => console.log(`Server running on port ${PORT}`));

Now, if we run node script.js from the terminal, where 'script' is the name of the JavaScript file, we should see a message in the terminal that says 'Server running on port 3000'. To avoid restarting the server by hand after every change, we can add a new entry under the scripts property of our package.json file, as shown below.

"scripts": {
    "dev": "nodemon script.js"
  },

Now, if we type npm run dev in our terminal, nodemon will start the server and restart it automatically whenever we make changes to our JavaScript file. Pressing Ctrl + C in the terminal stops it.

Creating Routes

Now that our express server is up and running, we can configure some routes for our application to navigate to. First, we need an HTML file to serve to the browser🧑‍🍳. Write the code below in the script.js file.

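app.use(express.static("public")); // serve static files (HTML, CSS, images) from the public folder
app.use(
  bodyParser.urlencoded({
    extended: false, // parse form bodies with the simple querystring parser
  })
);

// Send index.html when the browser requests the root URL
app.get("/", (req, res) => {
  res.sendFile(__dirname + "/public/index.html");
});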

It is good practice to keep static files in a folder named public, in case we need to add other things such as images and CSS files. Create a folder named 'public' and, inside it, create an index.html file. The HTML file can contain anything, but mine looks like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <ul>
        <li>Lorem, ipsum.</li>
        <li>Dolor sit amet.</li>
        <li>Consectetur adipisicing elit.</li>
        <li>Voluptates, ipsum!</li>
        <li>Ipsum error rem nulla.</li>
        <li>Veniam quaerat, est hic.</li>
        <li>Quibusdam</li>
    </ul>
</body>
</html>

If we go to 'http://localhost:3000/' in the browser, we will see our HTML displayed. Back in the server-side JavaScript file, we can now use a JSDOM object to read this HTML.


Scraping HTML Elements

To access the HTML code from the server, we need to send it in a POST request using a form and then read the input on the backend. Add a form above the <ul> tag as shown below.

<form action="/" method="POST">
    <textarea name="inputText" id="inputText" cols="30" rows="10" style="display: none;"></textarea>
    <button type="submit">Extract</button>
</form>

Notice how we added a style attribute of display: none on the textarea element. This keeps the textarea invisible in the browser while the Extract button stays visible. Next, let's add some client-side JavaScript below the closing </ul> tag as shown below.

<script>
    // Copy the list's inner HTML into the hidden textarea so the form submits it
    const inputText = document.getElementById("inputText");
    const textInfo = document.querySelector("ul");
    inputText.value = textInfo.innerHTML;
</script>

This script simply copies the contents of the list (ul) into the textarea, so they are sent to the server when the form is submitted. Add the following statements at the top of the server-side JavaScript file, below where we imported the express library.

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

Now, let's create an endpoint for the post request in the server-side JavaScript file as shown below.

app.post("/", (req, res) => {
  const html = req.body.inputText; // HTML sent from the hidden textarea
  try {
    const dom = new JSDOM(html); // Creating a new JSDOM object from the HTML string
    const list = dom.window.document.querySelectorAll("li"); // Selecting every list item
    for (let i = 0; i < list.length; i++) {
      console.log(list[i].textContent); // Displaying each list item in the server console
    }
    res.status(200).send("Complete!"); // Success message
  } catch (e) {
    console.log(e); // Logging the error on the server
    res.status(500).send("Something went wrong."); // Always respond so the request doesn't hang
  }
});

With REST APIs, the terms 'endpoints' and 'routes' are often used interchangeably🤓.

If we go to 'http://localhost:3000/' in the browser, there should be a button that says 'Extract'. Clicking it prints each list item in the terminal where the server is running, and the browser shows a message that says 'Complete!'.


Writing to JSON

Now that all our endpoints are working, we need a JSON file to insert data into. Let's create a file in the project folder and name it 'data.json'. Write the following inside the JSON file.

{
    "info":[

    ]
}

To read JSON content in our JavaScript file, we need Node's built-in fs (file system) module, which doesn't need to be installed, and then JSON.parse to turn the file's text into an object. Write the code statements below just above where we imported the jsdom library.

const fs = require("fs"); // importing the built-in fs module
const data = fs.readFileSync("data.json", "utf8"); // reading the JSON file as text
const jsonData = JSON.parse(data); // converting the JSON text into a JavaScript object

Now, let's rewrite the code for the post request on our server by simply replacing the console.log statement inside the for loop with the following code.

jsonData.info.push({
    id: i,
    item: list[i].textContent,
});
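Because querySelectorAll returns an array-like NodeList, the same id/item records can also be built in one step with Array.from and a mapping function. In this sketch, toItems is a hypothetical helper name, and plain objects with a textContent property stand in for real <li> nodes so it runs without jsdom. The trim() call is an addition that strips surrounding whitespace from each item.

```javascript
// Hypothetical helper: turn an array-like list of elements into { id, item } records
function toItems(list) {
  // Array.from accepts array-likes such as a NodeList, plus a map function (value, index)
  return Array.from(list, (li, i) => ({ id: i, item: li.textContent.trim() }));
}

// Plain objects standing in for the <li> elements returned by querySelectorAll("li")
const fakeList = [{ textContent: "Lorem, ipsum." }, { textContent: " Dolor sit amet. " }];
console.log(toItems(fakeList));
```

In the POST handler, this could replace the loop with a single call such as jsonData.info.push(...toItems(list)); where list is the NodeList from querySelectorAll.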

Then, we need to convert the object back into a JSON string and write it to the file. Write the following just below the for loop block and above the res.status(200).send("Complete!"); statement.

fs.writeFileSync("data.json", JSON.stringify(jsonData, null, 2)); // saving the object as indented JSON text

Assuming the server has not been stopped and nodemon is still running, navigate back to 'http://localhost:3000/' and click the button that says 'Extract'. The browser will respond with a message that says 'Complete!', and our data.json file will now be populated with data from our HTML👏.
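Because jsonData is read once when the server starts and every request pushes onto it, clicking Extract more than once stores the list again. The sketch below is one possible variant, not the article's code: saveItems and its parameters are hypothetical names, it reads the file on every call, and it replaces the info array instead of appending to it. It writes to a throwaway file in the OS temp directory so it can run without touching your project.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: load a { info: [...] } JSON file, replace its items, save it back
function saveItems(filePath, items) {
  // Start from an empty structure if the file does not exist yet
  const jsonData = fs.existsSync(filePath)
    ? JSON.parse(fs.readFileSync(filePath, "utf8"))
    : { info: [] };

  // Replace rather than push, so repeated requests don't duplicate entries
  jsonData.info = items.map((item, i) => ({ id: i, item }));

  // Indent with 2 spaces so the file stays readable
  fs.writeFileSync(filePath, JSON.stringify(jsonData, null, 2));
  return jsonData;
}

// Demo against a throwaway data.json in a fresh temp directory
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "scrape-"));
const file = path.join(dir, "data.json");
fs.writeFileSync(file, JSON.stringify({ info: [] }));

saveItems(file, ["Lorem, ipsum.", "Dolor sit amet."]);
saveItems(file, ["Lorem, ipsum.", "Dolor sit amet."]); // a second "click" on Extract
console.log(fs.readFileSync(file, "utf8"));
```

Whether to replace or append depends on what the file is for: appending keeps a history of every extraction, while replacing keeps only the latest snapshot of the page.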


Conclusion

Thanks to the jsdom library🙏, working with the DOM API on the server is possible. One thing to note is that jsdom can parse any HTML string, not just the page we served in this example. I hope you enjoyed this article and learned something new. Reach out to me via the contact section of my website. Until the next article, happy coding😄!

Tawanda Andrew Msengezi

Tawanda Andrew Msengezi is a Software Engineer and Technical Writer who writes all the articles on this blog. He has a Bachelor of Science in Computer Information Systems from Near East University. He is an expert in all things web development with a specific focus on frontend development. This blog contains articles about HTML, CSS, JavaScript and various other tech-related content.
