Typefaces - Character Font Image & CSV Dataset

A dataset for machine learning experiments on typefaces.

This is a dataset that consists of 2500+ fonts taken from the Google Fonts database that have been converted to character-wise PNGs and then flattened into a CSV for ML experiments.

I have only considered uppercase letters, lowercase letters and digits (a-z, A-Z, 0-9; 62 characters in total), ignoring other symbols and dingbats.
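For classification, the character column can be mapped to integer class labels. A minimal sketch, assuming a 0-based label order that follows Python's `string.ascii_letters + string.digits` (the same character list the generation script uses); `CHARACTERS`, `CHAR_TO_LABEL` and `LABEL_TO_CHAR` are illustrative names, not part of the dataset:

```python
import string

# Same character set as the generation script: a-z, A-Z, 0-9 (symbols excluded)
CHARACTERS = string.ascii_letters + string.digits

# Hypothetical label encoding: character -> integer class id, and back
CHAR_TO_LABEL = {ch: i for i, ch in enumerate(CHARACTERS)}
LABEL_TO_CHAR = dict(enumerate(CHARACTERS))

print(len(CHARACTERS))     # 62 classes
print(CHAR_TO_LABEL["A"])  # 26 (lowercase letters come first)
```

Note that uppercase and lowercase versions of the same letter become separate classes here; merging them (e.g. `ch.lower()`) is an alternative if case does not matter for your experiment.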

Each PNG is a 28x28-pixel greyscale image (each pixel takes a value from 0 to 255).

[Screenshot: font images]

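The CSV is formatted similarly to MNIST in CSV: one row per image, holding the font name, the character, and then the 784 (28x28) pixel values in row-major order: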

font-name, character, pix-11, pix-12, pix-13, ...
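To turn a row back into an image, the 784 pixel values can be split into 28 rows of 28. A minimal stdlib-only sketch, assuming the pixel columns are in row-major order (the order PIL's `Image.getdata()` produces in the generation script); `parse_row` is a hypothetical helper and the sample row is synthetic:

```python
import csv
import io

IMAGE_SIZE = 28  # images are 28x28 pixels


def parse_row(row):
    """Split one CSV row into (font_name, character, 28x28 pixel grid).

    Pixel columns are assumed to be in row-major order, i.e. the order
    produced by PIL's Image.getdata().
    """
    font_name, character, *pixels = row
    values = [int(p) for p in pixels]
    if len(values) != IMAGE_SIZE * IMAGE_SIZE:
        raise ValueError(f"expected {IMAGE_SIZE * IMAGE_SIZE} pixels, got {len(values)}")
    # Row r holds pixels r*28 .. r*28+27
    grid = [values[r * IMAGE_SIZE:(r + 1) * IMAGE_SIZE] for r in range(IMAGE_SIZE)]
    return font_name, character, grid


# Usage with a synthetic row (font name and pixel values are made up)
line = "Example.ttf,A," + ",".join(str(i % 256) for i in range(784))
font_name, character, grid = parse_row(next(csv.reader(io.StringIO(line))))
print(font_name, character, len(grid), len(grid[0]))  # Example.ttf A 28 28
```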

[Screenshot: CSV rows]

Bugs? Built something cool?

Reach out @mohammedri_

Script to generate this dataset from .ttf files:

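import csv
import glob
import os
import string

from PIL import Image, ImageDraw, ImageFont

image_width = 28
image_height = 28

# Assumed locations: point these at your own .ttf files and output folders
ttf_files = glob.glob("fonts/**/*.ttf", recursive=True)
csv_dataset_folder = "dataset"
dataset_dir = "dataset"
images_dir = "images"
os.makedirs(csv_dataset_folder, exist_ok=True)
os.makedirs(os.path.join(dataset_dir, images_dir), exist_ok=True)

characters = list(string.ascii_letters + string.digits) # Characters to consider, ignoring symbols

with open(os.path.join(csv_dataset_folder, "google_fonts_dataset.csv"), "w", newline="") as file:
    writer = csv.writer(file, dialect="excel")
    for font_file in ttf_files:

        font_name = os.path.basename(font_file)

        font = ImageFont.truetype(font_file, 20, encoding="unic")

        for character in characters:
            # Black 8-bit greyscale canvas
            image = Image.new("L", (image_width, image_height), "black")
            draw = ImageDraw.Draw(image)

            # Centre the glyph via its bounding box; getbbox() replaces the
            # textsize()/getoffset() calls that were removed in Pillow 10
            left, top, right, bottom = font.getbbox(character)
            x = (image_width - (right - left)) / 2 - left
            y = (image_height - (bottom - top)) / 2 - top
            draw.text((x, y), character, font=font, fill=255)  # white glyph

            # CSV row: font name, character, then the 784 pixel values (row-major)
            row = [font_name, character] + list(image.getdata())
            writer.writerow(row)

            # Tag uppercase files so e.g. "A" and "a" stay distinct on case-insensitive file systems
            if character.isupper():
                image.save(os.path.join(dataset_dir, images_dir, f"{font_name}-upper-{character}.png"), "PNG")
            else:
                image.save(os.path.join(dataset_dir, images_dir, f"{font_name}-{character}.png"), "PNG")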
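Once the CSV exists, it can be read back for training without extra dependencies. A minimal stdlib-only sketch; `load_dataset` and its `has_header` flag are hypothetical names, and scaling pixels to [0.0, 1.0] is a common preprocessing choice rather than part of the dataset itself:

```python
import csv


def load_dataset(csv_path, has_header=False):
    """Read the font CSV into (font_names, characters, pixel_rows).

    Pixel values are scaled from 0-255 to floats in [0.0, 1.0].
    Set has_header=True if your copy of the file starts with a header
    row (the generation script does not write one).
    """
    font_names, characters, pixel_rows = [], [], []
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header line
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            font_names.append(row[0])
            characters.append(row[1])
            pixel_rows.append([int(p) / 255.0 for p in row[2:]])
    return font_names, characters, pixel_rows
```

Typical usage would be `names, chars, pixels = load_dataset("google_fonts_dataset.csv")`, adjusting the path to wherever the CSV was written. Keeping everything in plain lists avoids a NumPy or pandas dependency; for the full dataset, converting `pixels` to an array afterwards is usually more memory-efficient.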