Stylette

Styling the Web with Natural Language



Abstract

End-users can potentially style and customize websites by editing them through in-browser developer tools. Unfortunately, end-users lack the knowledge needed to translate high-level styling goals into low-level code edits. We present Stylette, a browser extension that enables users to change the style of websites by expressing goals in natural language. By interpreting the user’s goal with a large language model and extracting suggestions from our dataset of 1.7 million web components, Stylette generates a palette of CSS properties and values that the user can apply to reach their goal. A comparative study (N=40) showed that Stylette lowered the learning curve, helping participants perform styling changes 35% faster than those using developer tools. By presenting various alternatives for a single goal, the tool helped participants familiarize themselves with CSS through experimentation. Beyond CSS, our work can be expanded to help novices quickly grasp complex software or programming languages.


System

Stylette enables end-users to change the visual design of any website by simply clicking on a component and saying what change they want to see. The system interprets the user’s natural language request (b) and clicked component (a) to present a palette (c) that consists of multiple CSS properties that could be changed to satisfy the request and various suggestions extracted from a large-scale dataset. Stylette is implemented as a Chrome Extension.

Figure shows that the user has selected a subtitle on the CHI 2022 website, with Stylette hovering underneath the subtitle. Stylette shows the transcription box with the text "tone down the text" and the style palette underneath the transcription box. The style palette is described further in Figure 3.

Palette

The style palette shows different properties as columns and different value suggestions for each property as rows within each column.

For each property, the palette presents:

(a) The current value.
(b) The default or original value before any changes.
(c) A list of suggested values.
(d) For numerical values, suggested values that are either larger or smaller than the current value based on the system’s prediction.
(e) Arrows next to a suggested value to see other similar suggestions.
(f) If the user clicks on the “+” button, more suggestions are shown.
(g) If the user clicks on the current value, widgets are revealed for the user to manually set values.
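The elements above can be pictured as fields of a small record, one per palette column. The following is only an illustrative sketch: the class name `PaletteColumn`, its field names, and the initial count of three visible suggestions are assumptions made for this example and are not taken from the Stylette implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PaletteColumn:
    """Hypothetical model of one palette column: a CSS property and its values."""
    css_property: str                  # e.g. "color" or "font-size"
    current_value: str                 # (a) value currently applied
    original_value: str                # (b) value before any changes
    suggestions: List[str] = field(default_factory=list)  # (c) suggested values
    direction: Optional[str] = None    # (d) "increase", "decrease", or None
    visible: int = 3                   # assumed number of suggestions shown at first

    def shown(self) -> List[str]:
        """Return the suggestions currently visible in the column."""
        return self.suggestions[: self.visible]

    def show_more(self, step: int = 3) -> List[str]:
        """(f) Reveal more suggestions, as the "+" button would."""
        self.visible = min(len(self.suggestions), self.visible + step)
        return self.shown()

    def apply(self, value: str) -> None:
        """Apply a suggested value, or (g) one the user set manually."""
        self.current_value = value


column = PaletteColumn(
    css_property="font-size",
    current_value="24px",
    original_value="24px",
    suggestions=["20px", "18px", "16px", "14px"],
    direction="decrease",
)
column.apply("18px")
```

Keeping the original value separate from the current one lets the column always display both, which matches items (a) and (b).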


Pipeline

We present a computational pipeline that processes the two input modalities, voice and click, to generate the palettes.

Diagram of the computational pipeline shows, on the left, a user making a voice request and clicking on a web component. On the top, it shows how the voice request is transcribed with Google Cloud Speech-to-Text, concatenated with pseudo-tokens, and fed into a GPT-Neo model trained with P-tuning to generate and extract the predicted CSS properties and change direction. On the bottom, it shows how the component clicked by the user is captured and fed into a Variational Autoencoder model to extract an embedding vector. This embedding vector is then compared, using cosine similarity, to other embeddings in the Web Component dataset to identify the property values of similar components, from which values are grouped and sampled to provide suggestions to the user.

NLP Pipeline

Architecture
  1. Transcribe request using an STT API.
  2. Concatenate with pseudo-tokens.
  3. Input to GPT-Neo model to generate CSS properties.
  4. Concatenate remaining pseudo-tokens.
  5. Input to GPT-Neo model to generate change direction.
  6. Extract direction and CSS properties.
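Steps 2 to 6 above can be sketched as a two-pass call to a text generator. This is a rough illustration under several assumptions: the pseudo-tokens are written here as placeholder strings (in P-tuning they are learned embeddings rather than literal text), the model is assumed to emit a comma-separated list of property names followed by a single direction word, and `fake_generate` is a stand-in for the GPT-Neo model, not a real API.

```python
from typing import Callable, List, Tuple

# Placeholder strings standing in for trained pseudo-tokens (assumption).
PROPERTY_PSEUDO_TOKENS = "[P1][P2]"
DIRECTION_PSEUDO_TOKENS = "[D1][D2]"

# Assumed vocabulary for the change direction.
KNOWN_DIRECTIONS = {"increase", "decrease", "none"}


def run_nlp_pipeline(
    request: str, generate: Callable[[str], str]
) -> Tuple[List[str], str]:
    """Turn a transcribed request into (CSS properties, change direction)."""
    # Step 2: concatenate the request with the first set of pseudo-tokens.
    property_prompt = f"{request} {PROPERTY_PSEUDO_TOKENS}"
    # Step 3: generate the CSS properties.
    property_text = generate(property_prompt)
    # Step 4: concatenate the remaining pseudo-tokens.
    direction_prompt = f"{property_prompt} {property_text} {DIRECTION_PSEUDO_TOKENS}"
    # Step 5: generate the change direction.
    direction_text = generate(direction_prompt)
    # Step 6: extract the properties and direction from the generated text.
    properties = [p.strip().lower() for p in property_text.split(",") if p.strip()]
    direction = direction_text.strip().lower()
    if direction not in KNOWN_DIRECTIONS:
        direction = "none"
    return properties, direction


def fake_generate(prompt: str) -> str:
    """Stand-in for the language model, for illustration only."""
    if prompt.endswith(DIRECTION_PSEUDO_TOKENS):
        return "decrease"
    return "color, opacity, font-weight"


properties, direction = run_nlp_pipeline("tone down the text", fake_generate)
```

Passing the generator in as a function keeps the sketch independent of any particular model library.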
Dataset

CV Pipeline

Architecture
  1. Screenshot the clicked component.
  2. Input to trained VAE model to encode the screenshot as a 512-dimensional vector.
  3. Compare vector to the vector representations of all components in dataset.
  4. Extract CSS property values of most similar components in dataset.
  5. Group values and sample values from the groups.
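Steps 3 to 5 above amount to a nearest-neighbor lookup followed by grouping and sampling. The sketch below uses toy 3-dimensional vectors in place of the 512-dimensional VAE embeddings. The function names, the fixed-width bucketing used to group numerical values, and the example data are all assumptions for illustration; the source does not specify how values are grouped.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Component = Tuple[Sequence[float], Dict[str, float]]  # (embedding, CSS values)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], dataset: List[Component], k: int
) -> List[Dict[str, float]]:
    """Steps 3-4: rank components by similarity and return the top-k styles."""
    ranked = sorted(
        dataset, key=lambda item: cosine_similarity(query, item[0]), reverse=True
    )
    return [styles for _, styles in ranked[:k]]


def group_and_sample(
    values: List[float], bucket_width: float, rng: random.Random
) -> List[float]:
    """Step 5: bucket nearby values together, then sample one value per bucket."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for value in values:
        groups[int(value // bucket_width)].append(value)
    return [rng.choice(groups[key]) for key in sorted(groups)]


# Toy dataset: embeddings paired with font sizes in pixels (made-up values).
dataset: List[Component] = [
    ([1.0, 0.0, 0.0], {"font-size": 12.0}),
    ([0.9, 0.1, 0.0], {"font-size": 14.0}),
    ([0.0, 1.0, 0.0], {"font-size": 32.0}),
    ([0.8, 0.2, 0.1], {"font-size": 21.0}),
]
query = [1.0, 0.05, 0.0]
top_styles = most_similar(query, dataset, k=3)
font_sizes = [styles["font-size"] for styles in top_styles]
suggestions = group_and_sample(font_sizes, bucket_width=4.0, rng=random.Random(0))
```

Sampling one value per group keeps the suggestions varied, so near-duplicate values from similar components do not crowd the palette.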
Dataset

Results

Through a between-subjects study (N=40), we compared Stylette to Chrome DevTools through two tasks: (1) a well-defined task and (2) an open-ended task.

Bar chart on the left shows that the self-confidence of Stylette participants initially increased, but decreased later during the study. Bar chart on the right shows that self-confidence of DevTools participants increased initially and also slightly increased later in the study. Specific self-confidence ratings are included in the text.

Self-confidence increased significantly after Task 1 in both conditions. After Task 2, however, self-confidence decreased significantly only for Stylette participants.

The study revealed that Stylette’s interpretation of users’ vague requests could help them quickly learn how to use CSS. Once users had acquired the knowledge, however, ML-related flaws could significantly limit their interactions.


CHI 2022 Paper, Honorable Mention Award

BibTeX

@inproceedings{10.1145/3491102.3501931,
  author = {Kim, Tae Soo and Choi, DaEun and Choi, Yoonseo and Kim, Juho},
  title = {Stylette: Styling the Web with Natural Language},
  year = {2022},
  isbn = {9781450391573},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3491102.3501931},
  doi = {10.1145/3491102.3501931},
  booktitle = {CHI Conference on Human Factors in Computing Systems},
  articleno = {5},
  numpages = {17},
  keywords = {End-User Programming, Web Design, Machine Learning, Natural Language Interface},
  location = {New Orleans, LA, USA},
  series = {CHI '22}
}

Logo of KIXLAB Logo of KAIST

This work was supported by an IITP grant funded by the Korea government (MSIT) and by the KAIST-NAVER Hypercreative AI Center.