Remove empty HTML text nodes

This code uses HtmlAgilityPack to remove text nodes that are empty or contain only whitespace.

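// Requires the HtmlAgilityPack NuGet package; document is a loaded HtmlDocument.
using System.Linq;
using HtmlAgilityPack;

// Snapshot the matching text nodes first, then remove them one by one.
foreach (var node in document.DocumentNode
    .DescendantsAndSelf()
    .Where(n => n.NodeType == HtmlNodeType.Text &&
        string.IsNullOrWhiteSpace(n.InnerText))
    .ToList())
{
    node.Remove();
}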

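DescendantsAndSelf() includes the starting node itself in the search. When starting from document.DocumentNode this makes no practical difference, because the document node is not a text node; it matters when the search starts from an arbitrary node that may itself be an empty text node. Otherwise Descendants() is sufficient.
ToList() copies the matching nodes into a separate list before the loop starts, so removing nodes does not modify the collection that is being iterated.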
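
The reason for the ToList() snapshot can be illustrated without HtmlAgilityPack. The following is a minimal sketch that uses a plain List<string> as a stand-in for the node collection (the class name SnapshotDemo and the sample values are made up for illustration). HtmlAgilityPack's own collections may fail differently, but the snapshot pattern is the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SnapshotDemo
{
    static void Main()
    {
        // Stand-in for a node collection: two entries are blank.
        var items = new List<string> { "a", " ", "b", "" };

        try
        {
            // Removing from the list while enumerating it directly:
            // List<T> detects the change and throws on the next step.
            foreach (var item in items)
            {
                if (string.IsNullOrWhiteSpace(item))
                    items.Remove(item);
            }
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Direct removal failed");
        }

        // Enumerating a snapshot: ToList() copies the matches first,
        // so removing from the original list is safe.
        foreach (var item in items.Where(s => string.IsNullOrWhiteSpace(s)).ToList())
        {
            items.Remove(item);
        }

        Console.WriteLine(string.Join(",", items)); // prints: a,b
    }
}
```

The first loop removes one blank entry and then fails when the enumerator advances; the second loop removes the remaining blank entry without error.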
