How can I remove punctuation from input text in Java?

This first removes every character that is not an ASCII letter or a space, folds the result to lowercase, then splits it on whitespace, doing all the work in a single line:

String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.

By removing the rubbish characters before splitting, you avoid having to loop through the resulting array to clean up each word.
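For anyone who wants to try it, here is a minimal runnable sketch of the same one-liner wrapped in a method. The class name WordSplitter and method name toWords are made up for illustration. The trim() call and the empty-string check are additions that are not part of the one-liner above: without them, input that starts with a space yields an empty first element, and input with no letters at all yields an array containing one empty string.

```java
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative only.
final class WordSplitter {

    // Strips everything except ASCII letters and spaces, lowercases,
    // then splits on runs of whitespace.
    static String[] toWords(String input) {
        String cleaned = input.replaceAll("[^a-zA-Z ]", "")
                              .toLowerCase()
                              .trim(); // added: avoids a leading "" element

        // Added: "".split(...) would return [""], not an empty array.
        if (cleaned.isEmpty()) {
            return new String[0];
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = toWords("Hello, world! It's a test.");
        System.out.println(Arrays.toString(words));
        // prints [hello, world, its, a, test]
    }
}
```

Note that this pattern also deletes tabs and newlines, so "a\tb" becomes the single word "ab". If the input may contain those, using "[^a-zA-Z\\s]" instead keeps all whitespace for the split. Accented and other non-ASCII letters are removed as well.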
