python re.split() to split by spaces, commas, and periods, but not in cases like 1,000 or 1.50

Use a negative lookahead and a negative lookbehind:

> s = "one two 3.4 5,6 seven.eight nine,ten"
> parts = re.split('\s|(?<!\d)[,.](?!\d)', s)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']

In other words, you always split by \s (whitespace), and only split by commas and periods if they are not followed (?!\d) or preceded (?<!\d) by a digit.

DEMO.

EDIT: As per @verdesmarald comment, you may want to use the following instead:

> s = "one two 3.4 5,6 seven.eight nine,ten,1.2,a,5"
> print re.split('\s|(?<!\d)[,.]|[,.](?!\d)', s)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten', '1.2', 'a', '5']

This will split "1.2,a,5" into ["1.2", "a", "5"].

DEMO.

Leave a Comment