Regex confusion: '\S' vs '\\S'

I’m working my way through the Scientific Computing with Python track taught by Doc Chuck Severance. I just finished watching the “Regular Expressions: Matching and Extracting Data” video, and I’m confused about the question that comes with the video. In the tutorial Doc Severance uses the example:

y = re.findall('\S+@\S+', x)

I understood this to mean: find any non-whitespace character ‘\S’ one or more times ‘+’, followed by ‘@’, followed by any non-whitespace character ‘\S’ one or more times ‘+’.

But then in the accompanying question, this example is used:

import re
s = 'A message from to about meeting @2PM'
lst = re.findall('\\S+@\\S+', s)

What I don’t understand is why there are two backslashes before the two ‘S’ characters. The correct answer for the output when this code is run is:

[‘’, ‘’]

When I run the code myself, the output is the same whether I put one backslash or two before each ‘S’. I thought the backslash was the regex escape character, so I expected a backslash before a backslash to mean “search for a literal backslash”. That would make ‘\\S+@\\S+’ translate as: find a backslash ‘\’, followed by ‘S’ one or more times ‘+’, followed by ‘@’, followed by a backslash ‘\’, followed by ‘S’ one or more times ‘+’. But apparently I’m misunderstanding something here. Can someone please explain why code with “\\S” and code with “\S” produce the same output?

EDIT: After I posted this, I noticed that in this post where I had put double-backslashes only a single backslash was displaying after I posted it. To get it to display double-backslashes I had to edit the post and type triple-backslashes where I wanted double-backslashes displayed. What is up with that? And does it have something to do with my initial question?

Some info about python escape char and regex here: in the first section.

Same here:

Same question:

As you’ve found, it’s hard to discuss escape characters on a web forum, because you are using escape characters:

Thanks! I understand now. Unless you’re using Python’s raw string notation, one backslash is the same as two (that is, they both output one backslash). Three is the same as four (outputting two), five is the same as six (outputting three), and so on. Since regex is a language within a language, it doesn’t matter whether there are one or two backslashes, because regex still interprets them as one: Python treats one of the two backslashes as an escape character.
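For anyone landing here later, this is a small sketch of the two layers at work. The sample text and addresses are made up for illustration (they are not the ones from the course question), and the variable names are my own.

```python
import re

# Ordinary string: Python itself turns each "\\" into ONE backslash
# before the regex engine ever sees the pattern.
double = "\\S+@\\S+"

# Raw string: Python leaves the backslashes alone.
raw = r"\S+@\S+"

print(double)          # \S+@\S+
print(double == raw)   # True: both are the same 7-character pattern
print(len("\\S"))      # 2 (a backslash and an S)

# Hypothetical sample text, not the string from the course question.
text = "write to alice@example.com or bob@example.org before @2PM"

print(re.findall(double, text))  # ['alice@example.com', 'bob@example.org']
print(re.findall(raw, text))     # ['alice@example.com', 'bob@example.org']
```

So by the time the regex engine runs, both spellings have become backslash-S, which it reads as "non-whitespace". Note that "@2PM" is not matched because nothing non-whitespace comes right before the "@".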

Actually the single backslash only works if certain characters follow it. Oh well! I can’t imagine this will come up very often.
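To make that caveat concrete, here is a rough sketch of a case where one backslash and two are not interchangeable. It uses \b, which Python recognises as the backspace character but which regex uses for a word boundary. The sample text is made up.

```python
import re

# "\n" is a recognised Python escape, so one backslash and two differ.
print(len("\n"), len("\\n"))   # 1 2

# "\b" is also recognised by Python: it becomes a backspace character.
print("\b" == "\x08")          # True

text = "cat concat cat"  # hypothetical sample text

# Single backslash: the pattern contains literal backspace characters,
# which never occur in the text, so nothing matches.
print(re.findall("\bcat\b", text))     # []

# Doubled backslash or raw string: regex receives \b (word boundary).
print(re.findall("\\bcat\\b", text))   # ['cat', 'cat']
print(re.findall(r"\bcat\b", text))    # ['cat', 'cat']
```

The single-backslash '\S' only works because Python does not recognise \S as one of its own escapes and passes the backslash through unchanged. Newer Python versions print a warning for that, so using a raw string such as r'\S+@\S+' for regex patterns is probably the safest habit.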
