Regex confusion: '\S' vs '\\S'

KrashOverryder · August 19, 2023, 4:14am

I’m working my way through the Scientific Computing with Python track taught by Doc Chuck Severance. I just finished watching the “Regular Expressions: Matching and Extracting Data” video, and I’m confused about the question that comes with the video. In the tutorial Doc Severance uses the example:

y = re.findall('\S+@\S+', x)
print(y)

I understood this to mean: find any non-whitespace character (‘\S’) one or more times (‘+’), followed by ‘@’, followed by any non-whitespace character (‘\S’) one or more times (‘+’).
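That reading of the pattern can be checked piece by piece. The sketch below is only an illustration: it writes the patterns as raw strings (`r'...'`) so the backslash question doesn't get in the way, and the short sample strings are made up from words in the quiz sentence. It also shows why `@2PM` is not matched: the first `\S+` needs at least one non-whitespace character directly before the `@`, and there is only a space there.

```python
import re

# \S matches one non-whitespace character; + repeats it one or more times.
words = re.findall(r'\S+', 'to  cwen@iupui.edu\tabout')
print(words)  # ['to', 'cwen@iupui.edu', 'about']

# \S+@\S+ needs at least one non-whitespace character on each side of '@'.
no_match = re.findall(r'\S+@\S+', 'meeting @2PM')
one_match = re.findall(r'\S+@\S+', 'from csev@umich.edu')
print(no_match)   # []
print(one_match)  # ['csev@umich.edu']
```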

But then in the accompanying question, this example is used:

import re
s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
lst = re.findall('\\S+@\\S+', s)
print(lst)

What I don’t understand is why there are two backslashes before the two ‘S’ characters. The correct answer for the output when this code is run is:

['csev@umich.edu', 'cwen@iupui.edu']

When I run the code myself, the output is the same whether I put one or two backslashes before each ‘S’. I thought backslash was the regex escape character, so I expected a backslash before a backslash to mean “search for a literal backslash”. That would make ‘\\S+@\\S+’ mean: find “a backslash ‘\’, followed by ‘S’ one or more times ‘+’, followed by ‘@’, followed by a backslash ‘\’, followed by ‘S’ one or more times ‘+’”. Apparently I’m misunderstanding something here. Can someone please explain why code with “\\S” and code with “\S” produce the same output?
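One way to see what is going on is to look at the string Python builds before the `re` module ever sees it. In a normal (non-raw) string literal, `\\` is Python’s own escape for a single backslash, so `'\\S+@\\S+'` is the same text as the raw string `r'\S+@\S+'`. The plain `'\S+@\S+'` ends up identical too, because `\S` is not one of Python’s recognized escape sequences and Python leaves the backslash in place. Python 3.12 and later emit a `SyntaxWarning` for that form, which is why this sketch compares only the doubled and raw versions.

```python
import re

s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

doubled = '\\S+@\\S+'   # normal string: each \\ becomes one backslash
raw = r'\S+@\S+'        # raw string: backslashes are kept as typed

# Python has already collapsed \\ to \, so both hold the same 7 characters.
print(doubled == raw)          # True
print(len(doubled), len(raw))  # 7 7
print(doubled)                 # \S+@\S+

# The regex engine therefore receives identical patterns.
print(re.findall(doubled, s) == re.findall(raw, s))  # True
```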

EDIT: After I posted this, I noticed that in this post where I had put double-backslashes only a single backslash was displaying after I posted it. To get it to display double-backslashes I had to edit the post and type triple-backslashes where I wanted double-backslashes displayed. What is up with that? And does it have something to do with my initial question?

pkdvalis · August 19, 2023, 5:31pm

Some info about Python escape characters and regex is here, in the first section: https://docs.python.org/3/library/re.html
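The first section of that page recommends Python’s raw string notation for patterns: with an `r` prefix, Python does not treat backslashes as escapes, so the pattern can be written exactly as the regex engine should see it. Applied to the quiz code, a minimal version might look like this (same input string as the question):

```python
import re

s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

# r'...' keeps each backslash as typed, so \S reaches the regex engine unchanged.
lst = re.findall(r'\S+@\S+', s)
print(lst)  # ['csev@umich.edu', 'cwen@iupui.edu']
```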

pkdvalis · August 19, 2023, 7:45pm

Same here: https://forum.freecodecamp.org/t/challenge-lesson-28-of-python-for-everybody-course/587338

pkdvalis · August 19, 2023, 7:46pm

Same question: https://forum.freecodecamp.org/t/scientific-computing-with-python-regular-expressions/616648

pkdvalis · August 19, 2023, 7:49pm

As you’ve found, it’s hard to discuss escape characters on a web forum, because you are using escape characters: https://forum.freecodecamp.org/t/matching-and-extracting-data/535596

KrashOverryder · August 20, 2023, 12:52pm

Thanks! I understand now. Unless you’re using Python’s raw string notation, one backslash is the same as two (that is, they both produce one backslash), three is the same as four (producing two), five is the same as six (producing three), and so on. Since regex is a language within a language, it doesn’t matter whether there are one or two backslashes, because the regex engine still receives only one. With two, Python treats the first backslash as an escape character.
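The “language within a language” idea can be checked directly. For the regex engine to search for a literal backslash, the pattern it receives must contain two backslashes (`\\`), and in a normal Python string each of those must itself be written as `\\`, so the non-raw literal needs four. The sample text below is made up purely for illustration.

```python
import re

# Made-up sample text; in the source code '\\' is one real backslash character.
text = 'path C:\\Stuff and plain Stuff'
print(text)  # path C:\Stuff and plain Stuff

# Four backslashes in a normal string == two in a raw string == two real characters.
four = '\\\\S+'
raw_two = r'\\S+'
print(four == raw_two, len(four))  # True 4

# The regex engine sees \\ (an escaped backslash) followed by S+,
# i.e. a literal backslash and then one or more 'S' characters.
print(re.findall(raw_two, text))   # ['\\S']
```

This is the pattern the original question expected `'\\S+'` to be; it only happens once the regex engine actually receives two backslashes.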

KrashOverryder · August 20, 2023, 1:30pm

Actually, the single backslash only works when the character after it doesn’t form one of Python’s own escape sequences. Oh well! I can’t imagine this will come up very often.
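A concrete case where a single backslash does not work is `\b`. In a normal Python string, `\b` is a recognized escape meaning the backspace character, so the regex engine never sees its word-boundary token; with a raw string (or `\\b`) it does. Other recognized escapes such as `\n` and `\t` are also converted by Python first, whereas `\S`, `\d` and `\w` are not recognized and pass through unchanged. The sentence used here is only an illustrative example.

```python
import re

sentence = 'the cat sat on the concatenated mat'

plain = '\bcat\b'   # Python turns each \b into a backspace character (\x08)
raw = r'\bcat\b'    # regex receives \b, its word-boundary token

print(len(plain), len(raw))           # 5 7
print(re.findall(plain, sentence))    # [] (no backspace characters in the text)
print(re.findall(raw, sentence))      # ['cat'] (the standalone word only)
print('\\bcat\\b' == raw)             # True: doubling works like a raw string
```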

system · February 19, 2024, 1:30am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
