Matching and Extracting Data

RiddlesInTheVineyard · July 28, 2022, 12:49pm

Hello, thank you for taking the time!
Here in the curriculum I currently am:
I can’t figure out why there are double backslashes in the exercise.
The exercise test string:

'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

The exercise regex:

\\S+@\\S+

I thought it was a mistake and that it would search for a literal backslash rather than non-space characters.
I checked it in Python aaaaand… it worked:

So, to investigate the phenomenon, I opened the regex101 site, aaaaand…

it actually agreed with my first assumption and didn’t find any matches.
Can anyone shed light on why it doesn’t work in theory but does work in practice?
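
A minimal sketch of what is probably happening, assuming standard Python string-literal rules (the variable names are only for illustration). Python processes the string literal before the re module sees it, so printing the pattern shows what the regex engine actually receives:

```python
import re

text = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

# Python processes the string literal first: each '\\' becomes ONE backslash.
pattern = '\\S+@\\S+'

print(pattern)        # what the re module actually receives: \S+@\S+
print(len(pattern))   # 7 characters, not 9

found = re.findall(pattern, text)
print(found)          # ['csev@umich.edu', 'cwen@iupui.edu']
```

A regex tester has no string-literal step, so whatever is typed into it goes to the regex engine unchanged. That would explain the different result.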

RiddlesInTheVineyard · July 28, 2022, 7:10pm

Thank you! And that ‘r’ is a nice trick to know…
But notice that the regex actually matches “csev@umich.edu” and “cwen@iupui.edu”! (See the terminal screenshot.)
It is as though “\\S” (computer language) is interpreted as “non-space character” (human language), and not as “backslash followed by a capital S”, as you said it should be.
The question in the challenge seems to expect this behaviour, unlike what I expected after watching the video lecture or using the regex101 tool.

RiddlesInTheVineyard · July 28, 2022, 7:23pm

Sorry, I edited that part of my response. It’s a double backslash. Do you get my question now?

RiddlesInTheVineyard · July 28, 2022, 7:28pm

If I got you right, the regex

\\S+@\\S+

matches the string

"\S\S@\S"

and not the string:

"csev@umich.edu"

It turns out not to be the case, and that is what my question is about.
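
For comparison, here is a sketch of the pattern a regex tester would see when those nine characters are typed in directly. A raw string is used so that Python leaves every backslash alone; the names are hypothetical:

```python
import re

# The raw string holds the nine characters \\S+@\\S+ exactly as typed
# into a regex tester: literal backslash, one or more 'S', '@', and so on.
tester_pattern = r'\\S+@\\S+'

emails = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
backslashes = r'\SS@\S'   # backslash, S, S, @, backslash, S

no_hits = re.findall(tester_pattern, emails)
hits = re.findall(tester_pattern, backslashes)

print(len(tester_pattern))   # 9
print(no_hits)               # []
print(hits)                  # one match covering the whole backslash string
```

So the "literal backslash" reading only applies when the regex engine really receives two backslashes, which is not what a normal (non-raw) Python literal '\\S+@\\S+' delivers.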

RiddlesInTheVineyard · July 28, 2022, 7:34pm

That is my point! Look above, in the black screenshot, at what happens when I run Python itself in the terminal: it matches “csev@umich.edu”! And that is also the right answer in the challenge!

RiddlesInTheVineyard · July 28, 2022, 8:31pm

Feel free to abandon me if I’m too tiresome…
Anyway,
the Python terminal will match the said email addresses whether I use

'\\S+@\\S+'

or

'\S+@\S+'

and that makes no sense to me.
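
A hedged sketch of why both spellings behave the same: '\S' is not a recognised string escape in Python, so the backslash is kept as-is, and all three literals below end up as the same seven characters. Recent Python versions warn about the unrecognised escape, so the single-backslash literal is evaluated here with warnings silenced purely to keep the demo quiet:

```python
import re
import warnings

text = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

double = '\\S+@\\S+'   # escaped backslashes
raw = r'\S+@\S+'       # raw string

# Evaluate the literal '\S+@\S+' (single backslashes, no r prefix).
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    single = eval(r"'\S+@\S+'")

print(double == raw == single)   # True
matches = re.findall(single, text)
print(matches)                   # ['csev@umich.edu', 'cwen@iupui.edu']
```

The single-backslash form only works because 'S' happens not to be a string escape letter, which is why the raw string is usually the safer habit.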

Now, you said that there are 2 ways to make backslashes be treated literally:

prefixing ‘r’
writing double backslashes instead of single ones.

That assumes that I want them to be treated literally.
But my goal is to match the fictional email addresses, which contain no backslashes. To do that I want them not to be treated literally, which means

no prefixing ‘r’
no double backslashes

So why does it match even with the double backslashes?!
And why are you saying that

If you do not include the r in front, then to accomplish the same thing, you must write it as:
'\\S+@\\S+'
to match the strings “csev@umich.edu” and “cwen@iupui.edu”

But according to your first reply

The reason you see the extra \ is you need to escape the \ to have it interpret it as a literal \ , so that together with the S after it, it is like \S as you would normally write.

It will make the backslashes be treated literally and hence not match the said email addresses, which is the opposite of my goal?

RiddlesInTheVineyard · July 28, 2022, 8:45pm

I said it because that is what is happening, not because that is what you said.

RiddlesInTheVineyard · July 28, 2022, 8:47pm

And I don’t get you. Why

re.findall('\\S+@\\S+') or re.findall(r'\S+@\S+')

and not the opposite

re.findall('\S+@\S+')
if I want to match non-spaces rather than backslashes?

RiddlesInTheVineyard · July 28, 2022, 8:51pm

So why does a single backslash with no ‘r’ work too?

RiddlesInTheVineyard · July 28, 2022, 8:53pm

Here I use both the single- and double-backslash regexes;
both of them match the said emails.
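
As a contrast, here is a small sketch of a case where the single-backslash spelling without ‘r’ does not work, because the letter after the backslash is a real string escape. The example sentence is made up for illustration:

```python
import re

sentence = 'a cat sat'

# Without r, Python turns '\b' into a backspace character (code point 8)
# before the re module ever sees the pattern.
plain = '\bcat\b'
raw = r'\bcat\b'      # here \b reaches the regex engine as "word boundary"

print(len(plain), len(raw))          # 5 7
plain_hits = re.findall(plain, sentence)
raw_hits = re.findall(raw, sentence)
print(plain_hits)                    # []
print(raw_hits)                      # ['cat']
```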

RiddlesInTheVineyard · July 28, 2022, 9:26pm

Well, regexes can really be a headache!
The problem is probably that the general Python interpreter processes the escapes before the re library can see the regex, or something like that. I played with that a little bit and just got more confused:

Here I included several backslashes in a string.

For a regex of four(!) backslashes it will find two backslashes per one real backslash in the string,
for a regex of three, Python will raise the error "unterminated string literal",
for a regex of two (the ‘normal’ one) the re library will produce a very long traceback and finally a "bad escape" error.
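
The three cases can be reproduced with a sketch like the one below. The sample string is an assumption (the original one was only in the screenshot). The "two backslashes per real backslash" is most likely just how a list displays its strings (repr), not what was actually matched:

```python
import re

sample = r'C:\temp\new'   # contains two real backslashes

# Four backslashes in the literal -> two in the pattern -> the regex
# for "one literal backslash".
hits = re.findall('\\\\', sample)
print(len(hits))       # 2
print(hits)            # list display uses repr, so each hit shows as '\\'
print(hits[0])         # printing the string itself shows a single backslash
print(len(hits[0]))    # 1

# Two backslashes in the literal -> one in the pattern -> incomplete escape.
try:
    re.compile('\\')
    two_ok = True
except re.error as exc:
    two_ok = False
    print('re.error:', exc)

# Three backslashes: '\\' is a backslash and '\'' escapes the closing quote,
# so the literal never ends. compile() lets us catch that at run time.
try:
    compile(r"x = '\\\'", '<demo>', 'exec')
    three_ok = True
except SyntaxError as exc:
    three_ok = False
    print('SyntaxError:', exc.msg)
```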

I really appreciate your patience and help (I already admired your activity in the forum beforehand). Best regards!

system · January 27, 2023, 9:26am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
