Matching and Extracting Data

Hello, and thank you for taking the time! :heart:
Here is where I currently am in the curriculum:
I can’t figure out why there are double backslashes in the exercise.
The exercise test string:

'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

The exercise regex:

\\S+@\\S+

I thought it was a mistake :scream: and that it would search for a literal backslash rather than for non-whitespace characters.
I checked it in Python aaaaand… it worked.
So, to investigate the phenomenon, I opened the regex101 site aaaaand…

…it actually agreed with my first assumption and didn’t find any matches.
Can anyone shed light on why it shouldn’t work in theory but does work in practice?
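The short answer is that there are two layers. Python first turns the string literal into an ordinary string, and only that resulting string is handed to the `re` module. Here is a small sketch (the variable names `x` and `pattern` are just for illustration) that prints what `re` actually receives:

```python
import re

x = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

# Layer 1: Python processes the string literal; each \\ becomes ONE backslash.
pattern = '\\S+@\\S+'
print(pattern)       # \S+@\S+
print(len(pattern))  # 7 characters: \ S + @ \ S +

# Layer 2: re receives \S+@\S+ and reads \S as "non-whitespace character".
print(re.findall(pattern, x))  # ['csev@umich.edu', 'cwen@iupui.edu']
```

regex101 expects the regex itself, not a Python string literal, so when `\\S+@\\S+` is pasted there, both backslashes in each pair reach the regex engine.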

Thank you! And that ‘r’ is a nice trick to know…
But notice that the regex actually matches “csev@umich.edu” and “cwen@iupui.edu”! (See the terminal screenshot.)
It is as though “\\S” (in code) is interpreted as “non-whitespace character” (in plain words), and not as “a backslash followed by a capital S”, as you said it would be.
The question in the challenge seems to expect this behaviour, unlike what I expected after watching the video lecture or using the regex101 tool.

Sorry, I edited that part of my response: it’s a double backslash. Do you get my question now?

If I got you right, the regex

\\S+@\\S+

matches the string

"\S\S@\S"

and not the string:

"csev@umich.edu"

It turns out not to be the case, and that is what my question is about :thinking:
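Both observations are right, because they describe different layers. A raw string skips the Python layer and keeps every backslash, which reproduces what regex101 sees when `\\S+@\\S+` is typed into it. A sketch (the name `literal_pattern` is illustrative):

```python
import re

x = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

# The raw string keeps all four backslashes, so re receives \\S+@\\S+
# (9 characters), the same pattern regex101 works with.
literal_pattern = r'\\S+@\\S+'
print(len(literal_pattern))  # 9

# To re, \\ means "one literal backslash", so each side of @ must be a real
# backslash followed by one or more capital S characters.
print(re.findall(literal_pattern, x))  # []

# Text that really contains backslashes does produce a match (a substring).
print(re.findall(literal_pattern, r'\S\S@\S'))  # ['\\S@\\S'] (repr doubles each backslash)
```

So `\\S+@\\S+` only means "literal backslash" when it reaches the regex engine unchanged; inside a normal Python string literal, Python consumes one backslash of each pair first.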

That is my point! Look above at the black screenshot to see what happened when I ran Python itself in the terminal: it matches “csev@umich.edu”! And that is also the right answer in the challenge!

Feel free to abandon me if I’m too tiresome :woozy_face:
Anyway,
the Python terminal matches those email addresses whether I use

'\\S+@\\S+'

or

'\S+@\S+'

and that makes no sense to me.
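The single-backslash version works only by luck: `\S` is not one of Python’s string escape sequences, so Python leaves the backslash in place and `re` still receives `\S`. (Python 3.6+ emits a DeprecationWarning for unrecognized escapes like this, upgraded to a SyntaxWarning in 3.12.) The luck runs out when a regex escape is also a Python escape. For example, `\b` is a word boundary to `re` but a backspace character to Python. A sketch with a hypothetical test string:

```python
import re

# '\b' is a Python escape (backspace, U+0008), so re never sees backslash-b.
print('\b' == '\x08')  # True

text = 'a cat and a category'

# Without r: the pattern contains two backspace characters, so nothing matches.
print(re.findall('\bcat\b', text))    # []

# With r, or with doubled backslashes: re receives \b, its word-boundary assertion.
print(re.findall(r'\bcat\b', text))   # ['cat']
print(re.findall('\\bcat\\b', text))  # ['cat']
```

That is why the lecture and exercise lean on `r'...'` or doubled backslashes: they give the same result every time, not just when the escape happens to be unknown to Python.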

Now, you said that there are two ways to make backslashes be treated literally:

  1. prefixing ‘r’
  2. writing double backslashes instead of single ones.
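Both of the ways listed above act only on the Python layer. Each produces the same string, `\S+@\S+`, with one backslash before each `S`, and that single backslash is what `re` then reads as the `\S` class. In other words, "treated literally" here means "Python keeps the backslash", not "the regex matches a backslash". A sketch (variable names are illustrative):

```python
import re

x = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'

raw_pattern = r'\S+@\S+'       # way 1: r prefix, Python leaves backslashes alone
doubled_pattern = '\\S+@\\S+'  # way 2: each \\ in the literal becomes one \

# Both literals produce the identical 7-character string \S+@\S+ ...
print(raw_pattern == doubled_pattern)  # True
print(len(raw_pattern))                # 7

# ... so re receives the same pattern and returns the same matches.
print(re.findall(raw_pattern, x) == re.findall(doubled_pattern, x))  # True
```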

That assumes I want them to be treated literally.
But my goal is to match the fictional email addresses, which contain no backslashes. To do that, I want the backslashes to be treated non-literally, which means

  1. no ‘r’ prefix
  2. no double backslashes

So why does it match even with double backslashes?!
And why are you saying that

If you do not include the r in front, then to accomplish the same thing, you must write it as:

'\\S+@\\S+'

to match the strings “csev@umich.edu” and “cwen@iupui.edu”

But according to your first reply

The reason you see the extra \ is that you need to escape the \ to have it interpreted as a literal \, so that together with the S after it, it is like \S as you would normally write.

Won’t that make the backslashes be treated literally, and hence not match those email addresses, which is the opposite of my goal?

I said it because that is what’s happening, not because that is what you said :smirk:

And I don’t get why you say (with x holding the exercise test string)

re.findall('\\S+@\\S+', x) or re.findall(r'\S+@\S+', x)

and not the opposite,

re.findall('\S+@\S+', x)

if I want to match non-whitespace characters rather than backslashes?

So why does a single backslash with no ‘r’ work too?


Here, I use both the single-backslash and the double-backslash regexes;
both of them match those emails.

Well, regexes can really be quite a headache!
The problem is probably that the Python interpreter processes the escapes before the re library ever sees the regex, or something like that. I played with it a little and just got more confused:


Here I included several backslashes in a string.

  1. with a regex of four(!) backslashes, it finds two backslashes per real backslash in the string;
  2. with three, Python raises the error "unterminated string literal";
  3. with two (the ‘normal’ one), the re library prints a very long traceback ending in a "bad escape" error.
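The three observations above line up with the two layers. Four backslashes in a normal literal become two in the string, which `re` reads as one literal backslash. Each match is a single character, but a printed list shows its repr, `'\\'`, which likely explains the "two per real backslash". With three, the third backslash escapes the closing quote, so Python never finishes the literal (that case cannot even be run). Two backslashes become one lone backslash, which `re` rejects. A sketch using a hypothetical test string:

```python
import re

text = r'a\b\c'   # hypothetical test string with two real backslashes
print(len(text))  # 5

# Four backslashes in a normal literal -> two in the string -> one literal
# backslash for re. r'\\' produces the same two-character pattern.
matches = re.findall('\\\\', text)
print(matches)                             # ['\\', '\\']  (repr doubles each backslash)
print([len(m) for m in matches])           # [1, 1]: each match is ONE character
print(re.findall(r'\\', text) == matches)  # True

# Two backslashes in a normal literal -> one lone backslash -> re.error.
try:
    re.compile('\\')
    error_message = None
except re.error as err:
    error_message = str(err)
print(error_message)  # mentions "bad escape (end of pattern)"
```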

I really appreciate your patience and help (I already admired your activity in the forum beforehand). Best regards! :smiling_face_with_three_hearts:
