Pull data from another table if contains

Hello - I have 2 csv files each containing a data . I would like python to import the tables, then if any value from file 1 is found in file2, it will add that value to file2 in a new column. When completed, it will save the new file as csv to file3.

file1

NAME COLUMN
word1
word2
word3
word4
word5
word6
word7
word8
word9
word10
word11
word12
word13
word14

file2

SN	       NAME
1083	word1 and other text
1099	text and word8
1146	text with word1
1171	word2 and more
1329	some other words and word 14
1446	word1 here
1662	blah word2
1777	here is word 7 only
1887	word8 is here
1911	blah blah word9 blah blah
1939	word10 blah
1971	word11 blah blah

file3

SN	        NAME	found
1083	word1 and other text	word1
1099	text and word8	               word8
1146	text with word1	               word1
1171	word2 and more	               word2
1329	some other words and word 14	      word14
1446	word1 here	                       word1
1662	blah word2	                      word2
1777	here is word 7 only	     word7
1887	word8 is here	                       word8
1911	blah blah word9 blah blah	word9
1939	word10 blah	                       word10
1971	word11 blah blah	                 word11

What have you tried so far?

Here’s what I have: the example above was shortened. But, here is my actual code, below. It is not writing to a file. And, it is looking in column 3 of the file. The data from file1 needs to be all capitals for mine to work, also. It also doesn’t add a column like i want. It just prints the line and I copy all results when completed.

brand = []
with open (r'C:\python_files\ez1.csv', 'r') as f:
    # skip first line
    f.readline()
    for line in f:
        brand.append(line.strip("\n"))

with open (r'C:\python_files\ez2.csv', 'r') as f:
        # skip the first line
        f.readline()
        for line in f:
            if (line.split(",")[2]) in brand:
                print (line),

When comparing strings of text and you don’t have guaranteed consistency of uppercase or lowercase, you can use either one in your comparison.

a = "tExT"
b = "TeXt"
a.lower() == b.lower()  # true

Currently the intention of your code will be to check if a sentence is in a list of words.

I.e. is the sentence "I like cheese" in the list of words ["cheese", "pizza", "tacos"]?

It won’t work:

words = ["some", "is"]
a = "sOme Sentence tHat iS Long"
a.lower() in words     # false

# checking that a sentence contains a word
b = "thAt"
b.lower() in a.lower()   # true

# so, we can iterate over the words of a sentence, and check if each word from the sentence is in the list of words
for word in a.split():
  print(word, word.lower() in words)
  # sOme, True
  # iS, True

Regarding CSV, I suggest looking into Python’s CSV library and using that to both read and write CSV.


Indexes of lists start at 0, not 1:

colors = ["blue", "red", "yellow", "green"]
# index    0       1      2         3
colors[0]   # blue, the first
colors[1]   # red, the second
colors[2]   # yellow, the third
colors[3]   # green, the fourth

So, what I suggest is that you need to build a list and then write that to your new CSV file. Here’s a basic example of how that works. You will need to adapt this to your needs.

import csv

# first build my results. This will be a list of dictionaries.
# the dictionary keys correspond to the CSV column names
results = []
results.append({"one": "a", "two": "b"})
results.append({"one": "d", "two": "e"})

with open("results.csv", "w", newline="") as csvfile:
    fieldnames = ["one", "two"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()  # write the header line
    writer.writerows(results)  # write all of the results to the file

Cheers

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.