Description
Thanks for helping us improve this project!
Before you create this issue
Please make sure you are using the latest updated code and libraries: see https://github.com/ageron/handson-ml2/blob/master/INSTALL.md#update-this-project-and-its-libraries
Also please make sure to read the FAQ (https://github.com/ageron/handson-ml2#faq) and search for existing issues (both open and closed), as your question may already have been answered: https://github.com/ageron/handson-ml2/issues
Describe the bug
The issue is in the last exercise of chapter 3 (Classification), the spam/ham email classifier, in cell 144. The loop walks over every part of the email, but it returns as soon as it reaches the first text/plain part and never looks at the remaining parts. A text/html part is only returned when the email has no text/plain part at all, and in that case only the last text/html part is kept. Emails can be multipart, and many of them contain a text/plain part followed by a text/html part and sometimes further parts, so the rest of the text is lost.
To Reproduce
def email_to_text(email):
    html = None
    for part in email.walk():
        ctype = part.get_content_type()
        if not ctype in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except: # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            return content
        else:
            html = content
    if html:
        return html_to_plain_text(html)
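To make the behavior easy to check, here is a minimal sketch that builds a two-part message with the standard library and runs the function above on it. The test message and the `html_to_plain_text` stub are assumptions made for illustration (the stub only strips tags and is not the notebook's helper); the bare `except` is narrowed to `except Exception`, otherwise the logic is unchanged.

```python
import re
from email.message import EmailMessage


def html_to_plain_text(html):
    # Hypothetical stand-in for the notebook's helper: it only strips tags.
    return re.sub(r"<[^>]+>", "", html)


def email_to_text(email):
    # Same logic as the notebook cell quoted above.
    html = None
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            return content  # returns here, later parts are never visited
        else:
            html = content
    if html:
        return html_to_plain_text(html)


# Build a multipart/mixed message that holds two text/plain parts.
msg = EmailMessage()
msg.make_mixed()

first = EmailMessage()
first.set_content("first part")
second = EmailMessage()
second.set_content("second part")

msg.attach(first)
msg.attach(second)

result = email_to_text(msg)
print(repr(result))  # only the text of the first part is returned
```

With this message, the text of the second part never appears in the result.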
Expected behavior
I expected the code to visit every part of the email, store each text/plain or text/html part it finds in a variable, concatenating it with the parts found earlier, and return that variable only after the loop has finished.
Additional context
Here is the fix I propose:
def email_to_text(email):
    total_content = ""
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            total_content += content
        else:
            # html_to_plain_text is the helper used in the original cell
            total_content += html_to_plain_text(content)
    return total_content
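A quick way to check a concatenating version is to run it on a message with one text/plain part and one text/html part. The sketch below is only an illustration under stated assumptions: `email_to_text_all_parts` is a hypothetical name, the `html_to_plain_text` stub just strips tags and stands in for the notebook's helper, and the test message is made up.

```python
import re
from email.message import EmailMessage


def html_to_plain_text(html):
    # Hypothetical stand-in for the notebook's helper: it only strips tags.
    return re.sub(r"<[^>]+>", "", html)


def email_to_text_all_parts(email):
    # Collect the text of every text/plain and text/html part.
    chunks = []
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/html":
            content = html_to_plain_text(content)
        chunks.append(content)
    return "\n".join(chunks)


# Build a multipart/mixed message: one plain part, then one HTML part.
msg = EmailMessage()
msg.make_mixed()

plain = EmailMessage()
plain.set_content("first part")
html = EmailMessage()
html.set_content("<p>second part</p>", subtype="html")

msg.attach(plain)
msg.attach(html)

combined = email_to_text_all_parts(msg)
print(repr(combined))  # contains the text of both parts
```

One thing to check before adopting this: in a multipart/alternative email the text/plain and text/html parts are usually two renderings of the same body, so concatenating every part would likely include that text twice.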