[BUG] in chapter 3 classification last exercise email classification #650

Open
@faisalhussain-devs

Thanks for helping us improve this project!

Before you create this issue
Please make sure you are using the latest updated code and libraries: see https://github.com/ageron/handson-ml2/blob/master/INSTALL.md#update-this-project-and-its-libraries

Also please make sure to read the FAQ (https://github.com/ageron/handson-ml2#faq) and search for existing issues (both open and closed), as your question may already have been answered: https://github.com/ageron/handson-ml2/issues

Describe the bug
The bug is in the last exercise of chapter 3 (classification), the spam/ham email classifier, in cell 144. The loop walks over the parts of the email, but it returns as soon as it reaches the first text/plain part, and if there is no text/plain part it keeps only the last text/html part it saw. Any further text/plain or text/html parts are ignored, even though an email can be multipart and many emails contain a text/plain part, then a text/html part, and so on.

To Reproduce
def email_to_text(email):
    html = None
    for part in email.walk():
        ctype = part.get_content_type()
        if not ctype in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except: # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            return content
        else:
            html = content
    if html:
        return html_to_plain_text(html)
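
The early return can be checked with a small, self-contained sketch. The message below is made up for illustration, and `html_to_plain_text` here is only a hypothetical stand-in for the notebook's helper (it just strips tags); the body of `email_to_text` follows the same logic as the function above.

```python
import re
from email.message import EmailMessage
from html import unescape


def html_to_plain_text(html_source):
    # Hypothetical stand-in for the notebook's helper: strips tags only.
    return unescape(re.sub(r"<[^>]+>", "", html_source))


def email_to_text(email):
    # Same logic as the function quoted in the issue.
    html = None
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            return content  # early return: later parts are never visited
        else:
            html = content
    if html:
        return html_to_plain_text(html)


# Made-up multipart message containing two text/plain parts.
msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("first part")
msg.add_attachment("second part")  # stored as a second text/plain part

result = email_to_text(msg)
print(repr(result))  # only the first part's text comes back
```

With this message, the text of the second part never appears in the result, which is the behavior described above.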

Expected behavior
I expected the code to visit every part of the email, append each text/plain or text/html part it finds to a variable (converting HTML to plain text), and return that variable only after the loop has finished.

Additional context
Proposed fix:

def email_to_text(email):
    total_content = ""
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/plain":
            total_content += content
        else:
            # same helper the original function calls for HTML parts
            total_content += html_to_plain_text(content)
    return total_content
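
A minimal sketch of how the accumulating version could be exercised on a message with both a plain and an HTML part. The name `email_to_text_all_parts`, the sample message, and the tag-stripping `html_to_plain_text` stand-in are assumptions made for this example, not code from the notebook. It collects parts in a list and joins them, which is equivalent to the `+=` approach apart from the newline separator.

```python
import re
from email.message import EmailMessage
from html import unescape


def html_to_plain_text(html_source):
    # Hypothetical stand-in for the notebook's helper: strips tags only.
    return unescape(re.sub(r"<[^>]+>", "", html_source))


def email_to_text_all_parts(email):
    # Collect the text of every text/plain and text/html part.
    chunks = []
    for part in email.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            content = part.get_content()
        except Exception:  # in case of encoding issues
            content = str(part.get_payload())
        if ctype == "text/html":
            content = html_to_plain_text(content)
        chunks.append(content)
    return "\n".join(chunks)


# Made-up message: a plain body plus an HTML alternative.
msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("plain body")
msg.add_alternative("<p>html body</p>", subtype="html")

text = email_to_text_all_parts(msg)
print(text)
```

One thing to weigh: in a multipart/alternative message the plain and HTML parts usually carry the same content, so concatenating them repeats the text. Whether that matters for the spam classifier's word counts is a judgment call.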
