In this article I will demonstrate how to build a multi-threaded web crawler that can process HTTPS web pages.
So how will the crawler work?
Let's get into it by importing the libraries we need to build this crawler.
Alright before proceeding…
There are two types of data we use for Analytics.
If you are new to Python iterators and generators, please refer to these articles; otherwise you are good to go.
To start, let's talk about what lazy evaluation is.
Lazy evaluation is an evaluation strategy that delays the evaluation of an expression until its value is needed; it also avoids repeated evaluation.
Let's compare strict evaluation vs. lazy evaluation.
Problem: Given a list and a positive integer n, write a function that splits the list into n groups.
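The excerpt does not show a solution, so here is a rough sketch of how the strict and lazy versions might differ. It assumes "n groups" means n contiguous groups of roughly equal size; the names split_strict and split_lazy are made up for illustration.

```python
def split_strict(items, n):
    """Strict: build all n groups right away and return them as a list."""
    size, extra = divmod(len(items), n)
    groups = []
    start = 0
    for i in range(n):
        # The first `extra` groups get one additional element.
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def split_lazy(items, n):
    """Lazy: a generator that computes each group only when it is requested."""
    size, extra = divmod(len(items), n)
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        yield items[start:end]
        start = end


numbers = list(range(7))
all_groups = split_strict(numbers, 3)   # every group exists in memory now
lazy_groups = split_lazy(numbers, 3)    # nothing has been sliced yet
first_group = next(lazy_groups)         # only the first group is computed
```

The strict version does all the work up front, while the lazy version does no slicing until the caller asks for a group with next() or a for loop.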
This is part 3 of my series, in which I will discuss some applications of iterators and generators.
In this function we use next() to manually consume an iterator
with open("/etc/something/something.some_extension") as f:
    # a file object is its own iterator, so next() returns one line
    line = next(f)
2. Consider a custom container object that internally holds an iterable, and you want to make this container iterable.
Note: when we use a for loop, the __iter__() method of the iterable is invoked. …
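A minimal sketch of the container case, assuming the container keeps its items in a plain list. The Playlist class and its attribute names are hypothetical and only serve to show __iter__() delegating to the internal iterable.

```python
class Playlist:
    """Hypothetical container that stores its items in an internal list."""

    def __init__(self, name):
        self.name = name
        self._songs = []

    def add(self, song):
        self._songs.append(song)

    def __iter__(self):
        # Delegate iteration to the internal list by returning its iterator.
        return iter(self._songs)


playlist = Playlist("morning")
playlist.add("song_a")
playlist.add("song_b")

# The for loop calls playlist.__iter__() behind the scenes.
collected = [song for song in playlist]
```

Because __iter__() returns a fresh iterator each time, the container can be looped over more than once.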
First, let's understand what a temp or temporary file is.
“.tmp”, but they are program dependent (i.e., different programs create different temp files).
Most common examples of temp files are
This is a continuation of part 1
Definition: A memory-mapped file object maps a normal file object into memory. This allows us to modify a file object’s content directly in memory.
file objects. Hence all the operations which can be performed on a
bytearray, like indexing, slicing, assigning a slice, or using the
re module to search through the file.
seek() to position the current pointer to a different position.
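The operations listed above can be sketched roughly as follows. The scratch file and its contents are assumptions made for illustration; only standard-library calls (mmap, re, tempfile, os) are used.

```python
import mmap
import os
import re
import tempfile

# Create a small scratch file to map (hypothetical content).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello world, hello mmap")

with open(path, "r+b") as f:
    # A length of 0 maps the whole file.
    with mmap.mmap(f.fileno(), 0) as mm:
        first_byte = mm[0]        # indexing returns an int
        first_word = mm[:5]       # slicing returns bytes
        mm[:5] = b"HELLO"         # slice assignment must keep the same length
        # The re module can search the mapping directly.
        matches = [m.start() for m in re.finditer(rb"hello", mm)]
        mm.seek(6)                # file-style positioning
        tail = mm.read(5)
        mm.flush()                # push the change back to the file

with open(path, "rb") as f:
    on_disk = f.read()

os.remove(path)
```

Note that the slice assignment changes the file on disk as well, which is the point of mapping the file with write access.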
The memory mapped file object is…
What are partial functions? Understanding functools.partial and its applications and use cases.
functools.partial(func, *args, **kwargs) returns a new partial object which, when called, behaves like func called with the positional arguments (*args) and keyword arguments (**kwargs).
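As a small illustration of the definition above, the sketch below freezes one argument of a function. The power function and the names square, cube, and parse_binary are hypothetical examples, not taken from the article.

```python
from functools import partial


def power(base, exponent):
    return base ** exponent


# Freeze the keyword argument; the caller supplies only `base`.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

# partial also works with built-ins that accept keyword arguments.
parse_binary = partial(int, base=2)

squared = square(4)
cubed = cube(2)
parsed = parse_binary("1010")
```

The partial object keeps the original callable and the frozen arguments available as square.func and square.keywords, which can be handy when debugging.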
Generator functions offer one-way communication, i.e., we can retrieve information from a generator using next(), but we cannot interact with it or affect its execution while it is running.
First, let’s understand generator.send().
It is used to send a value to a generator that has just yielded.
def double_inputs():
    while True:
        x = yield
        yield x * 2
gen = double_inputs()
next(gen)            # run up to the first yield
print(gen.send(10))  # goes into the x variable --> 20
next(gen)            # run up to the next yield
print(gen.send(6))   # --> 12
next(gen)            # run up to the next yield
print(gen.send(45))  # goes into x again --> 90
next(gen)            # runs up to the…
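Another way to see send() at work is a generator that keeps state between calls. The following is only a sketch; running_average is a hypothetical name and is not part of the original article.

```python
def running_average():
    """Receive numbers through send() and yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # The value passed to send() becomes the result of this yield.
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)  # prime the generator: run up to the first yield

results = [avg.send(10), avg.send(20), avg.send(30)]
```

Here a single yield both receives the sent value and returns the updated average, so no extra next() call is needed between sends.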
In this case the data won't be kept in memory (RAM) after it’s written to the file.
with open("test.bin","wb") as f:
In this case instead of writing contents to a file, it is written…
Data Scientist, Pythonista, Algorithms lover