dev notes

Stripe Missing required param: items

You may see this error when calling stripe.Subscription.create():

Missing required param: items

It's a misleading message because it implies that the entire items param is missing, but more likely you've set one of the price values to None, an empty string, or the like.
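For instance, here is a sketch of how the error can sneak in (the price IDs and variable names are hypothetical):

```python
# One of the price lookups returned None, so the corresponding item is
# invalid and Stripe reports "Missing required param: items".
price_ids = ["price_123abc", None]   # hypothetical price IDs

# Filtering out empty values before the call surfaces the real problem.
items = [{"price": p} for p in price_ids if p]
assert items, "no valid price IDs for the subscription"

# With valid items in hand, the actual call would look like:
# stripe.Subscription.create(customer=customer_id, items=items)
```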

Determining if App is Running in AWS AppRunner

This does not seem to be documented anywhere.  If you need to tell, from within your AWS AppRunner application, whether it's running locally or in AWS, you can check the environment variable AWS_EXECUTION_ENV.  Its value will be set to AWS_ECS_FARGATE when running on AppRunner.

Here's an example:

import logging
import os

logger = logging.getLogger(__name__)

aws_environment = os.environ.get('AWS_EXECUTION_ENV', None)

if aws_environment:
    logger.info("Running on AWS: {}".format(aws_environment))
else:
    logger.info("Running locally!")

AWS AppRunner Pending certificate DNS validation

I had an AppRunner instance with a custom domain that was stuck on "Pending certificate DNS validation" as it was trying to validate the ACM certificates.

Turns out that the UI to copy and paste the DNS validation entries can lead you astray.  

The record names include the domain name already.  So if you copy and paste from the AppRunner console into Route 53, you will end up with the domain name in the record twice.  For example, if the validation record name is shown as _abc123.example.com, enter only _abc123 in the Route 53 record name field.

Simply remove the domain before pasting.  I chalk this up to a less-than-good user interface on the AppRunner side.

torch.cuda.is_available() Returns False

torch.cuda.is_available() returning False is one of the most frustrating errors to encounter, particularly when Docker is being used.  But I would suggest taking a procedural approach to ridding yourself of this error.

1.  Get the "CUDA Version" installed by running nvidia-smi.  You should see something like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
| 53%   73C    P2   374W / 450W |  21160MiB / 24564MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1338      G   /usr/lib/xorg/Xorg               107MiB  |
|    0   N/A  N/A      1706      G   /usr/bin/gnome-shell              44MiB  |
|    0   N/A  N/A     11132      C   ...nda3/envs/ldm/bin/python3   21004MiB  |
+-----------------------------------------------------------------------------+

2.  Pull a nvidia/cuda Docker image whose version tag matches the "CUDA Version" from above.  My findings are that the version of the nvidia/cuda Docker image can sometimes be below the version of CUDA installed on the host, but not above.  If you choose to use an image other than one supplied by nvidia/cuda, you are on your own for installing dependencies, etc.

3.  Install a corresponding version of PyTorch in the Docker container that matches the version of CUDA installed on the host.

RUN python3 -m pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu114/torch_stable.html

Notice the cu114 in the URL for PyTorch?  That version is at or below the version of CUDA installed on the host.

4.  Use the --gpus flag with docker run like this:

sudo docker run --gpus all model-train:latest 

5.  Test that a container can also see the GPU by running this command:

docker run -it --gpus all nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi

You should get the same output from that command from within the container and from the host itself.
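As a final sanity check, a short Python snippet (assuming PyTorch is installed in the container) shows whether the versions line up:

```python
import torch

# The wheel's +cuXXX suffix and torch.version.cuda should be at or below
# the host's "CUDA Version" reported by nvidia-smi.
print(torch.__version__)           # e.g. a build tagged +cu114
print(torch.version.cuda)          # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())   # True once driver, image, and wheel agree
```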

Anaconda and Docker Buffer Output With Python Application

I deploy Docker containers that run a Python application through a conda environment.  The logging output of the Python application would buffer until the container terminated, which made troubleshooting difficult.

It turns out that Anaconda will indeed buffer the output unless directed otherwise.  The way to tell conda not to buffer is to use the "--no-capture-output" flag.  Here's how I set that in my Dockerfile:

ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "ldm"]
CMD ["python3", "app.py"]

ECS Task Using GPU Taking Much Longer to Complete Than Expected

I’m using EC2 as a capacity provider for ECS tasks that require a GPU for training a model.  Initially, the tasks were taking much longer to complete than expected.  

After some debugging, it turns out that the CPU was being used rather than the GPU which explains the slowness.  

Calling the following function would return False.

torch.cuda.is_available() 

In my case, there was one main reason for this:

The version of the CUDA drivers on the host (which can be found by running nvidia-smi) needs to match the nvidia/cuda image from which you build your container, which in turn needs to match the version of PyTorch with GPU support that you install.

My EC2 instance has CUDA 11.4 installed.  So my Dockerfile looks something like this:

FROM  --platform=linux/amd64 nvidia/cuda:11.4.0-base-centos7
RUN yum update -y
RUN yum -y groupinstall development
RUN yum install -y python3 python3-devel tar gzip awscli git vim xz wget gcc make zlib-devel libjpeg-devel libffi-devel libxslt-devel libxml2-devel

RUN mkdir -p /app
COPY . /app
WORKDIR /app

RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu114/torch_stable.html

ENV ECS_ENABLE_GPU_SUPPORT=true
ENV CUDA_VISIBLE_DEVICES=0

Notice two things about this Dockerfile:

1.  nvidia/cuda:11.4.0-base-centos7 refers to 11.4 which matches the version of CUDA installed on the host

2.  The 114 in this URL also matches the CUDA driver: https://download.pytorch.org/whl/cu114/torch_stable.html 

Once I corrected this, torch.cuda.is_available() returned True.

Cuda Out of Memory Training Dreambooth w/ Stable Diffusion 2.1

This is a common error:

CUDA out of memory. Tried to allocate 12.66 GiB (GPU 0; 23.69 GiB total capacity; 15.57 GiB already allocated;

You've read all the blogs and Reddit posts that tell you to set something like this:

export PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:64"

But you still get out of memory errors, particularly when trying to use Stable Diffusion 2.1.

Are you trying to use the following flags?

  --with_prior_preservation --prior_loss_weight=1.0

Because that could very well be your problem.  It was for me.

Invalid according to Policy: Extra input fields: content-type S3 Presigned URL

If you receive the following error when trying to store an object in Amazon S3:

<Error>
<Code>AccessDenied</Code>
<Message>Invalid according to Policy: Extra input fields: content-type</Message>
<RequestId>SomeCrazyString</RequestId>
<HostId>AnotherCrazyString</HostId>
</Error>

It's because the policy used to generate the pre-signed POST request did not have content-type as one of the permissible fields.  Set it like so:

[["starts-with", "$Content-Type", ""]]

An empty string tells S3 to allow any content-type, but you can certainly be more specific.

SQLAlchemy Query for Child Objects in One to Many Self Referential Relationship

Finding the answer to a rather simple question proved to be shockingly difficult.

How do you query such that parent objects include all child objects in a one-to-many relationship using SQLAlchemy?

Here’s a contrived example:

class ProductTest(Base):
    __tablename__ = 'products_test'

    product_id = Column(BigInteger, autoincrement=True, primary_key=True)
    parent_variant = Column(BigInteger, ForeignKey('products_test.product_id'))
    price = Column('price', Float, default=0.0)
    product = Column('product', String(255))
    variants = relationship('ProductTest', lazy="joined", join_depth=2)

The important part is this:

variants = relationship('ProductTest', lazy="joined", join_depth=2) 

Notice the lazy and join_depth attributes.  
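To see it in action, here is a runnable sketch of the same pattern (using SQLite and Integer keys so it works anywhere; the data is made up).  Querying for top-level products pulls the variants along in the same SELECT:

```python
from sqlalchemy import Column, Integer, Float, String, ForeignKey, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class ProductTest(Base):
    __tablename__ = 'products_test'
    # Integer (not BigInteger) so SQLite auto-increments in this sketch
    product_id = Column(Integer, primary_key=True)
    parent_variant = Column(Integer, ForeignKey('products_test.product_id'))
    price = Column('price', Float, default=0.0)
    product = Column('product', String(255))
    variants = relationship('ProductTest', lazy='joined', join_depth=2)

engine = create_engine('sqlite://')          # in-memory stand-in database
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = ProductTest(product='T-Shirt', price=20.0)
    parent.variants = [ProductTest(product='T-Shirt (Red)', price=20.0),
                       ProductTest(product='T-Shirt (Blue)', price=20.0)]
    session.add(parent)
    session.commit()

with Session(engine) as session:
    # Top-level products only; their variants come back in the same
    # SELECT thanks to the joined eager load.
    parents = session.query(ProductTest).filter(
        ProductTest.parent_variant.is_(None)).all()
    for p in parents:
        print(p.product, sorted(v.product for v in p.variants))
```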

Booting Raspberry Pi from an SSD

I have spent the last several days trying to get a DeskPi Pro v2 to boot from an SSD without success, until today.

I am running Debian Bullseye on the Raspberry Pi primarily to conform to the requirements of Home Assistant Supervised, which does not support any other operating system.  But current versions of Debian Bullseye have problems seeing USB devices prior to booting, as documented here.

The primary error message I would see during boot was this:

vcc-sd: disabling

What that means for the DeskPi is that it cannot boot from an SSD, simply because Debian hasn't initialized USB devices yet.  But there is a fix.

  1. Use Balena Etcher or a similar tool to flash Debian Bullseye to an SD card.
  2. Use Balena Etcher or a similar tool to also flash the same version of Debian Bullseye to the SSD.  You will need to use an adapter such as a SATA to USB adapter (or whatever interface you’re using for the SSD) to connect the SSD to a computer.
  3. Use Balena Etcher or a similar tool to flash an SD card with Raspberry Pi OS.  
  4. Boot the Raspberry Pi using the Debian SD card.  Do not update the operating system to a new version since there will then be a mismatch between the version on the SD card and the version on the SSD.

 

Run these commands at the Linux terminal:

  1. echo reset_raspberrypi > /etc/initramfs-tools/modules
  2. update-initramfs -uk all

 

Connect the USB adapter with the attached SSD to the Pi to copy the newly generated initrd.img to the SSD.  Mount the SSD to a path such as /mnt/ssd

  1. mount /dev/sda2 /mnt/ssd
  2. cp /boot/initrd.img-{kernel_version_generated}-arm64 /mnt/ssd/boot/initrd.img-{kernel_version_generated}-arm64
  3. Disconnect the SSD and eject the SD card.

 

Boot the Pi using the Raspberry Pi OS SD card.  

  1. Install the gparted application and then launch it in order to resize the root partition on the SSD.
  2. Connect the USB adapter with the attached SSD but then unmount it in gparted in order to resize it.
  3. Use gparted to select “Partition” and then “Check.”  Then apply the operation. 

 

Open up a terminal from within the Raspberry Pi OS

  1. git clone https://github.com/DeskPi-Team/deskpi.git
  2. cd deskpi
  3. sudo chmod +x install.sh
  4. ./install.sh
  5. sudo apt update
  6. sudo apt full-upgrade
  7. sudo rpi-update
  8. sudo reboot
  9. sudo raspi-config
  10. Navigate to Advanced Options -> Boot Order and select USB Boot
  11. Navigate back to Advanced Options -> Boot Loader Version and select Latest Version.  Be sure not to choose to reset to defaults after selecting Latest Version.
  12. Reboot when prompted.
  13. sudo -E rpi-eeprom-config --edit
  14. No changes are needed here.  Simply press Ctrl + X to save.
  15. sudo reboot 

 

You should now be able to boot the system using Debian Bullseye using an SSD/USB as the root filesystem.

Don't forget to add reset_raspberrypi to the system booted from the SSD and run update-initramfs, like so:

Run these commands at the Linux terminal:

  1. echo reset_raspberrypi > /etc/initramfs-tools/modules
  2. update-initramfs -uk all

Lost connection to MySQL server during query

I ran into an issue where one particular query sometimes resulted in the following error:

(pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

Sometimes the error condition would manifest as:

(2006, "MySQL server has gone away (TimeoutError(110, 'Connection timed out'))")

I could not get my head around why this was happening.  Everything on the Internet suggested that tuning MySQL timeout parameters was needed.  So I did that, but nothing changed with respect to the errors.  I tried as many changes as I could to how SQLAlchemy, my database ORM, connects to MySQL hosted on Amazon RDS.

I logged everything possible in RDS and saw messages such as:

Got an error reading communication packets

But that message led me nowhere.

When I compared the overall code path for queries that work fine with the one that generated the errors I found one distinction:

For the queries that worked I was getting a database session using SQLAlchemy within the function that executes the query.  For the query that would timeout I was getting a database session at the top of the file that contained the function.  

The connection to the database was happening when the application started up, but would eventually time out as activity subsided.  By the time the function got triggered, the session that was created by the code at the top of the file was already closed.  So the application would encounter the "Timeout" and "Lost connection" errors.

Moving the call to get the database session into the body of the function itself solved the problem.  
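Here's the pattern, sketched with SQLite standing in for RDS MySQL (the model and names are made up for illustration):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Product(Base):                      # hypothetical model
    __tablename__ = 'products'
    product_id = Column(Integer, primary_key=True)
    name = Column(String(255))

engine = create_engine('sqlite://')       # stand-in for the RDS MySQL URL
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)

with SessionLocal() as session:
    session.add(Product(product_id=1, name='Widget'))
    session.commit()

# Broken pattern: a session created at import time holds a connection
# that MySQL eventually closes, so later queries fail.
# session = SessionLocal()
# def get_product(product_id):
#     return session.get(Product, product_id)

# Fixed pattern: open the session inside the function so the connection
# is checked out fresh when the query actually runs.
def get_product(product_id):
    with SessionLocal() as session:
        return session.get(Product, product_id)

print(get_product(1).name)                # prints "Widget"
```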

The specified Checkout Session could not be found

You may see an error like this when using Stripe Checkout:

The specified Checkout Session could not be found. This error is usually caused by using the wrong API key. Please make sure the API keys used to initialize Stripe.js and create the Checkout Session are test mode keys from the same account.

If you are doing this on behalf of a Stripe connected account, be sure to use the connected account ID when initializing the Stripe JS library like so:

var stripe = Stripe('{{PLATFORM_PUBLISHABLE_KEY}}', {
    stripeAccount: '{{CONNECTED_STRIPE_ACCOUNT_ID}}',
});

 

ImportError: cannot import name '_imaging' with Lambda and M1 Mac

I use Docker for much of my local development environment.  I recently got an M1 MacBook and have switched most development to it.  

But I need to keep my Intel-based MacBook around for at least one reason: some Python modules for Lambda do not build properly on the M1, and the Pillow module is emblematic of such.

I haven’t delved in to figure out why this happens even using the latest Docker Desktop release candidate for the M1.  

This is the error that I see in my Cloudwatch logs when trying to deploy a Python application to Lambda with the Serverless framework using Docker on the M1:

ImportError: cannot import name '_imaging'

Stripe No Such Price Error

I kept experiencing a “No such price” error when attempting to modify a price on behalf of a Stripe connected account.

Turns out that I wasn’t calling modify() with the connected account id, which results in Stripe looking for the price in the platform’s account.  

To modify the price for a connected account, do so like this:

existing_price_change = stripe.Price.modify(
    stripe_price_id,
    active=False,
    stripe_account=stripe_account
)

The last parameter, stripe_account=stripe_account, should be the ID for the connected account. Including it sets the correct HTTP header when sent to Stripe.

Archive Products With Stripe

Stripe discourages deleting products through the API.  In fact, deleting products won’t even work if a price is still active.

But they do support archiving products such that they cannot be used for anything other than reporting, etc.

Here's how you do it in Python:

stripe.Product.modify(
    stripe_product_id,
    active=False,
    stripe_account=stripe_account
)

You only need to include the stripe_account parameter if you are acting on behalf of a "connected account."

Why Your New M1 MacBook Can't Use Your Monitor As An External Display

I bought a new M1 MacBook and was super excited to use it.  My monitor has a USB-C input.  The MacBook comes with a USB-C cable.  Awesome.  

Plug everything up.  Not so awesome.  

The monitor could not be used as an external display for the MacBook.  I called Apple Support, who were super nice but not helpful in the least.  They had me reboot the computer, connect and disconnect, etc., but nothing worked.

Finally, I found an obscure thread on the Internet saying that it was the cable.  

The cable?  Could it be?  

Yes.  I bought this cable and it worked right away.  The USB-C cable that comes with this super-sophisticated MacBook doesn’t work for external displays.  

Python Startup Performance on Azure App Service

I noticed a huge difference in app startup performance and reliability on Azure App Service for a FastAPI-based Python application, depending on whether uvicorn or gunicorn is used.

 

Fast:

python -m uvicorn app:app --host 0.0.0.0

 

Very slow:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker app:app 

Getting API Keys for Stripe Connected Accounts

I am playing with the Stripe API and creating “connected accounts” which is Stripe lingo for a sub-account of sorts.  

I noticed that the 2018-05-21 version of the Stripe API returns the test or live API keys after calling stripe.Account.create(), depending on whether you used test or live keys when calling the method.  If you set a live key on stripe.api_key, you get live keys back from stripe.Account.create(), etc.

But the "keys" key in the returned JSON does not exist in the 2020-08-27 version of the API (and perhaps earlier versions too).

It appears that you should use the platform's keys for newly created connected accounts, and use your publishable keys, when necessary, for "connected accounts" created under the platform.

Install Python Dev Headers on Amazon Linux

Took me forever to figure this out:

If you need to install Python 3.8 on Amazon Linux and also need the development headers, you can install the development headers like so:

yum install python38-devel -y

This is an example of the error you will get that requires you to install the Python header files:

c/_cffi_backend.c: fatal error: Python.h: No such file or directory